Code:
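string s = "哈\uD950\uDF21高";
// Length counts UTF-16 code units, not characters.
Console.WriteLine("string length:" + s.Length);
// Requires: using System.Globalization;
TextElementEnumerator textEnum = StringInfo.GetTextElementEnumerator(s);
while (textEnum.MoveNext())
{
    // ElementIndex is the index of the element's first UTF-16 code unit.
    Console.WriteLine("word {0}: {1}",
        textEnum.ElementIndex,
        textEnum.Current.ToString());
}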
Output:
string length:4
word 0: 哈
word 1: ??
word 3: 高
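
The output above skips from index 1 to index 3 because the middle character occupies two char slots. As a minimal sketch (the class name SurrogateInspection is only illustrative, and the string is the same one used above), the following checks that directly with framework calls:

```csharp
using System;
using System.Globalization;

class SurrogateInspection
{
    static void Main()
    {
        string s = "哈\uD950\uDF21高";

        // Length counts UTF-16 code units, so the surrogate pair counts twice.
        Console.WriteLine(s.Length);                                // 4

        // LengthInTextElements counts text elements, so the pair counts once.
        Console.WriteLine(new StringInfo(s).LengthInTextElements);  // 3

        // s[1] and s[2] together form one surrogate pair.
        Console.WriteLine(char.IsSurrogatePair(s, 1));              // True

        // Combine the pair into the single code point it represents.
        Console.WriteLine("U+{0:X}", char.ConvertToUtf32(s, 1));    // U+64321
    }
}
```

U+64321 is not an assigned character, which is likely why the console printed ?? for it.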
Key points:
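- UTF-32 corresponds to UCS-4 (Universal Character Set, 4 bytes). UCS-4 was originally defined over the range 0 to 7F FF FF FF, while UTF-32 is limited to code points 0 to 10 FF FF
- UTF-16 is not the same as UCS-2. UCS-2 covers only part of the UCS-4 characters (U+0000 to U+FFFF), whereas UTF-16 can cover every character that UTF-32 can
- UTF-16 uses the surrogate mechanism so that characters above U+FFFF are represented as a pair of UTF-16 code units
- The difference between UCS-2 and UTF-16 is whether the surrogate mechanism is present
- The char type used by .NET stores a UTF-16 code unit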
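
The surrogate mechanism in the points above comes down to a little bit arithmetic. Here is a rough sketch of it, assuming the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). SurrogateMath, ToSurrogatePair and FromSurrogatePair are hypothetical names made up for this example, not framework APIs; the framework's own char.ConvertFromUtf32 is used only as a cross-check.

```csharp
using System;

static class SurrogateMath
{
    // Hypothetical helper: split a code point in U+10000..U+10FFFF
    // into a UTF-16 high/low surrogate pair.
    public static char[] ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new[] { high, low };
    }

    // Hypothetical helper: the inverse, from a surrogate pair back to a code point.
    public static int FromSurrogatePair(char high, char low)
    {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    static void Main()
    {
        char[] pair = ToSurrogatePair(0x64321);
        Console.WriteLine("{0:X4} {1:X4}", (int)pair[0], (int)pair[1]);        // D950 DF21
        Console.WriteLine("U+{0:X}", FromSurrogatePair('\uD950', '\uDF21'));   // U+64321

        // Cross-check against the framework's own conversion.
        Console.WriteLine(char.ConvertFromUtf32(0x64321) == new string(pair)); // True
    }
}
```

This is the same pair (\uD950\uDF21) that appears in the sample string, which is why that string has Length 4 but only three text elements.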