Unicode defines character encodings with three distinct code unit sizes (UTF-8, UTF-16, and UTF-32), while the traditional character type is 8 bits.
UTF-16BE: 16-bit UCS Transformation Format, big-endian byte order.
UTF-16LE: 16-bit UCS Transformation Format, little-endian byte order.
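The difference between the two byte orders can be seen by encoding a single character both ways. This is a minimal sketch using Python's built-in "utf-16-be" and "utf-16-le" codecs; the variable names are chosen only for illustration.

```python
# Encode one character (U+0041, "A") in both UTF-16 byte orders.
text = "A"

be = text.encode("utf-16-be")  # most significant byte first
le = text.encode("utf-16-le")  # least significant byte first

print(be.hex())  # 0041
print(le.hex())  # 4100
```

The same 16-bit code unit 0x0041 is stored as the bytes 00 41 in big-endian order and 41 00 in little-endian order.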
When supplementary characters are involved, each supplementary character is counted as two UTF-16 code units using CODEUNITS16, or as one UTF-32 code unit using CODEUNITS32.
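The two ways of counting can be reproduced outside the database to check expectations. The sketch below is only an illustration: the helper names code_units_16 and code_units_32 are hypothetical and merely mimic the counting described above, not the actual implementation behind CODEUNITS16 and CODEUNITS32.

```python
def code_units_16(s: str) -> int:
    """Count UTF-16 code units (2 bytes each) needed for s."""
    return len(s.encode("utf-16-le")) // 2


def code_units_32(s: str) -> int:
    """Count UTF-32 code units (4 bytes each) needed for s."""
    return len(s.encode("utf-32-le")) // 4


# "a" is a BMP character; U+1D11E (MUSICAL SYMBOL G CLEF) is supplementary.
sample = "a\U0001D11E"

print(code_units_16(sample))  # 3: one unit for "a", two for the surrogate pair
print(code_units_32(sample))  # 2: one unit per character
```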
However, if you use UTF-16, the size of a mostly ASCII document roughly doubles and the document takes longer to parse.
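The size effect is easy to measure. The following sketch assumes a small ASCII-only XML fragment (made up for this example) and compares its encoded length in UTF-8 and UTF-16; it does not measure parse time.

```python
# A made-up, ASCII-only document used purely for illustration.
doc = "<note><to>Alice</to><body>Hello</body></note>"

utf8_size = len(doc.encode("utf-8"))       # 1 byte per ASCII character
utf16_size = len(doc.encode("utf-16-le"))  # 2 bytes per ASCII character

print(utf8_size, utf16_size, utf16_size / utf8_size)
```

For text containing only ASCII characters the ratio is exactly 2. Documents with many non-ASCII characters grow by less, because those characters already take two or more bytes in UTF-8.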
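For example, if UTF-16 data is loaded as-is into a C string, the string may be truncated at the second byte of the first ASCII character, because in little-endian order that byte is 0x00, which C treats as the string terminator.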
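The truncation can be reproduced without writing C by using Python's ctypes module, whose string buffers follow the C convention of stopping at the first zero byte. This is a sketch under the assumption that the data is UTF-16LE; the sample text "Hi" is arbitrary.

```python
import ctypes

# "Hi" in UTF-16LE is the byte sequence 48 00 69 00.
raw = "Hi".encode("utf-16-le")

# Copy the bytes into a C char buffer, as a C program might do.
buf = ctypes.create_string_buffer(raw)

# .value reads the buffer as a NUL-terminated C string.
print(buf.value)  # b'H'
```

Only the first byte survives, because the second byte of "H" is 0x00 and ends the C string.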