Bruno wrote:
> Some characters got mapped to the Unicode PUA, because they were not
> in Unicode at that time. Then they got added to Unicode.
> A bijective 1-1 conversion table does not provide the best user experience
> in this situation.
I figured out a little of the history:
GB18030-2000 (up to Ext-A): 0xFE6C -> U+E831 (PUA)
Unicode later adopted the character as U+215D7 (Ext-B)
GB18030-2005 adopted Ext-B: 0x9536B937 -> U+215D7
So the real question here is:
U+215D7 -> GB18030: 0xFE6C or 0x9536B937?
I think 0x9536B937 is the better choice, because all Ext-B characters in GB18030 are encoded as 4 bytes.
I no longer insist on a 1-1 conversion, since the PUA mapping should be retired some day.
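
For reference, here is a minimal sketch of where 0x9536B937 comes from. It assumes the usual GB18030 rule that supplementary-plane code points (U+10000 and up) map linearly onto four-byte sequences starting at 0x90308130. The function names are mine, purely for illustration, and are not taken from any converter's source.

```python
def supplementary_to_gb18030(code_point: int) -> bytes:
    """Map a supplementary-plane code point to its four-byte GB18030 form."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are mapped linearly")
    index = code_point - 0x10000
    # Byte ranges: b1 0x90.., b2 0x30-0x39, b3 0x81-0xFE, b4 0x30-0x39
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


def gb18030_to_supplementary(data: bytes) -> int:
    """Inverse of the mapping above, for a four-byte sequence."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = data
    index = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + index


print(supplementary_to_gb18030(0x215D7).hex().upper())  # 9536B937
```

So the four-byte code is simply what the linear rule gives for U+215D7, while 0xFE6C is the odd one out that needs a table entry.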