gnustep-dev

Re: New ABI NSConstantString


From: David Chisnall
Subject: Re: New ABI NSConstantString
Date: Sat, 7 Apr 2018 09:49:48 +0100

On 5 Apr 2018, at 20:09, Stefan Bidigaray <address@hidden> wrote:
> 
> I know this is probably going to be rejected, but how about making constant 
> strings either ASCII or UTF-16 only? Scratching UTF-8 altogether? I know this 
> would increase the byte count for most European languages using Latin 
> characters, but I don't see the point of maintaining both UTF-8 and UTF-16 
> encodings. Everything that can be done with UTF-16 can be encoded in UTF-8 
> (and vice versa), so how would the compiler pick between the two? 
> Additionally, wouldn't sticking to just one of the two encodings simplify the 
> code significantly?

I am leaning in this direction.  The APIs all want UTF-16 code units.  In 
ASCII, each character is precisely one UTF-16 code unit.  In UTF-16, every 
two-byte value is a UTF-16 code unit.  In UTF-8, a UTF-16 code unit is 
somewhere between 1 and 3 bytes long (and a 4-byte sequence maps to a 
surrogate pair of two code units), so the mapping is complicated.  It’s 
a shame that in the 64-bit transition Apple didn’t make unichar 32 bits and 
make it a Unicode character, so we’re stuck in the same situation as Windows, 
with a hasty s/UCS2/UTF-16/ and an attempt to make the APIs keep working.
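
To illustrate why the UTF-8 mapping is awkward, here is a minimal sketch in C 
of what merely computing the UTF-16 length of a UTF-8 buffer involves.  The 
function names are hypothetical (they are not GNUstep or runtime APIs), and 
the code assumes well-formed UTF-8 and does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the lead byte of a UTF-8 sequence, store the
 * sequence length in bytes in *len and return how many UTF-16 code units
 * that sequence occupies.  Assumes well-formed input. */
static size_t utf16_units_for_lead(unsigned char b, size_t *len)
{
    if (b < 0x80)           { *len = 1; return 1; } /* ASCII            */
    if ((b & 0xE0) == 0xC0) { *len = 2; return 1; } /* U+0080..U+07FF   */
    if ((b & 0xF0) == 0xE0) { *len = 3; return 1; } /* U+0800..U+FFFF   */
    *len = 4;                                       /* U+10000 and up:  */
    return 2;                                       /* surrogate pair   */
}

/* Hypothetical helper: number of UTF-16 code units needed to represent
 * the first nbytes bytes of the UTF-8 buffer s. */
size_t utf8_length_in_utf16_units(const char *s, size_t nbytes)
{
    assert(s != NULL);
    size_t units = 0;
    size_t i = 0;
    while (i < nbytes) {
        size_t len;
        units += utf16_units_for_lead((unsigned char)s[i], &len);
        i += len;
    }
    return units;
}
```

With ASCII or UTF-16 storage the same answer is just the byte count or half 
the byte count, and indexing a single code unit is a direct array access 
rather than a scan from the start of the buffer.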

My current plan is to make the format support ASCII, UTF-8, UTF-16, and UTF-32, 
but only generate ASCII and UTF-16 in the compiler and then decide later if we 
want to support generating UTF-8 and UTF-32.  I also won’t initialise the hash 
in the compiler initially, until we’ve decided a bit more what the hash should 
be.
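
As a rough sketch of that plan (not the actual compiler code), the choice of 
emitted encoding could look like the following C.  The enum, its values, and 
the function name are assumptions made up for illustration; the real ABI 
constants may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the four encodings the format would allow.
 * The numeric values are illustrative, not the real ABI's. */
enum const_string_encoding {
    ENC_ASCII,
    ENC_UTF8,
    ENC_UTF16,
    ENC_UTF32
};

/* Hypothetical policy: the source literal arrives as UTF-8.  Emit ASCII
 * when every byte is below 0x80, otherwise emit UTF-16.  ENC_UTF8 and
 * ENC_UTF32 are accepted by the format but never chosen here. */
enum const_string_encoding
choose_emitted_encoding(const char *utf8, size_t nbytes)
{
    assert(utf8 != NULL);
    for (size_t i = 0; i < nbytes; i++) {
        if ((unsigned char)utf8[i] >= 0x80) {
            return ENC_UTF16;
        }
    }
    return ENC_ASCII;
}
```

Under this policy a literal such as "hello" would be stored as ASCII, while 
one containing any non-ASCII character would be stored as UTF-16.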

David



