bug-texinfo
Re: Locale-independent paragraph formatting


From: Eli Zaretskii
Subject: Re: Locale-independent paragraph formatting
Date: Sat, 11 Nov 2023 08:29:01 +0200

> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Fri, 10 Nov 2023 19:48:04 +0000
> Cc: Bruno Haible <bruno@clisp.org>, bug-texinfo@gnu.org
> 
> On Fri, Nov 10, 2023 at 08:47:10AM +0200, Eli Zaretskii wrote:
> > > Does anybody know if we could just write 'a' instead of U'a' and rely
> > > on it being converted?
> > > 
> > > E.g. if you do
> > > 
> > > char32_t c = 'a';
> > > 
> > > then afterwards, c should be equal to 97 (ASCII value of 'a').
> > 
> > Why not?  What could be the problems with using this?
> 
> I think what was confusing me was the statement that char32_t held a UTF-32
> encoded Unicode character.  I then thought it would have a certain byte
> order, so if the UTF-32 was big endian, the bytes would have the order
> 00 00 00 61, whereas the value 97 on a little endian machine would have
> the order 61 00 00 00.  However, it seems that UTF-32 just means the
> codepoint is encoded as a 32-bit integer, and the endianness of the
> UTF-32 sequence can be assumed to match the endianness of the machine.
> The standard C integer conversions can be assumed to work when assigning
> to/from char32_t because it is just an integer type, I assume.

AFAIU, since a char32_t holds the codepoint as a single 32-bit integer
value, the issue of endianness doesn't apply here.  Endianness in UTF
encodings matters only when code units are serialized to a byte stream
(e.g. UTF-32BE vs. UTF-32LE); within a unit held in memory, the byte
order always follows the machine.
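
A minimal sketch of the point in C, not taken from the Texinfo sources. It assumes an ASCII-compatible execution character set and an implementation that uses UTF-32 for char32_t (i.e. one that defines __STDC_UTF_32__). The names plain_literal_matches and put_utf32be are hypothetical helpers made up for illustration.

```c
#include <assert.h>  /* assert, for the checks in the usage note */
#include <uchar.h>   /* char32_t; requires C11 or later */

/* Returns 1 when a plain 'a' and U'a' give the same char32_t value.
   Assumes an ASCII-compatible execution character set and UTF-32
   for char32_t (__STDC_UTF_32__). */
int plain_literal_matches(void)
{
    char32_t from_plain = 'a';  /* int 97, converted like any integer */
    char32_t from_u32 = U'a';   /* char32_t character constant */
    return from_plain == 97 && from_plain == from_u32;
}

/* Hypothetical helper: store c as four UTF-32BE bytes.
   The shifts work on the integer value, not on its in-memory
   layout, so the output does not depend on the host byte order. */
void put_utf32be(char32_t c, unsigned char out[4])
{
    out[0] = (unsigned char)((c >> 24) & 0xFF);
    out[1] = (unsigned char)((c >> 16) & 0xFF);
    out[2] = (unsigned char)((c >> 8) & 0xFF);
    out[3] = (unsigned char)(c & 0xFF);
}
```

Under those assumptions, assert(plain_literal_matches()) should hold, and put_utf32be('a', buf) fills buf with 00 00 00 61 on any host. Byte order only has to be chosen at that serialization step; plain assignment and comparison of char32_t values never see it.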
