bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS


From: Bruno Haible
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Sun, 22 Jul 2018 23:40:35 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; )

Pádraig Brady wrote:
> > This patch is correct (because the characters that you test for in c_iscntrl
> > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a 
> > multibyte
> > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).
> 
> ... It might be worth mentioning this subtle point in the c_iscntrl() docs?
> "Note this identifies all single byte control chars even in multibyte 
> encodings".

Only in the multibyte encodings that are currently in use. We never know what
kinds of features or misfeatures new multibyte encodings will come up with:
Before GB18030 was introduced, it was a common feature of all multibyte 
encodings
(including SJIS) that ASCII characters in the range 0x00..0x3F never occur as
second or later byte in a multibyte character. Well, GB18030 broke this 
assumption.

So, it is dangerous to rely on this property. Therefore I wouldn't like to
document it in the c_iscntrl() documentation.

Bruno




reply via email to

[Prev in Thread] Current Thread [Next in Thread]