bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
From: Chih-Hsuan Yen
Subject: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Wed, 25 Jul 2018 23:51:13 +0800
2018-07-23 5:40 GMT+08:00 Bruno Haible <address@hidden>:
> Pádraig Brady wrote:
>> > This patch is correct (because the characters that you test for in
>> > c_iscntrl are 0x00..0x1F, 0x7F, which don't occur as second or later
>> > byte in a multibyte character in the EUC-JP, EUC-KR, GB2312, EUC-TW,
>> > GB18030, SJIS encodings).
>>
>> ... It might be worth mentioning this subtle point in the c_iscntrl() docs?
>> "Note this identifies all single byte control chars even in multibyte
>> encodings".
>
> Only in the multibyte encodings that are currently in use. We never know what
> kinds of features or misfeatures new multibyte encodings will come up with:
> Before GB18030 was introduced, it was a common feature of all multibyte
> encodings (including SJIS) that ASCII characters in the range 0x00..0x3F
> never occur as second or later byte in a multibyte character. Well, GB18030
> broke this assumption.
>
> So, it is dangerous to rely on this property. Therefore I wouldn't like to
> document it in the c_iscntrl() documentation.
>
> Bruno
>
Hello, any update on this? Discussions about encodings are beyond my
knowledge, yet I can see that it is difficult to filter control
characters correctly. How about following Pádraig Brady's idea and
filtering only \n?
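
To make the two options concrete, here is a minimal sketch in C. It is not
the actual df patch: ascii_iscntrl is a hypothetical stand-in for gnulib's
c_iscntrl covering the byte ranges quoted above (0x00..0x1F and 0x7F), the
names replace_control_bytes and replace_newlines are made up for
illustration, and replacing with '?' is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for gnulib's c_iscntrl: true only for the
   bytes 0x00..0x1F and 0x7F, regardless of the current locale.  */
static bool
ascii_iscntrl (unsigned char c)
{
  return c <= 0x1F || c == 0x7F;
}

/* Option 1: replace every ASCII control byte in BUF[0..LEN-1] with '?'.
   This works bytewise, so it is only safe if those byte values never
   appear inside a multibyte character of the encoding in use.
   Returns the number of bytes replaced.  */
static size_t
replace_control_bytes (char *buf, size_t len)
{
  size_t replaced = 0;
  for (size_t i = 0; i < len; i++)
    if (ascii_iscntrl ((unsigned char) buf[i]))
      {
        buf[i] = '?';
        replaced++;
      }
  return replaced;
}

/* Option 2: replace only newline bytes with '?', leaving every other
   byte (including other control bytes) untouched.
   Returns the number of bytes replaced.  */
static size_t
replace_newlines (char *buf, size_t len)
{
  size_t replaced = 0;
  for (size_t i = 0; i < len; i++)
    if (buf[i] == '\n')
      {
        buf[i] = '?';
        replaced++;
      }
  return replaced;
}
```

For instance, the GB18030 four-byte sequence 0x81 0x30 0x81 0x30 contains the
ASCII byte 0x30 twice but no byte in 0x00..0x1F or 0x7F, so neither function
would touch it. Option 2 is narrower and leaves tabs and other control bytes
in place, but it still looks at single bytes, so it rests on the same kind of
assumption about the encoding, only for the one value 0x0A.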