coreutils

From: Sebastian Kisela
Subject: Re: Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand, fmt, fold, and pr
Date: Wed, 17 Jan 2018 08:45:36 +0100

Hi Eric, hi Assaf!

I have reviewed Eric's work on multibyte support for coreutils, and it
seems solid.

However, I tried running the multibyte tests from the patch that Assaf
mentioned on top of Eric's repository, and every test that uses the C locale
failed at its first attempt to print a multibyte character.
I believe the cause is the approach to error handling that Eric described
(quoted below). I am concerned that this behavior could be dangerous, since
it might break scripts that rely on the current error handling.


> * Handling of invalid encodings. I generally stop with an error; you wrap
> the foreign byte and pass it through to the output as an opaque object.

It might be better to at least offer an option for choosing the behavior
when an invalid sequence is encountered.
See [1] and [2] for relevant discussion of error handling.
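
To make the two policies concrete, here is a minimal sketch in Python rather
than in coreutils' C. It is not code from either patch set. The function name
`upcase_utf8` and the `on_invalid` parameter are hypothetical; the sketch
only relies on Python's standard "strict" and "surrogateescape" codec error
handlers to stand in for "stop with an error" and "pass the foreign byte
through as an opaque object".

```python
def upcase_utf8(data: bytes, on_invalid: str = "strict") -> bytes:
    """Upper-case UTF-8 input under a selectable invalid-byte policy.

    on_invalid="strict"          -> raise on the first invalid sequence
    on_invalid="surrogateescape" -> carry each invalid byte through
                                    unchanged as an opaque placeholder
    (Hypothetical helper for illustration; not a coreutils interface.)
    """
    # Decoding is where an invalid sequence is detected.
    text = data.decode("utf-8", errors=on_invalid)
    # Character-level processing; opaque placeholders are left untouched.
    processed = text.upper()
    # Encoding with the same handler restores the original invalid bytes.
    return processed.encode("utf-8", errors=on_invalid)


if __name__ == "__main__":
    sample = b"caf\xc3\xa9 \xff ok"  # 0xFF can never appear in valid UTF-8
    print(upcase_utf8(sample, on_invalid="surrogateescape"))
    try:
        upcase_utf8(sample, on_invalid="strict")
    except UnicodeDecodeError as err:
        print("strict policy stopped:", err.reason)
```

With the pass-through policy the stray byte survives while the valid text is
still processed; with the strict policy processing stops at the first
invalid byte. An option selecting between the two would let existing scripts
keep whichever behavior they depend on.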

> * Surrogate pairs. I trust wchar_t to be a sufficient character type; you
> have a special case for UTF-16 systems.

Here I agree with Eric's approach.
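
For context on why UTF-16 systems come up at all: where the wide character
type is only 16 bits, a code point above U+FFFF does not fit in one unit and
is stored as a surrogate pair. A small Python sketch of that encoding fact
(the helper name `utf16_units` is made up for illustration and is unrelated
to either patch set):

```python
def utf16_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode the string s."""
    raw = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    # U+00E9 fits in one unit; U+1F600 needs a high and a low surrogate.
    print([hex(u) for u in utf16_units("\u00e9")])
    print([hex(u) for u in utf16_units("\U0001F600")])
```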

Have a good one,
Sebastian


[1] http://lists.gnu.org/archive/html/coreutils/2016-10/msg00001.html
[2] http://austingroupbugs.net/bug_view_page.php?bug_id=1007

