bug#24924: pr has no concept of wide characters
From: Assaf Gordon
Subject: bug#24924: pr has no concept of wide characters
Date: Fri, 11 Nov 2016 11:36:20 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
severity 24924 wishlist
tags 24924 wishlist notabug
thanks
Hello Dan,
On 11/11/2016 11:10 AM, 積丹尼 Dan Jacobson wrote:
> The pr documentation (man, info) doesn't mention how it has no concept
> of wide characters.
>
> $ pr -m --sep-string='^^^' file file
Indeed, most of the current coreutils programs do not support wide or
multi-byte characters correctly.
The current official implementation of 'pr' does not support them (which is why
I marked this item as 'wishlist' and not as a bug).
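To make the distinction concrete, here is a minimal sketch in Python (not
coreutils code; the helper names display_width and measure are made up for
this example). It shows that bytes, characters, and screen columns are three
different counts, which is why a tool that aligns columns by counting bytes
misplaces them. The width rule is an approximation based on the Unicode East
Asian Width property, not the exact logic of wcwidth().

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns occupied by text.

    Assumption: East Asian Wide/Fullwidth characters take 2 columns,
    combining marks take 0, everything else takes 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks do not advance the cursor
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def measure(text: str, encoding: str = "utf-8") -> tuple:
    """Return (bytes, characters, approximate columns) for text."""
    return len(text.encode(encoding)), len(text), display_width(text)


print(measure("abc"))     # (3, 3, 3)
print(measure("積丹尼"))  # (9, 3, 6)
```

For ASCII text all three numbers agree, so byte-oriented code appears to work;
for the three CJK characters they all differ.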
On RedHat systems, there is the 'i18n' patch, which adds some support but also
introduces some problems:
https://github.com/pixelb/coreutils/tree/i18n
However, there is an active effort to make all coreutils programs multibyte
aware. The latest updates are (in reverse chronological order; these are
somewhat long threads):
http://lists.gnu.org/archive/html/coreutils/2016-09/msg00026.html
http://lists.gnu.org/archive/html/coreutils/2016-09/msg00011.html
http://lists.gnu.org/archive/html/coreutils/2016-07/msg00013.html
'cut' and 'expand' were the first two programs I worked on.
'pr' is definitely on the list. Once I have a proof-of-concept working, I
would very much appreciate it if you could help me test it, as there are many
edge cases with multibyte support and wide characters.
Out of curiosity, are you using UTF-8 locales exclusively, or do you have
experience with Shift-JIS or EUC-JP locales?
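The reason for asking: the same text has different byte lengths in different
encodings, and Shift-JIS in particular can contain bytes that look like ASCII.
A small Python sketch (the function name byte_lengths is hypothetical, and the
sample strings are chosen only for illustration):

```python
def byte_lengths(text: str, encodings=("utf-8", "shift_jis", "euc_jp")) -> dict:
    """Map each encoding name to the number of bytes text occupies in it."""
    return {enc: len(text.encode(enc)) for enc in encodings}


print(byte_lengths("日本語"))
# {'utf-8': 9, 'shift_jis': 6, 'euc_jp': 6}

# In Shift-JIS the second byte of a character may fall in the ASCII range.
# The character 表 encodes as 0x95 0x5C, and 0x5C is the ASCII backslash,
# so a byte-oriented search for '\' finds a match inside the character.
sjis = "表".encode("shift_jis")
print(sjis)           # b'\x95\\'
print(b"\\" in sjis)  # True
```

In UTF-8 this cannot happen, because every byte of a multibyte sequence has
its high bit set.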
I'm leaving this ticket open, and welcome discussion and comments.
regards,
- assaf
P.S.
The usual disclaimer applies: there is currently no ETA for multibyte support
in coreutils.