[PATCH] Multibyte support for expand and unexpand v2
From: Ondrej Oprala
Subject: [PATCH] Multibyte support for expand and unexpand v2
Date: Tue, 29 Sep 2015 10:47:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
Hi all,
this is a loose continuation of my patch from 2 years ago [1]. I'm
reposting the patches, but I have rewritten them to use only gnulib's
mbfile (and thus, implicitly, mbchar) modules instead of linking
against libunistring. Although libunistring is very lightweight
compared to alternatives such as libicu, IMHO it still brings too much
overhead for utilities as simple as {,un}expand. We do not do any
character classification (from a Unicode standpoint) in {,un}expand;
all we need to know is a character's column width and whether it is a
tab or a space. mbchar already provides this, and it also works
transparently with non-Unicode input (see tests). I've included some
changes proposed by Pádraig when I first posted these, so I am listing
him as a co-author.
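To make the "column width plus tab/space" requirement concrete, here is a minimal sketch of tab expansion over multibyte text. It is not the patch's code: it assumes plain POSIX `mbrtowc()` and `wcwidth()` in place of gnulib's mbfile/mbchar, a fixed tab stop every 8 columns (the hypothetical `TAB_WIDTH`), and it ignores backspaces and custom tab lists. The name `expand_line()` is likewise hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <assert.h>
#include <locale.h>          /* callers should setlocale (LC_ALL, "") first */
#include <string.h>
#include <wchar.h>

/* Hypothetical fixed tab stop interval (expand's default is 8).  */
#define TAB_WIDTH 8

/* Hypothetical helper, not the patch's code: expand TABs in the
   NUL-terminated multibyte string IN into OUT (OUTSIZE bytes, room for
   the terminating NUL included), tracking display columns with
   mbrtowc () and wcwidth ().  Returns 0 on success, -1 if OUT is too
   small.  */
static int
expand_line (const char *in, char *out, size_t outsize)
{
  mbstate_t st;
  size_t len = strlen (in);
  size_t column = 0;   /* current display column */
  size_t o = 0;        /* bytes written to OUT so far */

  assert (outsize > 0);
  memset (&st, 0, sizeof st);

  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, in, len, &st);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or truncated sequence: pass one byte through,
             count it as one column, and reset the conversion state.  */
          if (o + 1 >= outsize)
            return -1;
          out[o++] = *in;
          column++;
          in++;
          len--;
          memset (&st, 0, sizeof st);
          continue;
        }

      if (wc == L'\t')
        {
          /* Pad with spaces up to the next multiple of TAB_WIDTH.  */
          size_t next = (column / TAB_WIDTH + 1) * TAB_WIDTH;
          for (; column < next; column++)
            {
              if (o + 1 >= outsize)
                return -1;
              out[o++] = ' ';
            }
        }
      else
        {
          /* Copy the whole multibyte character and advance by its
             width.  wcwidth () returns -1 for nonprintable characters,
             which this sketch treats as zero width.  */
          int w = wcwidth (wc);
          if (o + n >= outsize)
            return -1;
          memcpy (out + o, in, n);
          o += n;
          if (w > 0)
            column += (size_t) w;
        }
      in += n;
      len -= n;
    }

  out[o] = '\0';
  return 0;
}
```

The caller is expected to run `setlocale (LC_ALL, "")` first so that `LC_CTYPE` selects the input encoding. ASCII text takes the same path as everything else: a single-byte character simply decodes to one byte of width 1.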
The code flow is essentially unchanged and there is no code
duplication: pure ASCII input goes through the same code path as any
other combination of input characters.
RFC:
* Should we also handle non-POSIX whitespace in parse_tab_stops(), or
is that where we should draw the line?
* BOMs: there is already a Red Hat Bugzilla entry for {,un}expand
(#1158494) which essentially claims that a UTF-8 BOM should be honored
even when the utilities run under a different locale. Some editors
apparently do this (kate, emacs), and even UTF-8-enabled terminals
interpret it when their configured encoding is something else
(konsole), unless I filter the output through luit. Personally, I am
against this special-casing, as IMHO it would have no end. Soon,
someone with a GB18030-encoded file will come along claiming we
shouldn't interpret its first three bytes as a BOM when running under a
UTF-8 locale, etc.
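Honoring a UTF-8 BOM regardless of locale would boil down to a raw byte test like the hypothetical `starts_with_utf8_bom()` below (a sketch, not part of the patch). Because it inspects bytes rather than decoded characters, it cannot tell whether EF BB BF really is a BOM or just ordinary leading data in whatever encoding the file actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: report whether BUF starts
   with EF BB BF, the UTF-8 encoding of U+FEFF.  It compares raw bytes
   only and knows nothing about the file's real encoding.  */
static bool
starts_with_utf8_bom (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

  assert (buf != NULL || len == 0);
  return len >= sizeof bom && memcmp (buf, bom, sizeof bom) == 0;
}
```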
Thanks for any and all comments,
Ondrej
[1] https://lists.gnu.org/archive/html/coreutils/2013-02/msg00102.html
Attachment: 0001-expand-unexpand-add-multibyte-support.patch (text data)