bug#24924: GNU pr only working with singlebyte 1-width characters
From: Stephane Chazelas
Subject: bug#24924: GNU pr only working with singlebyte 1-width characters
Date: Thu, 1 Dec 2016 06:32:22 +0000
User-agent: Mutt/1.5.21 (2010-09-15)
2016-11-30 18:37:05 -0800, Paul Eggert:
> On 11/30/2016 03:30 AM, Stephane Chazelas wrote:
> >That can also be seen as a POSIX conformance bug
>
> Not really, as POSIX does not require support for UTF-8 (except in
> the pax utility, which is not part of coreutils).
[...]
POSIX does not require support for any charset. It only
specifies one locale (C/POSIX), doesn't specify the charset in
that locale other than it should be a single byte charset that
covers the portable character set. Examples of such charsets are
ASCII, iso8859-x or EBCDIC. In practice, that tends to be ASCII
(except for some rare EBCDIC based IBM systems) as tha
But it does support a localisation API and allows systems to
support other locales with other charsets. That API does support
multi-byte encodings, including stateful ones (though how they
are /defined/ is implementation defined for locking-shift ones and
in practice those are unworkable so I'd expect those would
eventually be removed from the standard). It doesn't require
compliant systems to have locales with multi-byte character sets,
but if they have (if they show up in the output of locale -a),
then they have to be supported throughout (as specified, for all
the utilities for instance).
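
To illustrate why a utility that treats one byte as one character and
one column misbehaves in such locales, here is a minimal sketch in
Python. It is not how GNU pr works and not what POSIX prescribes; the
helper display_width() is a hypothetical name, and its width rule
(2 columns for East Asian wide/fullwidth characters, 0 for combining
marks, 1 otherwise) is only a rough stand-in for what wcwidth() would
report in a UTF-8 locale.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns taken by text.

    Assumption: wide/fullwidth East Asian characters take 2 columns,
    combining marks take 0, everything else takes 1. Control
    characters are not treated specially in this sketch.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks add no column of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# "Stéphane 日本", written with escapes to avoid source-encoding issues
sample = "St\u00e9phane \u65e5\u672c"

byte_count = len(sample.encode("utf-8"))  # what a byte-oriented tool counts
char_count = len(sample)                  # number of characters
cols = display_width(sample)              # columns needed on screen

print(byte_count, char_count, cols)
```

The three numbers differ for this string, so column alignment computed
from the byte count alone cannot come out right.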
Basically, on systems that have locales with multi-byte
encodings --UTF-8 or other-- (most Unix-like ones including GNU
systems like Debian), GNU pr (and many other GNU utilities) is
not POSIX compliant.
See
http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap06.html
for details.
--
Stephane