Re: [groff] 1.22.4.rc4 - Final RC before official 1.22.4
From: Ingo Schwarze
Subject: Re: [groff] 1.22.4.rc4 - Final RC before official 1.22.4
Date: Fri, 7 Dec 2018 21:31:14 +0100
User-agent: Mutt/1.8.0 (2017-02-23)
Hi Branden,
G. Branden Robinson wrote on Thu, Dec 06, 2018 at 11:17:11PM -0500:
> At 2018-12-06T18:44:18+0100, Ingo Schwarze wrote:
>> - | tr '[:cntrl:]' ' '"
>> + | tr '[:cntrl:]' '[ *32]'"
> This might not be portable _enough_.
>
> The number of characters in the class :cntrl: is locale-dependent; you
> are only guaranteed 32 such codepoints if LC_CTYPE=C (that is, ASCII).
I expected that even for exotic locales, the newline character
would likely be among the first 32 found, but maybe you are right
that this isn't guaranteed.
> POSIX says that the repeat count in the second argument to tr can be
> omitted, and the transliteration target will grow to fit the size of the
> source:
>
> https://pubs.opengroup.org/onlinepubs/009695399/utilities/tr.html
Interesting, I missed that. The version you are linking to is outdated,
but the current version says the same:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
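The padding rule quoted above can be seen in a minimal sketch (POSIX sh, assuming a tr that follows the standard): when the repeat count in `[x*]` is omitted, string2 is extended to the length of string1, so it behaves like an explicit `[x*3]` when string1 has three characters.

```shell
#!/bin/sh
# Assumption: a POSIX tr; the C locale makes the result reproducible.
LC_ALL=C
export LC_ALL

# string1 is "abc" (3 characters), so "[x*]" is filled out to "xxx".
printf 'abcd\n' | tr 'abc' '[x*]'    # prints xxxd
# Same mapping with an explicit repeat count.
printf 'abcd\n' | tr 'abc' '[x*3]'   # prints xxxd
```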
> ...on the other hand, Solaris's relationship with POSIX has been
> difficult at best,
True, but...
> so I wouldn't be surprised if omitting the repeat
> count is disallowed in its implementation.
... actually, omitting the repeat count works on Solaris 9 to 11.
> But I know nothing about the
> limitations of historical versions of tr.
Using something that is standard-conforming (like "[ *]") is clearly
better than using something that is unspecified (like ' '), even if
we don't know which historical systems support it.
> Another approach would be to force LC_CTYPE=C in the pipeline before
> calling tr.
>
> So either:
>
> | tr '[:cntrl:]' '[ *]'"
>
> or:
>
> | LC_CTYPE=C tr '[:cntrl:]' '[ *32]'"
>
> perhaps?
Is that safe? I figure it might have catastrophic results if the
system default locale happens to be UTF-32 or something like that,
but I'm not sure. In any case, not all character encodings are
supersets of ASCII.
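Rather than relying on the count of 32 mentioned above, one can ask a particular tr how many characters it puts in `[:cntrl:]` by feeding it every byte from 0x00 to 0x7F and counting what survives `tr -cd`. This is a minimal sketch, assuming a POSIX sh whose printf can emit a NUL byte through an octal escape; the function name `all_ascii` is made up for illustration. POSIX defines `[:cntrl:]` in the C locale as 0x00 to 0x1F plus DEL, so a conforming system should report 33 here.

```shell
#!/bin/sh
# Force the C locale for a reproducible count; change this to probe others.
LC_ALL=C
export LC_ALL

# Hypothetical helper: write every byte 0x00..0x7F to stdout.
all_ascii() {
    i=0
    while [ "$i" -le 127 ]; do
        # The inner printf yields a three-digit octal string such as 011;
        # the outer printf turns the escape \011 into the actual byte.
        printf "\\$(printf '%03o' "$i")"
        i=$((i + 1))
    done
}

# tr -cd deletes every byte NOT in the class, so only control characters
# remain; wc -c counts them, and tr -d ' ' strips any padding from wc.
count=$(all_ascii | tr -cd '[:cntrl:]' | wc -c | tr -d ' ')
echo "[:cntrl:] holds $count of the 128 ASCII codes"
```

Note that this only probes the ASCII range; in a multibyte locale the class may contain further characters that this loop never generates.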
So, I see three standard-conforming options that work on all of
Linux, OpenBSD, Solaris 11, Solaris 10, and Solaris 9:
[1] tr '[:cntrl:]' '[ *32]'
Slight risk that on some system in some locale, the newline
might not be among the first 32 control characters.
[2] tr '[:cntrl:]' '[ *]'
Slight risk that some system might not support it.
[3] tr '\\\\n' ' '
Slight risk that on some system, we might get "\r\n" or "\r";
I'm not sure that fear makes sense, it might be FUD.
Also a slight risk that replacing the other control characters
is somehow important. I don't think it is, but right before
a release is perhaps poor timing for such a change.
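The three options can be compared on a small sample containing a tab, a CR and an LF. This is a sketch, assuming the C locale and writing option [3] with plain shell quoting as `'\n'` (the doubled backslashes above presumably come from the quoting context the command is embedded in, which is an assumption). Options [1] and [2] turn every control character into a space, while [3] only touches the newlines and leaves the tab and the CR in place.

```shell
#!/bin/sh
# Assumption: C locale, so [:cntrl:] is 0x00-0x1F plus DEL.
LC_ALL=C
export LC_ALL

# Sample input: "a", TAB, "b", CR, LF, "c", LF.
sample() { printf 'a\tb\r\nc\n'; }

opt1=$(sample | tr '[:cntrl:]' '[ *32]')   # [1] explicit repeat count
opt2=$(sample | tr '[:cntrl:]' '[ *]')     # [2] fill to the length of string1
opt3=$(sample | tr '\n' ' ')               # [3] newlines only

# od -c makes the remaining tabs and carriage returns visible.
for v in "$opt1" "$opt2" "$opt3"; do
    printf '%s' "$v" | od -c
done
```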
I tend to like [2] best - the change is minimal, addressing only
what must be changed and nothing else, compliant, and works.
So if somebody confirms that opinion and nobody objects, I think I
should put that in: s/32//.
Yours,
Ingo