Re: [groff] 1.22.4.rc4 - Final RC before official 1.22.4


From: Ingo Schwarze
Subject: Re: [groff] 1.22.4.rc4 - Final RC before official 1.22.4
Date: Fri, 7 Dec 2018 21:31:14 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

Hi Branden,

G. Branden Robinson wrote on Thu, Dec 06, 2018 at 11:17:11PM -0500:
> At 2018-12-06T18:44:18+0100, Ingo Schwarze wrote:

>> -        | tr '[:cntrl:]' ' '"
>> +        | tr '[:cntrl:]' '[ *32]'"

> This might not be portable _enough_.
> 
> The number of characters in the class :cntrl: is locale-dependent; you
> are only guaranteed 32 such codepoints if LC_CTYPE=C (that is, ASCII).

I expected that even for exotic locales, the newline character
would likely be among the 32 first found, but maybe you are right
that isn't guaranteed.

> POSIX says that the repeat count in the second argument to tr can be
> omitted, and the transliteration target will grow to fit the size of the
> source:
> 
> https://pubs.opengroup.org/onlinepubs/009695399/utilities/tr.html

Interesting, i missed that.  The version you are linking to is outdated,
but the current version says the same:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html

> ...on the other hand, Solaris's relationship with POSIX has been
> difficult at best,

True, but...

> so I wouldn't be surprised if omitting the repeat
> count is disallowed in its implementation.

 ... actually, omitting the repeat count works on Solaris 9 to 11.

> But I know nothing about the
> limitations of historical versions of tr.

Using something that is standard-conforming (like "[ *]") is clearly
better than using something that is unspecified (like ' '), even if
we don't know which historical systems support it.

> Another approach would be to force LC_CTYPE=C in the pipeline before
> calling tr.
> 
> So either:
> 
>         | tr '[:cntrl:]' '[ *]'"
> 
> or:
> 
>         | LC_CTYPE=C tr '[:cntrl:]' '[ *32]'"
> 
> perhaps?

Is that safe?  I figure it might have catastrophic results if the
system default locale happens to be UTF-32 or something like that,
but i'm not sure.  In any case, not all character encodings are
supersets of ASCII.


So, i see three standard-conforming options that work on all of
Linux, OpenBSD, Solaris 11, Solaris 10, and Solaris 9:

 [1] tr '[:cntrl:]' '[ *32]'

     Slight risk that on some system in some locale, the newline
     might not be among the first 32 control characters.

 [2] tr '[:cntrl:]' '[ *]'

     Slight risk that some system might not support it.

 [3] tr '\\\\n' ' '

     Slight risk that on some system, we might get "\r\n" or "\r" -
     not sure that fear makes sense, it might be FUD.
     Also a slight risk that replacing other control characters
     is somehow important, even though i don't think it is,
     but maybe right before release is poor timing for such a change.

I tend to like [2] best - the change is minimal, addressing only
what must be changed and nothing else, compliant, and works.
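
For anyone who wants to try [2] on their own system, here is a
minimal sketch.  The function name flatten_cntrl is made up for this
mail and is not part of the groff build; the sample input is
arbitrary.  It relies only on the POSIX rule quoted above, that a
repeat count omitted from "[ *]" lets the second string grow to the
size of the first.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the groff build system:
# replace every control character (newline, tab, ...) with a space,
# using the "[c*]" form from option [2] with no repeat count.
flatten_cntrl() {
    tr '[:cntrl:]' '[ *]'
}

# Arbitrary sample input containing a tab and two newlines.
# On a conforming tr, all three become spaces.
printf 'one\ntwo\tthree\n' | flatten_cntrl
```

If a tr implementation does not support the "[c*]" form, i would
expect it to fail or to leave some control characters untouched,
which is exactly the slight risk noted for [2].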

So if somebody confirms that opinion and nobody objects, i think i
should put that in:  s/32//.

Yours,
  Ingo


