bug-groff

[bug #62787] [troff] standard error output should be sanitized


From: G. Branden Robinson
Subject: [bug #62787] [troff] standard error output should be sanitized
Date: Wed, 20 Jul 2022 04:36:02 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?62787>

                 Summary: [troff] standard error output should be sanitized
                 Project: GNU troff
               Submitter: gbranden
               Submitted: Wed 20 Jul 2022 08:36:00 AM UTC
                Category: Core
                Severity: 1 - Wish
              Item Group: Feature change
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Wed 20 Jul 2022 08:36:00 AM UTC By: G. Branden Robinson <gbranden>
Bjarni reported this issue to Debian about 4 years ago.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906091

"8 bit characters from the '.tm' request are displayed as \[u00XY]"

In bug #62726, I proposed a means of dealing with this.

> Having a stronger `troff` request to sanitize any diversion, string, or
> macro of node data is a feature that I am increasingly coming to think is both
> feasible and desirable.  Think, instead of "asciify", "utf8ify".  Anything
> that can be back-converted to ASCII or a UTF-8 sequence (not forgetting that
> our old friends the hyphen-minus, caret, tilde, etc. are not ASCII unless
> remapped) is, and everything else is thrown out.

> I relatedly think that arguments to the `tm` family of requests (including
> `ab`) should be similarly handled.  I think it would be significant effort for
> little benefit to add general localization support for troff output to the
> standard error stream.  That is, I don't want troff to have to care whether
> the environment uses Latin-1 or Latin-9 or Unicode.  So the argument(s) to
> these requests would be scooped into a temporary anonymous troff string and
> then "utf8-sanitized" as described above, then re-emitted.  As a first cut I
> wouldn't even inspect the environment, but just blast out the bytes in UTF-8
> and if somebody's terminal encoding ain't that, they get mojibake.
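
To make the proposal concrete, here is a rough sketch of what such a
"utf8-sanitize" pass might look like.  It is not groff code: it is written in
Python rather than groff's C++, the name `utf8_sanitize` is made up for
illustration, and it assumes the message has already been reduced to plain
text in which non-ASCII characters appear as `\[uXXXX]` escapes (as in the
Debian report).  Unicode escapes are converted back to characters, any other
bracketed escape and any control character is thrown out, and the result is
encoded as UTF-8 without consulting the environment.

```python
import re
import unicodedata

# Any bracketed special-character escape, e.g. \[u00E9] or \[foo].
_ESCAPE = re.compile(r"\\\[([^\]]*)\]")
# The Unicode form: 'u' plus 4-6 uppercase hex digits, with optional
# '_'-separated components for composite characters (e.g. u0065_0301).
_UNICODE_NAME = re.compile(r"u[0-9A-F]{4,6}(?:_[0-9A-F]{4,6})*")


def _convert(match):
    """Turn a Unicode escape into its characters; drop anything else."""
    name = match.group(1)
    if not _UNICODE_NAME.fullmatch(name):
        return ""  # not back-convertible: thrown out
    try:
        return "".join(chr(int(part, 16)) for part in name[1:].split("_"))
    except (ValueError, OverflowError):
        return ""  # code point outside the Unicode range


def utf8_sanitize(message: str) -> bytes:
    """Hypothetical sanitizer for text headed to standard error."""
    text = _ESCAPE.sub(_convert, message)
    # Keep tab and newline; discard other control characters and any
    # surrogate code points, which cannot be encoded as UTF-8.
    kept = [
        ch for ch in text
        if ch in "\t\n" or unicodedata.category(ch) not in ("Cc", "Cs")
    ]
    # Always emit UTF-8, regardless of the terminal's encoding.
    return "".join(kept).encode("utf-8")
```

With this sketch, `utf8_sanitize(r"caf\[u00E9]")` yields the UTF-8 bytes for
"café", while an escape such as `\[foo]` simply disappears.  A terminal that
is not using UTF-8 would show mojibake, which matches the "first cut"
described in the quoted text.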







    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62787>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



