[bug #58796] preconv: want option to write traditional [g|t]roff special characters where possible
Thu, 30 Jul 2020 13:11:08 -0400 (EDT)
Follow-up Comment #3, bug #58796 (project groff):
Thanks for the comments, Ingo. I understand and support the Unix philosophy,
but I disagree with some of your underlying assumptions.
If you developed a brand-new tool to do some text-processing task, something
designed to be used in pipelines with other tools, you could choose to specify
a) the input character set of your tool be a Unicode encoding, or
b) the tool only take some subset of Unicode as input, and require another
tool to pipe in translations for the rest of Unicode, using a syntax invented
specifically for these tools and not standardized anywhere else.
If you chose (b) on the grounds that "pipelines are more Unixy," this would not be
a popular choice. Requiring helper applications to understand modern
character sets is not inherently "the Unix way." It's a stopgap used for
historical applications whose cores do not (yet) speak Unicode.
Groff is a historical application. It will always support \['e] because it
must always be able to process historical documents that used such character
representations. But \['e] should in no way be considered the canonical way
to represent the Unicode character LATIN SMALL LETTER E WITH ACUTE. Unicode
gives us the canonical representation. \['e] and \[u00E9] are merely
additional, roff-specific ways to represent this character.
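
To make the three spellings concrete, here is a minimal Python sketch of the
kind of conversion under discussion. It is not preconv's code: the function
name to_roff and the small TRADITIONAL table are hypothetical, the table covers
only a handful of characters for illustration, and the \[uXXXX] form is assumed
to be uppercase hex padded to at least four digits.

```python
# Hypothetical sketch, not preconv's implementation.
# Illustrative subset of traditional groff special-character names.
TRADITIONAL = {
    "\u00e9": r"\['e]",  # LATIN SMALL LETTER E WITH ACUTE
    "\u00e8": r"\[`e]",  # LATIN SMALL LETTER E WITH GRAVE
    "\u00fc": r"\[:u]",  # LATIN SMALL LETTER U WITH DIAERESIS
    "\u00df": r"\[ss]",  # LATIN SMALL LETTER SHARP S
}


def to_roff(text, traditional=False):
    """Return text with every non-ASCII character written as a roff escape.

    With traditional=True, characters found in TRADITIONAL use their
    traditional name; everything else falls back to the \\[uXXXX] form.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif traditional and ch in TRADITIONAL:
            out.append(TRADITIONAL[ch])
        else:
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)
```

Under these assumptions, to_roff("café") gives caf\[u00E9], and
to_roff("café", traditional=True) gives caf\['e]. Both outputs mean something
only to a roff formatter; the UTF-8 input is the form other tools understand.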
The "roff-specific" part is important: the entire Unix philosophy of pipelines
requires that all I/O be in as general a form as possible to be able to
interact with as wide a range of other programs as possible. groff and
preconv, by contrast, communicate in a secret code that no other tool uses.
That's not the Unix way; that's a band-aid to cover up something that Werner
identified as one of the four major areas of groff that needed to be updated
back in 2013. The need has not lessened in the intervening years.
That groff is a historical package does not absolve it from modern best
practices in software design. Looking to the long term, this is what we
should be striving for. preconv is a very useful bridge in the meantime; I
believe you that the task of converting historical C++ code to natively handle
UTF-8 input is big and messy.* Nonetheless it should be considered groff's
* I'm currently going through a similar process--on a much smaller
scale--with some Perl code. And Perl actually handles a lot of the logic
automatically that a C program would have to manually implement. I don't know
what C++'s facilities are like, but I do know that no matter how good the
language's design, you'll run into stupid problems
<http://www.perlmonks.org/?node_id=11119633> that will derail you for a few
[comment #2:]
> I would hate it if groff would start requiring iconv.
It's far better to leverage existing code that does what you need than to
re-implement the same logic in your own code. The principle "solve one task
only, but solve it well" ought to free the groff package from implementing its
own conversions between character encodings and let it instead focus on its
Anyway, if groff handled Unicode I/O natively (and thus also ASCII, a subset
thereof), I wouldn't expect iconv to become an installation requirement; it
would be a run-time requirement for those users who need to feed in documents
in other character encodings.
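
As a rough illustration of that split, the Python sketch below lets UTF-8 and
ASCII input through without transcoding and only calls a converter for other
encodings. Python's codecs module stands in for iconv here, and the function
name read_as_utf8 is hypothetical; nothing in it reflects how groff is or would
be implemented.

```python
import codecs


def read_as_utf8(data: bytes, encoding: str = "utf-8") -> bytes:
    """Return data as UTF-8 bytes, transcoding only when it is needed."""
    name = codecs.lookup(encoding).name  # normalize aliases such as "UTF8"
    if name in ("utf-8", "ascii"):
        data.decode(name)  # validate only; raises UnicodeDecodeError if bad
        return data        # native path, no converter involved
    # Any other encoding goes through the converter (iconv's role).
    return data.decode(name).encode("utf-8")
```

A Latin-1 document would take the second path, for example
read_as_utf8(raw_bytes, "latin-1"), while UTF-8 and ASCII documents never touch
the converter.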
> it's much better to encode all non-ASCII characters and not force users to
> adopt an obsolete locale.
Good points here; I agree. I fell into the trap of looking at the encoding
groff currently natively handles, and not at the big picture.