Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Mon, 15 Dec 2014 21:25:57 +0100
Carsten Kunze wrote on Mon, Dec 15, 2014 at 08:23:00PM +0100:
> when there is a unicode character for e.g. "not equal" (U+2260)
> why there is a combination of characters in groff_char(7)
> instead of unicode? Is it intended for ASCII output?
I'm not completely sure what you intend to ask, so i will answer
your question in three ways:
1. In case you are referring to the first column "Output" of the
table "Mathematical Symbols": It shows whatever the device
the page is formatted for supports. So yes, if you format for
-Tascii, you get "!=" in that column, but if you format for
-Tutf8, you get U+2260 (NOT EQUAL TO).
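
As a rough illustration of point 1, the following sketch models the "Output" column as a lookup keyed by output device. Only the two renderings of "!=" are taken from the text above; the table layout and the name `render` are hypothetical and do not reflect how groff is implemented.

```python
# Illustrative table only: maps a groff special-character name to what
# each output device shows. Only the two entries for "!=" come from the
# message above; the structure and function name are hypothetical.
OUTPUT_BY_DEVICE = {
    "!=": {"ascii": "!=", "utf8": "\u2260"},  # U+2260 NOT EQUAL TO
}


def render(name: str, device: str) -> str:
    """Return the text shown in the "Output" column for one device."""
    return OUTPUT_BY_DEVICE[name][device]


print(render("!=", "ascii"))  # two ASCII characters
print(render("!=", "utf8"))   # one precomposed character
```

The same source name yields different output depending on the device the page is formatted for.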
2. In case you are using OpenBSD:
Because OpenBSD does not include groff in the base install,
man(1) does not use groff. To get a correct formatting of
the manual page groff_char(7), the OpenBSD port of groff
installs the manual page preformatted into
Obviously, one output device had to be picked for
preformatting. The one picked is -Tascii. So even if you
have enabled UTF-8 in your terminal, such that you get UTF-8
output for source-installed manual pages, you get ASCII output
for preformatted manual pages.
For comparison, FreeBSD made a different choice. They install
source code for all manual pages, even groff_char(7). When
man(1) is asked to show any page, it first runs "mandoc -Tlint"
on that page. If that returns no errors, it uses mandoc(1) to
show the page. If that returns at least one error (it does for
groff_char(7) because groff_char(7) uses many constructs mandoc(1)
does not understand), FreeBSD man(1) checks whether groff is
installed. If it is, it uses groff to format the manual.
Otherwise, it doesn't show the manual at all, but instead
asks the user to please install groff.
For the case at hand, that somewhat complicated algorithm
is arguably better because it does not show -Tascii formatting
on a UTF-8 terminal for this particular page. But there are
many downsides: You cannot read the page at all without groff
installed. Whenever man(1) is called, every page is parsed
twice. When mandoc(1) mishandles a page without noticing
that it is not up to the job (which i consider a bug
in mandoc, but it does happen), the user ends up with garbled
formatting, and there is no way to tell the ports system to
use groff for that page. When a manual page uses broken mdoc(7)
syntax, mandoc -Tlint obviously reports errors, so man(1)
decides to use groff(1) for formatting - which is a bad idea
because mandoc(1) is actually slightly more resilient than
groff when handling manual pages that contain syntax errors.
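
The FreeBSD man(1) selection logic described in point 2 can be sketched as a small decision function. This is a model of the behaviour as described in this message, not FreeBSD's actual code; the function name, parameters, and return strings are made up for illustration.

```python
def choose_formatter(lint_errors: int, groff_installed: bool) -> str:
    """Model of the formatter choice described above (hypothetical names).

    lint_errors     -- number of errors reported by "mandoc -Tlint"
    groff_installed -- whether groff is available on the system
    """
    if lint_errors == 0:
        # The lint pass was clean, so mandoc(1) shows the page.
        return "mandoc"
    if groff_installed:
        # At least one error: fall back to groff if it is there.
        return "groff"
    # No page is shown; the user is asked to install groff.
    return "ask-to-install-groff"


# groff_char(7) triggers lint errors, so the outcome depends on groff:
print(choose_formatter(lint_errors=3, groff_installed=True))
print(choose_formatter(lint_errors=3, groff_installed=False))
```

Written this way, the last downside is easy to see: any lint error routes the page to groff, including errors caused by broken mdoc(7) syntax.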
3. In case you are talking about the third column "Unicode"
in said table, which contains "u003D_0338" even though
groff actually produces U+2260:
That looks like a documentation bug to me. I'm not
sending a patch because there are many such composite
Unicode names in that column, so i suspect this is not
the only one mismatching reality.
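
To see how the composite name in point 3 relates to U+2260, here is a small sketch using Python's standard unicodedata module. It assumes the name is read as "u" followed by hexadecimal code points joined by underscores; the helper `decode_groff_name` is hypothetical and not part of groff.

```python
import unicodedata


def decode_groff_name(name: str) -> str:
    """Turn a name like 'u003D_0338' into the characters it lists."""
    if not name.startswith("u"):
        raise ValueError(f"not a uXXXX-style name: {name!r}")
    return "".join(chr(int(part, 16)) for part in name[1:].split("_"))


# "=" followed by COMBINING LONG SOLIDUS OVERLAY
composite = decode_groff_name("u003D_0338")

# NFC normalization composes the pair into a single code point.
precomposed = unicodedata.normalize("NFC", composite)

print([f"U+{ord(c):04X}" for c in composite])    # ['U+003D', 'U+0338']
print([f"U+{ord(c):04X}" for c in precomposed])  # ['U+2260']
```

So the composite name is the canonical decomposition of U+2260, and the two forms are canonically equivalent even though they are spelled differently.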