Re: [Groff] groff_char(7): Combination of characters vs. single unicode

From: Ingo Schwarze
Subject: Re: [Groff] groff_char(7): Combination of characters vs. single unicode character
Date: Mon, 15 Dec 2014 21:25:57 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Carsten,

Carsten Kunze wrote on Mon, Dec 15, 2014 at 08:23:00PM +0100:

> when there is a unicode character for e.g. "not equal" (U+2260)
> why there is a combination of characters in groff_char(7)
> instead of unicode?  Is it intended for ASCII output?

I'm not completely sure what you intend to ask, so i will answer
your question in three ways:

 1. In case you are referring to the first column "Output" of the
    table "Mathematical Symbols":  It shows whatever the device
    the page is formatted for supports.  So yes, if you format for
    -Tascii, you get "!=" in that column, but if you format for
    -Tutf8, you get U+2260 (NOT EQUAL TO).

 2. In case you are using OpenBSD:
    Because OpenBSD does not include groff in the base install,
    man(1) does not use groff.  To get a correct formatting of
    the manual page groff_char(7), the OpenBSD port of groff
    installs the manual page preformatted into
    Obviously, one output device had to be picked for
    preformatting.  The one picked is -Tascii.  So even if you
    have enabled UTF-8 in your terminal, such that you get UTF-8
    output for source-installed manual pages, you get ASCII output
    for preformatted manual pages.

    For comparison, FreeBSD made a different choice.  They install
    source code for all manual pages, even groff_char(7).  When
    man(1) is asked to show any page, it first runs "mandoc -Tlint"
    on that page.  If that returns no errors, it uses mandoc(1) to
    show the page.  If that returns at least one error (it does for
    groff_char(7) because groff_char(7) uses many constructs mandoc(1)
    does not understand), FreeBSD man(1) checks whether groff is
    installed.  If it is, it uses groff to format the manual.
    Otherwise, it doesn't show the manual at all, but instead
    asks the user to please install groff.

    For the case at hand, that somewhat complicated algorithm
    is arguably better because it does not show -Tascii formatting
    on a UTF-8 terminal for this particular page.  But there are
    many downsides:  You cannot read the page at all without groff
    installed.  Whenever man(1) is called, every page is parsed
    twice.  When mandoc(1) mishandles a page without noticing
    that it's not up to the job (which i consider a bug
    in mandoc, but it does happen), the user ends up with garbled
    formatting, and there is no way to tell the ports system to
    use groff for that page.  When a manual page uses broken mdoc(7)
    syntax, mandoc -Tlint obviously reports errors, so man(1)
    decides to use groff(1) for formatting - which is a bad idea
    because mandoc(1) is actually slightly more resilient for
    handling manual pages containing syntax errors than groff.

 3. In case you are talking about the third column "Unicode"
    in said table, which contains "u003D_0338" even though
    groff actually produces U+2260:
    That looks like a documentation bug to me.  I'm not
    sending a patch because there are many such composite
    Unicode names in that column, so i suspect this is not
    the only one mismatching reality.
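
To illustrate point 3: the composite name "u003D_0338" and the single
character U+2260 are canonically equivalent in Unicode, which may be why
the table lists one while the output shows the other.  The following is
only a sketch in Python, not anything groff itself runs; the helper
groff_name_to_text() is a hypothetical name made up for this example,
and it assumes names of the form "uXXXX_YYYY".

```python
import unicodedata


def groff_name_to_text(name):
    """Convert a composite name like 'u003D_0338' into a string.

    Hypothetical helper for illustration only; it assumes a leading
    'u' followed by hexadecimal code points joined with '_'.
    """
    if not name.startswith("u"):
        raise ValueError("expected a name starting with 'u'")
    return "".join(chr(int(part, 16)) for part in name[1:].split("_"))


composite = groff_name_to_text("u003D_0338")  # '=' + COMBINING LONG SOLIDUS OVERLAY
precomposed = "\u2260"                        # NOT EQUAL TO

# NFC composes the two code points into the single precomposed one,
# and NFD decomposes the precomposed character back again.
composed = unicodedata.normalize("NFC", composite)
decomposed = unicodedata.normalize("NFD", precomposed)
```

Here "composed" compares equal to "precomposed", and "decomposed"
compares equal to "composite".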

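The FreeBSD fallback described under point 2 can be written down as a
small decision function.  This is a sketch of the logic as described
above, not the actual FreeBSD man(1) code; the function name, its
parameters and the returned strings are all assumptions made for the
example.

```python
def choose_formatter(lint_error_count, groff_installed):
    """Pick a formatter following the fallback logic described above.

    lint_error_count: number of errors reported by "mandoc -Tlint"
    groff_installed:  whether groff is available on the system
    Returns one of the illustrative labels 'mandoc', 'groff',
    or 'ask-to-install-groff'.
    """
    if lint_error_count == 0:
        # The lint pass found no errors, so mandoc(1) shows the page.
        return "mandoc"
    if groff_installed:
        # At least one lint error: fall back to groff if it exists.
        return "groff"
    # No groff available: the page is not shown at all.
    return "ask-to-install-groff"


# groff_char(7) triggers lint errors, so the result depends on groff.
with_groff = choose_formatter(lint_error_count=5, groff_installed=True)
without_groff = choose_formatter(lint_error_count=5, groff_installed=False)
```

Note that a page with broken mdoc(7) syntax takes the same path as
groff_char(7) in this scheme, which is the downside mentioned above.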
