groff

Re: About verbatim dashes in PostScript output


From: G. Branden Robinson
Subject: Re: About verbatim dashes in PostScript output
Date: Sat, 28 Oct 2023 08:57:51 -0500

Hi Jan,

At 2023-10-28T15:18:05+0200, Jan Engelhardt wrote:
> A recent LWN.net article <https://lwn.net/Articles/947941/> (paywalled 
> for a while)

For the benefit of those reading this in the future, the article should
be free to read starting about 2 November 2023.

> pointed at https://bugs.debian.org/1041731 and the topic of 
> "-" vs "\-".
> 
> Given the following input:
> 
>       -\-\[u002D]\[u2013]\[u2014]+\[u2212]
> 
> Feeding it through `groff -Tutf8`, I get
> 
>       ‐−-–—+−
>       <U+2010><U+2212><U+002D><U+2013><U+2014><U+002B><U+2212>
> 
> groff_char(7) says \- maps to "minus sign/Unix dash". Ambiguous, but
> ok, it is what it is.

Yes.  We're kind of trapped here; AT&T troff always documented `\-`
specifically and exclusively as a "minus sign".  Not a "hyphen-minus" or
something like that.  The "Unix dash" term might have been my invention
to try to advise the same people who aren't listening to me in that LWN
thread.

> Is there a better way though than to explicitly use \[u002D] to get a
> guaranteed U+002D?

Not a better one, no.  (There's a worse one, involving `\N`.)

_Unless_ you're using man(7) or mdoc(7), your document can:

1.  Remap \- to \[u002D] with `tr` or `char`; or
2.  Define a string to interpolate \[u002D].
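Outside man pages, the two approaches might look like the following.
This is a minimal sketch, assuming GNU troff 1.23 and the `utf8`
device.  The string name `hm` is a hypothetical choice, not something
groff predefines.  Because `-Tps` reports `\[u002D]` as undefined (see
the warning quoted below), the remapping is only demonstrated for
terminal output.

```sh
#!/bin/sh
# Option 1: redefine \- itself, so every \- in the document
# is formatted as U+002D HYPHEN-MINUS.
printf '%s\n' '.char \- \[u002D]' 'ls \-l' |
    groff -Tutf8 | sed -n 1p

# Option 2: leave \- as a minus sign and define a string that
# interpolates U+002D instead ("hm" is a hypothetical name).
printf '%s\n' '.ds hm \[u002D]' 'ls \*[hm]l, but 3 \- 1 = 2' |
    groff -Tutf8 | sed -n 1p
```

The second approach leaves `\-` meaning what AT&T troff always said it
meant, at the cost of having to spell the hyphen-minus differently
wherever one is wanted.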

Man pages should not do either of these, because they will just make a
bad situation worse, causing more man pages to be inconsistent with each
other, Albert Cahalan-style.

> Second, I turn to PostScript output that is generated by
> `groff -Tps`. One observes:
> 
>       troff:<standard input>:1: warning: special character 'u002D' not defined
> 
> (Converting the PS to PDF and opening that with evince), the rendered
> view shows a hyphen, a minus, an endash, an emdash, and another minus
> but rendered in a different vertical position which does not line up
> with the '+' sign.

Let's see, your input was...

>       -\-\[u002D]\[u2013]\[u2014]+\[u2212]

That should be, in order:

a.  a hyphen (U+2010)
b.  a minus sign (U+2212) from the "current font" (likely a text font)
c.  a hyphen-minus (U+002D)
d.  an en dash (U+2013)
e.  an em dash (U+2014)
f.  a plus sign (U+002B) from the "current font" (likely a text font)
g.  and a minus sign (U+2212) from the "special font".

A shorter way to say \[u2212] is \[mi] (or `\(mi`; it's a venerable
special character identifier going back to Ossanna troff).

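GNU troff first maps certain Unicode code points back to their
traditional special character names; for instance, \[u2212] becomes
\[mi], as the -Z output in [1] below shows.  The mapping table is here: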

https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/uniglyph.cpp?h=1.23.0#n392
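The mapping can be observed from the shell.  A sketch, assuming GNU
troff 1.23 with the `utf8` and `ps` devices installed: on `utf8`,
`\-`, `\[mi]`, and `\[u2212]` should all yield U+2212, and the
intermediate output from `-Z` names the special character GNU troff
actually selected.

```sh
#!/bin/sh
# All three spellings should produce U+2212 MINUS SIGN on a UTF-8 terminal.
printf '%s\n' '\-\[mi]\[u2212]' | groff -Tutf8 | sed -n 1p

# In intermediate output, a "C" command names the special character
# written; \[u2212] is expected to show up as "Cmi".
printf '%s\n' '\[u2212]' | groff -Tps -Z | grep '^C'
```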

groff_char(7) attempts to explain why all this "text font" and "special
font" business exists.

       Notes   describes the glyph, elucidating the mnemonic value of
               the glyph name where possible.
[...]
               Entries marked with “***” denote glyphs used for
               mathematical purposes.  On typesetting devices, such
               glyphs are typically drawn from a special font (see
               groff_font(5)).  Often, such glyphs lack bold or italic
               style forms or have metrics that look incongruous in
               ordinary prose.  A few which are not uncommon in running
               text have “text variants”, which should work better in
               that context.  Conversely, a handful of glyphs that are
               normally drawn from a text font may be required in
               mathematical equations.  Both sets of exceptions are
               noted in the tables where they appear (“Logical symbols”
               and “Mathematical symbols”).

   Basic Latin
[...]
       The vertical bar is overloaded; the \[ba] and \[or] escape
       sequences may render differently.  See subsection “Mathematical
       symbols” below for special variants of the plus, minus, and
       equals signs normally drawn from this range.

   Mathematical symbols
[...]
       Observe the two varieties of the plus‐minus, multiplication, and
       division signs; \[+-], \[mu], and \[di] are normally drawn from
       the special font, but have text font variants.  Also be aware of
       three glyphs available in special font variants that are normally
       drawn from text fonts: the plus, minus, and equals signs.  These
       variants may differ in appearance or spacing depending on the
       device and font selected.

...and the entire "History" section.

> Third, when one copy-pastes the string shown in evince, I get back:
> 
>       -−–—+−
>       <U+002D><U+2212><U+2013><U+2014><U+002B><U+2212>
> 
> I expected to receive:
> 
>       <U+2010><U+002D><U+2013><U+2014><U+002B><U+2212>
> 
> so that copypasting commands from PS/PDF would work "right"
> similarly as it does for manpages when they use \-.

That is because \- is not a "hyphen-minus" (except in man pages, where
we are forced to remap it for practical reasons).  The C/A/T typesetter
that the Bell Labs CSRC acquired didn't _have_ a "hyphen-minus" glyph.
It had a hyphen, a minus sign, and an em dash.  So, to troff, \- is a
minus sign, and when you format `\-` without a man page macro package,
that is what you get.
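To compare what a copy-and-paste (or `groff -Tutf8`) actually produced
against lists like the ones above, a small helper that prints each
character in `<U+XXXX>` form can save some squinting.  This is a
sketch, not a groff tool: `codepoints` is a hypothetical name, and it
assumes an `iconv` that supports UTF-32BE (GNU libc and GNU libiconv
both do).

```sh
#!/bin/sh
# codepoints (hypothetical helper): print each character of a UTF-8
# string as <U+XXXX>, in the notation used in this thread.
codepoints() {
    printf '%s' "$1" |
        iconv -f UTF-8 -t UTF-32BE |   # 4 bytes per character, no BOM
        od -An -v -tx1 |               # one hex byte per field
        awk '
            { for (i = 1; i <= NF; i++) byte[n++] = $i }
            END {
                out = ""
                for (i = 0; i + 3 < n; i += 4) {
                    cp = toupper(byte[i] byte[i+1] byte[i+2] byte[i+3])
                    # Trim leading zeros, keeping at least four hex digits.
                    while (length(cp) > 4 && substr(cp, 1, 1) == "0")
                        cp = substr(cp, 2)
                    out = out "<U+" cp ">"
                }
                print out
            }'
}

codepoints '-−+'   # prints <U+002D><U+2212><U+002B>
```

For example, `codepoints "$(printf '%s\n' 'a\-b' | groff -Tutf8 | sed -n 1p)"`
should report U+2212 for the `\-`, since no man page macro package is
loaded.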

If you add \[pl] to your list, _that_ plus sign's crossbar should line
up with the U+2212 minus sign, and if it doesn't, I'd be curious to see
the output of "groff -Tps -Z".[1]  (But it's always possible for a font
to be buggy.)

Does this clear things up?  Please tell me if there is anything not
making sense, or any way I can improve the groff_char(7) man page.

Regards,
Branden

[1] For me, it's doing what it should.

$ printf -- '-\\-\\[u002D]\\[u2013]\\[u2014]+\\[u2212]\\[pl]\n' | groff -Tps -Z | tail
troff:<standard input>:1: warning: special character 'u002D' not defined
x font 11 S
f11
Cmi
h5490
Cpl
h5490
n12000 0
x trailer
V792000
x stop
