Re: About verbatim dashes in PostScript output

From: G. Branden Robinson
Subject: Re: About verbatim dashes in PostScript output
Date: Sat, 28 Oct 2023 08:57:51 -0500

Hi Jan,

At 2023-10-28T15:18:05+0200, Jan Engelhardt wrote:
> A recent article <> (paywalled 
> for a while)

For the benefit of those reading this in the future, the article should
be free to read starting about 2 November 2023.

> pointed at and the topic of 
> "-" vs "\-".
> Given the following input:
>       -\-\[u002D]\[u2013]\[u2014]+\[u2212]
> Feeding it through `groff -Tutf8`, I get
>       ‐−-–—+−
>       <U+2010><U+2212><U+002D><U+2013><U+002B><U+2014>
> groff_char(7) says \- maps to "minus sign/Unix dash". Ambiguous, but
> ok, it is what it is.

Yes.  We're kind of trapped here; AT&T troff always documented `\-`
specifically and exclusively as a "minus sign".  Not a "hyphen-minus" or
something like that.  The "Unix dash" term might have been my invention
to try to advise the same people who aren't listening to me in that LWN

> Is there a better way though than to explicitly use \[u002D] to get a
> guaranteed U+002D?

Not a better one, no.  (There's a worse one, involving `\N`.)

_Unless_ you're using man(7) or mdoc(7), your document can:

1.  Remap \- to \[u002D] with `tr` or `char`; or
2.  Define a string to interpolate \[u002D].

Man pages should not do either of these, because they will just make a
bad situation worse, causing more man pages to be inconsistent with each
other, Albert Cahalan-style.
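
A minimal sketch of those two options, for a document that is not a man
page, might look like the following.  The string name `hm` is made up
for illustration, and the sketch assumes an output device on which
\[u002D] is defined, such as `-Tutf8`; as the warning quoted further
down shows, `-Tps` reports it as undefined.

```
.\" Sketch only; do not use in man(7) or mdoc(7) documents.
.\" Option 1: make \- format as the hyphen-minus (U+002D).
.char \- \[u002D]
.\" Option 2: leave \- alone and interpolate a string instead.
.\" "hm" is a hypothetical string name chosen for this example.
.ds hm \[u002D]
Run ls \-l or ls \*[hm]l to get a long listing.
```

A real document would normally pick one of the two rather than both;
they appear together here only so the input can be compared side by
side.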

> Second, I turn to PostScript output that is generated by
> `groff -Tps`. One observes:
>       troff:<standard input>:1: warning: special character 'u002D' not defined
> (Converting the PS to PDF and opening that with evince), the rendered
> view shows a hyphen, a minus, an endash, an emdash, and another minus
> but rendered in a different vertical position which does not line up
> with the '+' sign.

Let's see, your input was...

>       -\-\[u002D]\[u2013]\[u2014]+\[u2212]

That should be, in order:

a.  a hyphen (U+2010)
b.  a minus sign (U+2212) from the "current font" (likely a text font)
c.  a hyphen-minus (U+002D)
d.  an en dash (U+2013)
e.  an em dash (U+2014)
f.  a plus sign (U+002B) from the "current font" (likely a text font)
g.  and a minus sign (U+2212) from the "special font".

A shorter way to say \[u2212] is \[mi] (or `\(mi`; it's a venerable
special character identifier going back to Ossanna troff).

GNU troff maps certain Unicode code points back to special characters

groff_char(7) attempts to explain why all this "text font" and "special
font" business exists.

       Notes   describes the glyph, elucidating the mnemonic value of
               the glyph name where possible.
               Entries marked with “***” denote glyphs used for
               mathematical purposes.  On typesetting devices, such
               glyphs are typically drawn from a special font (see
               groff_font(5)).  Often, such glyphs lack bold or italic
               style forms or have metrics that look incongruous in
               ordinary prose.  A few which are not uncommon in running
               text have “text variants”, which should work better in
               that context.  Conversely, a handful of glyphs that are
               normally drawn from a text font may be required in
               mathematical equations.  Both sets of exceptions are
               noted in the tables where they appear (“Logical symbols”
               and “Mathematical symbols”).

   Basic Latin
       The vertical bar is overloaded; the \[ba] and \[or] escape
       sequences may render differently.  See subsection “Mathematical
       symbols” below for special variants of the plus, minus, and
       equals signs normally drawn from this range.

   Mathematical symbols
       Observe the two varieties of the plus‐minus, multiplication, and
       division signs; \[+-], \[mu], and \[di] are normally drawn from
       the special font, but have text font variants.  Also be aware of
       three glyphs available in special font variants that are normally
       drawn from text fonts: the plus, minus, and equals signs.  These
       variants may differ in appearance or spacing depending on the
       device and font selected.

...and the entire "History" section.

> Third, when one copy-pastes the string shown in evince, I get back:
>       -−–—+−
>       <U+002D><U+2212><U+2013><U+2014><U+002B><U+2212>
> I expected to receive:
>       <U+2010><U+002D><U+2013><U+2014><U+002B><U+2212>
> so that copypasting commands from PS/PDF would work "right"
> similarly as it does for manpages when they use \-.

That is because \- is not a "hyphen-minus" (except in man pages, where
we are forced to remap it for practical reasons).  The C/A/T typesetter
that the Bell Labs CSRC acquired didn't _have_ a "hyphen-minus" glyph.
It had a hyphen, a minus sign, and an em dash.  So, to troff, \- is a
minus sign, and when you format `\-` when not using a man page macro
package, that is what you get.

If you add \[pl] to your list, _that_ plus sign's crossbar should line
up with the U+2212 minus sign, and if it doesn't, I'd be curious to see
the output of "groff -Tps -Z".[1]  (But it's always possible for a font
to be buggy.)

Does this clear things up?  Please tell me if there is anything not
making sense, or any way I can improve the groff_char(7) man page.


[1] For me, it's doing what it should.

$ printf -- '-\\-\\[u002D]\\[u2013]\\[u2014]+\\[u2212]\\[pl]\n' | groff -Tps -Z | tail
troff:<standard input>:1: warning: special character 'u002D' not defined
x font 11 S
n12000 0
x trailer
x stop
