Re: groff man(7) `B` macro behavior with `\c`, and input traps


From: G. Branden Robinson
Subject: Re: groff man(7) `B` macro behavior with `\c`, and input traps
Date: Tue, 7 Jun 2022 00:24:35 -0500

[Warning: another long message, including SIMH PDP-11 output]

At 2022-06-06T12:17:30-0500, G. Branden Robinson wrote:
> It is disappointing that I couldn't get any useful information out of
> Unix V7 nroff.  But I tried only the default device (Teletype Model
> 37), and maybe others are more performant.  That in turn may require
> that I temporarily learn the escape sequences for ancient terminals
> like the "GE TermiNet 300", "DASI-300S", or "Diablo Hyperterm", which
> I've never even heard of in any other context.  Some days my beard is
> not so gray.
>
> And that will be possible only if adequate documentation for those
> devices survives.

It doesn't, as far as I've been able to tell.  I couldn't find any
manuals for the "DASI 300S" terminal (which apparently came in "GSI" and
"DTC" varieties).  All surviving documentation appears to relate to nroff
or to old Unix plot(1).

Nevertheless, the nroff terminal descriptions in V7 Unix suffice to draw
some guarded inferences.  Letting "nroff -T300s" output go to the
emulated terminal under SIMH caused only confusion in the terminal
driver, so I used the trusty old "od -c" standby.

I'll illustrate my sequence of experiments step by step.

I started with about the simplest input document there is.

$ cat > 1lineR.roff
foo
.pl \n(nlu
$ nroff -T300s 1lineR.roff | od -c
0000000 033 006   f   o   o  \r  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000100  \n  \n  \n  \n  \n  \n  \n  \n 033 006
0000112

Limiting the page length with the `pl` request evidently didn't do a lot
of good, but I retained it in the subsequent experiments anyway to
minimize confounding factors.

DASI 300S output seems to involve the sequence 033 006 (ESC, ACK),
always at the beginning, and occasionally later.  Possibly this is some
kind of time-fill or synchronization primitive, because it isn't
correlated with output in a way I can easily discern, except that you
get more of them as the byte count of the output goes up.

Next, let's see what happens when we try to set boldface.

$ cat > 1lineB.roff
\fBfoo
.pl \n(nlu
$ nroff -T300s |#1lineB.roff | od -c
0000000 033 006 033   E   f   o   o  \r  \n  \n  \n  \n  \n  \n  \n  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000100  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n 033 006
0000114

(Here you can enjoy a laugh at my expense as my fumble-fingered typos
are recorded for posterity much as they were on actual Teletypes.
Recall that on PDP-11 Unix, the terminal driver used '#' for erase and
DEL (^?) for keyboard interrupt.)

It appears that the sequence 033 E (ESC E) enables boldface on the 300S.
Boldface is not achieved with overstriking, contrary to my expectations.
(But then I know _nothing_ about this terminal device's operation.)

Next, let's have a look at "italics".

$ cat > 1lineI.roff
\fIfoo
.pl \n(nlu
$ nroff -T300s 1lineI.roff | od -c
0000000 033 006   _  \b   f   _  \b   o   _  \b   o  \r  \n  \n  \n  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000100  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n 033 006
0000120

Italics are achieved by overstriking with ASCII underscores, a mechanism
recognized by less(1) even today.
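
For anyone who wants to pick such a stream apart mechanically, here is a
minimal sketch in Python.  It assumes only what the dump above shows:
each underlined character arrives as underscore, backspace, character.
The function name `decode_overstrike` is made up for illustration; it is
not part of nroff or less(1).

```python
def decode_overstrike(data: bytes):
    """Return (char, underlined) pairs from an nroff-style byte stream."""
    out = []
    i = 0
    while i < len(data):
        # "_ BS x" is the underscore-overstrike idiom for an underlined x.
        if data[i:i + 2] == b"_\b" and i + 2 < len(data):
            out.append((chr(data[i + 2]), True))
            i += 3
        else:
            out.append((chr(data[i]), False))
            i += 1
    return out


# Body bytes from the 1lineI.roff dump, after the leading ESC ACK.
stream = b"_\bf_\bo_\bo"
pairs = decode_overstrike(stream)
underlined = "".join(c for c, u in pairs if u)
print(underlined)  # foo
```

This handles only the underscore-first ordering seen here; a stream that
sent the character first and the underscore second would need another
branch.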

Now let's try a style change, going from italics to roman.

$ cat > 1lineIthenR.roff
\fIfoo\fRbar
.pl \n(nlu
$ nroff -T300s 1lineIthenR.ro | od -c
0000000 033 006   _  \b   f   _  \b   o   _  \b   o   b   a   r  \r  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000120  \n 033 006  \n
0000123

(I omitted an ls(1) command where I was reminded about the 14-character
limit of file name components in Unix V7.)  There is no surprise here.

Let's see how the device gets out of boldface.

$ cat > 1lineBthenR.ro
\fBfoo\fRbar
.pl \n(nlu
$ nroff -T300s 1lineBthenR.ro | od -c
0000000 033 006 033   E   f   o   o 033   E   b   a   r  \r  \n  \n  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000100  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n 033
0000120 006  \0
0000121

Another expectation overturned.  ESC E is apparently a toggle.
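
The toggle reading can be stated as a tiny state machine.  The following
Python sketch assumes, as the dump suggests but no manual confirms, that
every ESC E flips bold on or off.  The name `split_bold_runs` is
hypothetical.

```python
ESC_E = b"\x1bE"


def split_bold_runs(data: bytes):
    """Split a stream into (text, bold) runs, treating ESC E as a toggle."""
    runs = []
    bold = False
    for i, chunk in enumerate(data.split(ESC_E)):
        if i > 0:
            # Each separator between chunks is one ESC E, so flip the state.
            bold = not bold
        if chunk:
            runs.append((chunk.decode("ascii"), bold))
    return runs


# Body bytes from the 1lineBthenR.ro dump, after the leading ESC ACK.
body = b"\x1bEfoo\x1bEbar"
runs = split_bold_runs(body)
print(runs)  # [('foo', True), ('bar', False)]
```

If ESC E were instead "bold on" with some other sequence for "bold off",
this model would mark "bar" as bold too, which is why the toggle
interpretation matters for reading the later dumps.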

But wait.  Maybe bold mode has to be refreshed every n characters or
something; maybe it expires for some reason.  (Terminals have done
dumber things.)  Let's see.

$ cat alphabet.roff
\fBabcdefghijklmopqrstuvwxyz
.pl \n(nlu
$ nroff -T300s alphabet.roff | od -c
0000000 033 006 033   E   a   b   c   d   e   f   g   h   i   j   k   l
0000020   m   o   p   q   r   s   t   u   v   w   x   y   z  \r  \n  \n
0000040  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000140 033 006
0000142

If there is such a limit, we're not likely to trip it with our inputs.

What if we change between both non-roman styles?

$ cat > 1lineBthenI.ro
\fBfoo\fIbar
.pl \n(nlu
$ nroff -T300s 1lineBthenI.ro | od -c
0000000 033 006 033   E   f   o   o 033   E   _  \b   b   _  \b   a   _
0000020  \b   r  \r  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
0000040  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000120  \n  \n  \n  \n  \n 033 006  \n
0000127

No surprise here.

Let's bring in the man(7) package, throw Ingo's example at this
terminal, and see what happens.

$ cat > 1lineBtrap.man
.TH foo 1
.B bar
baz
.pl \(nlu
$ nroff -T300s -man 1lineBtrap.man | od -c
0000000 033 006  \n  \n  \n   f   o   o   (   1   )
0000020                                     006             033 006   U
0000040   N   I   X       P   r   o   g   r   a   m   m   e   r   '   s
0000060       M   a   n   u   a   l
0000100                     006             033 006   f   o   o   (   1
0000120   )  \r  \n  \n  \n  \n                     033   E   b   a   r
0000140     033   E   b   a   z  \r  \n  \n  \n  \n  \n  \n  \n  \n  \n
0000160  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000220  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n   P   r
0000240   i   n   t   e   d       9   /   2   2   /   8   8
0000260
*
0000320                                                           1  \r
0000340  \n  \n  \n 033   h 006  \n  \n  \n 033 006 006  \n 033 006 033
0000360 006  \0
0000361

We see ESC, ACK sequences much more now, and also some isolated ACKs.  I
can't account for these.  It's _possible_ they invalidate any
conclusions I might draw.

But apart from that nothing looks surprising.  The man(7) header and
footer are there.  The body of the man page contains "bar" in bold (I'm
already using "foo" for the page name, and didn't want to confuse my
weary eyes), then a space, then "baz" apparently in plain roman.  _That_
is exactly what we would expect.

Now for the moment of truth.  Stick a `\c` after the "bar".

$ cat > 1lineBtrapcont.man
.TH foo 1
.B bar\c
baz
.pl \(nlu
$ nroff -T300s -man 1lineBtrapcont.man | os#d-# -c
0000000 033 006  \n  \n  \n   f   o   o   (   1   )
0000020                                     006             033 006   U
0000040   N   I   X       P   r   o   g   r   a   m   m   e   r   '   s
0000060       M   a   n   u   a   l
0000100                     006             033 006   f   o   o   (   1
0000120   )  \r  \n  \n  \n  \n                     033   E   b   a   r
0000140 033   E   b   a   z  \r  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
0000160  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000220  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n   P   r   i
0000240   n   t   e   d       9   /   2   2   /   8   8
0000260
*
0000320                                                       1  \r  \n
0000340  \n  \n 033   h 006  \n  \n  \n 033 006 006  \n 033 006 033 006
0000360

Our output stream is one byte shorter; the space between "bar" and "baz"
has disappeared.
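
To double-check the comparison without squinting at octal offsets, here
is a small Python sketch using the standard difflib module.  The two
byte strings are the page-body portions transcribed from the dumps above
(ESC E through the first CR LF); everything else in the streams is left
out.

```python
import difflib

without_c = b"\x1bEbar \x1bEbaz\r\n"  # from ".B bar" followed by "baz"
with_c = b"\x1bEbar\x1bEbaz\r\n"      # from ".B bar\c" followed by "baz"

matcher = difflib.SequenceMatcher(None, without_c, with_c, autojunk=False)
# Keep only the opcodes that describe a difference between the streams.
changes = [
    (tag, without_c[i1:i2], with_c[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]
print(changes)  # [('delete', b' ', b'')]
```

The only difference is the deleted space; both ESC E sequences are in
the same places relative to "bar" and "baz".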

"baz" remains roman.

This is enough to push me fairly hard toward a conclusion that it was
_not_ correct to use `itc` with groff man(7)'s input traps for macros
like `B` and `I`.  That change was made about 5 years ago.  That
apparently no one has complained about this in the 3+ years groff 1.22.4
has been out suggests that reverting it will not be too disruptive.
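
As a rough illustration of the difference at stake, here is a toy model
in Python.  It is not how groff implements input traps; it assumes only
the documented distinction that `it` counts every input line while `itc`
treats a line interrupted with `\c` as continuing into the next.  The
function name and its parameters are invented for this sketch.

```python
def trap_springs_after(lines, count, continuation_aware):
    """Return the index of the input line after which a trap springs.

    continuation_aware=False models `it`: every input line counts.
    continuation_aware=True models `itc`: a line ending in \\c is treated
    as continuing onto the next one and does not count on its own.
    """
    seen = 0
    for index, line in enumerate(lines):
        if continuation_aware and line.endswith("\\c"):
            continue
        seen += 1
        if seen == count:
            return index
    return None


# What ".B bar\c" followed by "baz" contributes as text lines.
text = ["bar\\c", "baz"]
it_index = trap_springs_after(text, 1, continuation_aware=False)
itc_index = trap_springs_after(text, 1, continuation_aware=True)
print(it_index, itc_index)  # 0 1
```

Under the `it` model the trap springs after "bar", so "baz" is set in
roman, matching the V7 output above.  Under the `itc` model the trap
waits until "baz" has been read, so "baz" would inherit the bold face.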

There is a practical reason to believe that too; if you need to join
together two different styles, man(7)'s font alternation macros (BI, BR,
IB, and IR [the other two would not see use in this scenario]) are the
much more obvious tool for the job, and are heavily attested in extant
man pages[1].

In other words, you wouldn't say:

.B foo\c
bar

You'd say:

.BR foo bar

There are other input trap users in groff man(7), though.

SM, SB: These are relatively rarely used (except in historical SunOS man
pages).  They are indistinguishable from regular roman and bold text,
respectively, on nroff devices, namely the terminal emulators that most
people use to view man pages.  However, their use with \c is more likely
because there are no alternation macros for type size.  I do remember
one real-world use of \c with a small font macro, that being the ksh93
man page[2].  Here's what it did.

.SS Field Splitting.
After parameter expansion and command substitution,
the results of substitutions are scanned for the field separator
characters (those found in
.SM
.B IFS\^\c
)
and split into distinct fields where such characters are found.

Here, the page author was clearly _not_ expecting `itc` semantics.  We
can be confident that they wanted the closing parenthesis to match the
opening one in size.  The 1/12th em horizontal motion looks like
detail-oriented typography to me, to keep the closing parenthesis at a
larger type size from crowding the smaller "S" preceding it.

So, groff 1.22.4 rendered their page worse--if someone looked at it with
a troff device instead of a terminal.

Consequently, the input traps for `SM` and `SB` should be switched back
to `it` as well.

That leaves `SH` and `SS`.  I can't think of a reason not to revert
these to `it` as well.  First, it is vanishingly rare for man page
authors to knowingly leverage the input traps of these macros in the
first place.[3]  Second, it makes very little sense to apply any of the
other input-trap-using macros to a section or subsection title.  The
typeface and size of these headings are already under the control of the
macro package (though groff man(7) exposes registers and strings to
parameterize them at rendering time to suit the taste of the reader).
Using `TP` with `SH` or `SS` would be a hideous structural violation and
should not be supported under any circumstance.  Blissfully, I haven't
seen even docbook-to-man produce such an acrid emesis.

I therefore propose to proceed as follows.

1. Move all of B, I, SM, SB, SS, and SH back from `itc` to `it`.
2. Keep TP right where it is because `itc` does good there, as
   originally discussed in 2017.  The fix simply got carried away due to
   lack of understanding (not least my own).[4]

Any objections?

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2017-05/msg00066.html
[2] https://lists.gnu.org/archive/html/groff/2017-04/msg00027.html
[3] I say "knowingly" because the traps are unconditional.  What happens
    with these macros (except TP) is that if they are given arguments,
    those arguments are immediately placed on the output by the macro
    itself, springing the input trap.  Thus, by the time the macro
    "returns", the input trap has already disappeared.
[4] https://lists.gnu.org/archive/html/groff/2017-05/msg00019.html
