From: G. Branden Robinson
Subject: an opinionated history of *roff macro packages (was: pdfroff in groff 1.23.0.rc3 changes compared to 1.22.4)
Date: Thu, 6 Apr 2023 00:33:12 -0500

[dropped Peter from CC; I'm sure he'll find one copy of this enough]

Hi Michał,

At 2023-04-05T18:13:16+0000, Michał Kruszewski wrote:
> I have once evaluated ms, mm and mom.  I have come from the Latex
> world after being sick of its bloat.  I was looking for something
> simple.  I know some differences between ms, mm and mom, but I do not
> really understand why people did not want to cooperate to create a
> single macro package and single program.

The reasons are mostly historical and organizational.  Adding "me",
"man", and mdoc to the above list, I'll offer a summary.  Some of this
is grounded on my absorption of historical documents and some is
reckless speculation with a conspiratorial bent.  I am a first-hand
observer of practically nothing discussed here.

ms, written by Mike Lesk, came first, in Version 6 Unix (1975).

man(7) came next, in Version 7 Unix (1979).  While man page documents
date back all the way to First Edition Unix (1971) (and the basic format
even farther than that, apparently, to Multics documentation), they did
not get a set of macros designed for them until Version 7.  What came
before can be found in the archives of TUHS; whether that constitutes a
"macro package" may be a matter of argument, and I haven't researched
them myself.

Doug McIlroy designed and implemented man(7), and subscribes to this
list.  He is thus best positioned to address why man(7) was developed
instead of routing man page composition through ms(7).  But I can guess.
ms was born with typesetting in mind, and man pages needed to be
formattable on the Teletype machines used as Unix time-sharing
terminals.  There is also the matter of execution speed.  When a person
at a terminal wants a man page, they want it fast.  Today even the most
complex man pages format in time intervals below the threshold of human
perception, but that wasn't the case in the 1970s nor for many years
afterward (thus the now obsolescent phenomenon of "cat pages").  I
suspect that there was also a general understanding that man pages could
(and should) be written by people who otherwise did not concern
themselves with the construction of typeset documents.  There were
reasons, then, to construct a domain-specific macro package for man page
documents.  The package was clearly inspired by ms(7), as many of the
macro names are the same.  Originally, though, man(7)'s `LP` and `PP`
did exactly the same thing (start a new paragraph); they did not
indent the first line differently, as ms's distinct macros do.
`IP`, `RS`, and `RE` also "port" in the document writer's mind between
ms(7) and man(7).  Some macros share names but behave a little
differently, as with `B`, `I`, and `SH`.  I've been tripped up by those
small differences occasionally.
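
A minimal sketch of that contrast, assuming groff's ms implementation
(the text is invented for illustration; format it with something like
"groff -ms -Tutf8 file.ms"):

```roff
.\" LP starts a paragraph with no first-line indentation.
.LP
This paragraph starts flush left.
.\" PP starts a paragraph whose first line is indented.
.PP
This paragraph starts with its first line indented.
.\" IP starts an indented paragraph with an optional tag.
.IP \(bu
This indented paragraph hangs from a bullet tag.
```

Fed to man(7) instead (after the usual `TH` and `SH` calls), the `LP`
and `PP` paragraphs should, as far as I know, come out looking alike:
neither indents its first line.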

mm(7) also shares many macros in common with ms(7), and in many ways
matches ms(7)'s behavior more closely than man(7) does.  But it also has
a _lot_ more macros.  Also, both ms(7) and mm(7) come from AT&T.  So why
do we have both?

I think the answer is corporate structure/politics.  My inference from
reading anecdotes and a variety of historical Unix docs, Brian
Kernighan's memoir, and a scanned copy of a Bell Labs CSRC office
personnel list (complete with telephone extensions!) is that the ms/mm
bifurcation arose from the organizational distinction between CSRC (the
Computing Science Research Center) and USG (the Unix Support Group).  At
the corporate level, AT&T desperately wanted to make money selling Unix,
and through most of the 1970s this was difficult because of an old U.S.
antitrust legal case that forbade them from going commercial outside the
telephone industry.  However, AT&T lawyers repeatedly tested the waters
by charging higher and higher convenience fees (like TicketMaster/Live
Nation has in more recent years) for Unix source licenses throughout the
'70s, and apparently drew no official rebuke from the Federal Trade
Commission.  By 1980 it was clear that a revanchist wave of social and
economic conservatism was breaking over the country--Congress and
Democratic President Jimmy Carter had already deregulated the airline
industry two years earlier--and that laws against the exercise of
monopoly power and restrictions on rentiers of every sort would be
tumbling.  Ronald Reagan (Carter's main electoral opponent that year)
had a campaign team that made a deal with the new (and avowedly
U.S.-hostile) autocratic, theological regime in Tehran to hold on to
some American hostages from our embassy just a bit longer to keep
Carter from claiming a PR win in election
season[1]--though that might not have been enough to keep Carter in office
regardless, as Federal Reserve chairman Paul Volcker had been "hitting
the economy over the head with a sledge hammer" to combat inflation.[2]

But I digress.  My supposition is that forces within USG wanted to add
features to ms(7) to support more of their own needs--though this may
have been couched in terms of better supporting "users"--and that
Research Unix didn't want to be bothered with such things, since their
business was _research_.  For them, ms(7) was a tool for writing journal
papers.  For USG, a macro package was a terrific tool for increasing the
volume of cool-looking inter-office mail.  Thus USG decided to "fork"
ms(7) and support their own package, mm(7).  The names on the original
mm documentation from 1980 are D. W. Smith, J. R. Mashey, E. C. Pariser,
and N. W. Smith.  I don't recall ever reading any interviews/emails with
any of these people about mm.  (People have sought out John Mashey to
discuss his shell, the immediate predecessor of the Bourne shell.)  It
might be a good idea to get their perspectives documented, along with
Mike Lesk's, before they pass away.  (Eric Allman has told the story of
me(7)'s origin at least once.)  It is not clear to me that AT&T
commercial Unix even continued to ship ms(7), though there are certainly
some people on the TUHS mailing list who could tell you.  When we look
at the features that USG actually added to distinguish mm(7) from ms(7),
we see some conveniences and a lot of highly particular stuff for
composing AT&T official documents (some of which groff mm supports, but
much of which it doesn't, and the lack of which we don't seem to get
complaints about).  Personally I think a lot of this comes from
executives insisting on "getting the icon in cornflower blue".  This
sort of micromanagement might explain why the DWB 3.3 mm manual credits
no authors at all.

Meanwhile, as the 1980s dawned the University of California at Berkeley
was spinning up an operating systems research organization that would
come to rival the Bell Labs CSRC in notoriety.  (And for those for whom
this is the sole criterion of merit, the CSRG was affiliated with at
least one billionaire, Bill Joy, whereas as far as I know the CSRC is
not.[3])  Ken Thompson had done a sabbatical at Berkeley and a thousand
flowers bloomed from what he left behind.  Initially there was much
cross-pollination between Berkeley's CSRG and Bell's CSRC, but over time
the relationship appears to have become strained, perhaps due more to
organizational issues and/or the non-stop ratcheting up of Unix license
fees by AT&T.  The latter's leadership appears to have been frustrated
with Berkeley for distributing its own work gratis to anybody who
already had an AT&T Unix license, instead of bottling up their nice new
features and bug fixes so that AT&T could make more money selling those
same people System III or System V or whatever.

The fruits of this fraught relationship can be seen in the fact that
4.2BSD (August 1983) shipped with some extensions to the ms(7) package.
But Research Unix didn't take them and, as noted above, I'm not sure
AT&T commercial Unix kept shipping ms(7) at all.  Into this
collaborative void, a Berkeley undergraduate named Eric Allman came
along and wrote a macro package that the local system administrators
decided to name "me".

So AT&T and Berkeley Unices were fighting with each other all through
the 1980s, which led to a legendary lawsuit[4] establishing that (1)
people who shout the loudest about ownership and copyrights are often
the poorest stewards of copyright and the lousiest keepers of records of
ownership and (2) they would prefer to hold their counterparty to a
non-disclosure agreement than permit fact (1) to come to public light.

This growing enmity was terrible for *roff development, tragically so
because Kernighan's device-independent rewrite of it circa 1980
positioned it really well for the laser printer/desktop publishing
revolution.  But Kernighan either didn't have the power to free its
source code or didn't want to die on that hill.

Meanwhile, a guy named Brian Reid wrote a typesetting system called
Scribe that was proprietary but which won a lot of admirers, including
Richard Stallman, and by extension the rather frothy Texinfo community
(witness recent messages from Eli Zaretskii on the help-texinfo mailing
list).  And another guy in California named Donald Knuth produced a
phenomenal achievement of software engineering that produced more
diagnostic output than a human could read in a lifetime, written
employing literate programming techniques in a language that, in spite
of some technical flaws, was much more readable than most of its
competitors.  But it was (more or less) freely licensed, and gratis, so
a vibrant community rapidly sprang up around it; two of the first things
this community did, as far as I can tell, were to get rid of literate
programming and the readable programming language.  For the win!

In 1989-1990, James Clark wrote and released groff.  It wasn't
literately programmed nor implemented in a readable programming
language, but was assuredly free[5] and gratis.  But by this time much
of troff's lunch had been eaten by TeX.

groff was pretty successful, and many of the remaining users of Unix
troff threw it over in favor of the GNU implementation.  This was aided
by groff's aggressive absorption/reimplementation of some Sun extensions
to man(7)--since Sun workstations were phenomenally popular among the
sorts of Unix nerds who spent their entire lives at universities--but
more important in my opinion was groff's embrace of a great many
extended features from sqtroff, a now nearly forgotten descendant of
Unix troff produced by a Canadian company called SoftQuad.[6]

But not far into the 1990s, as groff's star rose in the limited skies of
the Unix world, Microsoft made a play to kill Unix, while at the same
time its brilliant, visionary founder, with an unerring ability to
predict the future,[7] failed to anticipate the importance of the Web as
an application (in the OSI model sense).  So a whole lot of Unix
developer energy was directed toward adoption and improvement of the
Linux kernel and BSD systems as "back-end" platforms for "delivery of
content", and into the development of skill with tools for the
presentation of that content in a Web browser.  Initially, this meant
presenting HTML.  And it's not easy to get a *roff to turn out HTML, in
part because HTML's original design was pretty dire.
<MARQUEE><BLINK>Worse is better!</BLINK></MARQUEE>

This narrow, obsessive focus on Linux and BSD solely as network
switches and web content delivery engines, rather than as development
environments (for which Unix was originally purposed) or as platforms
for knowledge workers in general (those benighted souls who devote most
of their labor to the absorption, analysis, and composition of natural
languages rather than machine-interpretable ones) sucked a lot of energy
away from *roff, and much of what remained got funneled into TeX.  With,
perhaps, some of the consequences of which you complained above.  But
the main outcome in *roff macro package land was that groff maintained
its reimplementations of AT&T's man(7), ms(7), and mm(7), and, thanks to
the free licensing, adopted BSD's me(7) and mdoc(7)...oh, I forgot to
cover mdoc(7).

Okay, well, that project documents its own history amply,[8] but in a
nutshell, the Berkeley CSRG decided that man(7) sucked, mainly, I think,
because it lacked semantic tagging.  Importantly, they realized this way
before Tim Berners-Lee did.  Cynthia Livingston took 2 cracks at solving
the problem of writing a macro package for composition of semantically
oriented man pages.  (I could be mistaken here; someone else may have
written the first one, now called "old mdoc".)  mdoc(7) caught on like
wildfire in the BSDs and not so much anywhere else, though you will find
the occasional champion of it elsewhere.

In an echo of your original (implied) question, the historical reasons
for the multiplicity of BSDs are also worth pursuing.  And since a huge
email archive of Theo de Raadt's fight with Charles Hannum is available
on the Web, it's more authoritative, more entertaining, and even less
edifying of human nature than my account here.

But we can end our story on a positive note!

In 2002, Peter Schaffter looked at the state of groff, decided it was
too damned hard to learn (it may be), that the existing macro packages
had been stagnating for years (they certainly had) and determined to
solve both problems at once by writing a new macro package that was sui
generis and went to great lengths not to document itself in terms of the
underlying formatting engine.  Peter has told me (correct me if I
misstate this) that he wishes he'd had my improvements to groff's
documentation when he first encountered it, but after observing how much
work it has required, he'd still have taken the route he did.

Certainly I find the examples of mom's output that we ship with groff to
be impressive.  And I think other people will too, if they just look.

So check out $DESTDIR/share/doc/groff/examples/mom sometime.

(s/groff/groff-base/ on Debian systems)

> The *roff community is rather small.  Dividing it by providing
> multiple packages doing more or less the same, or implementing
> multiple programs (groff/pdfroff for example) is not probably the
> right move.

pdfroff is a wrapper, but as I noted recently regarding its (lack of)
support for groff's "-a" option, it is not a perfect one.

Then, too, Ingo Schwarze has opined about the dubious wisdom of having
wrappers around wrappers.  groff(1) is itself a wrapper.

> I do not want to learn and use ms, mm or mom depending on the type of
> the document I write.  My impression was that ms is the most minimal
> and the simplest.

Of those three, yes--ms is the simplest.

> I can easily extend ms by defining my own macros or by writing
> Perl/Python scripts.

As far as I know, that's true of all three (plus me(7)).

> Current pdf support in ms is far from being perfect.

Yes, regrettably.  I _really_ want to improve this in the groff 1.24

> However, I hope that one day it will be obvious that groff + ms is the
> way to go.

I don't have any ambition to bring groff ms into feature parity with
mom.  I think that would be the sort of waste of effort you lament.  My
objectives for the "historical" macro packages are to:

(1) correctly render correctly composed historical documents using these
    packages (except where we disclaim interest, as with the proprietary
    markings and corporate logo support in mm);
(2) support rendering to hyperlink-capable output formats (HTML and PDF)
    with reasonable, basic support for hyperlink features (so, actual
    hypertext in both formats, and a well-realized contents pane for
    bookmarks in PDF); and
(3) support composition of new documents that don't demand features not
    covered by the above points.

For example, for me it's an anti-goal to add macros for drop caps to ms,
mm, or me(7).


[3] Because he got rich at Sun Microsystems, Joy is frequently credited
    with innovations he didn't personally make.  See, e.g.,
    "Relationship with vi",
[4] USL v. BSDI
[5] unless you have a BSD brain that regards copyleft as a parasitic
    virus that frustrates your otherwise inevitable status as the next
    Bill Joy
