Re: [Groff] Eric Raymond on groff and TeX

From: Steve Izma
Subject: Re: [Groff] Eric Raymond on groff and TeX
Date: Sat, 5 May 2012 21:25:50 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Thu, May 03, 2012 at 01:41:57PM -0400, Eric S. Raymond wrote:
> The rift between troff and DocBook-XML is that in troff, structural
> markup is a rather strained and unnatural style that can never really
> cover over the fact that the interpretation engine underneath is a
> *typesetter*.  This is particularly clear near, for example, font
> changes.
> Because I wrote doclifter, which translates troff macros to DocBook
> structural XML, I understand the width of this rift probably better
> than *anyone* else. It is not a minor crack that can be papered
> over with clever macro definitions, it's a huge gaping chasm that has
> swallowed hackers whole in the past.

I assume that the above was necessary in order to rescue legacy
documents: to preserve the content but to add structure so that
the content can be presented on various media, possibly with a
choice of typographical tools, including some not yet developed.
It's too bad that we need to worry about this (and we certainly
do), but I think the real issue is moving forward and improving
the way we prepare and store our source documents. There's
nothing about the groff suite of tools that stands in the way of

Granted, DocBook is noisy, but a source text using groff requests
is no less noisy and certainly less flexible.

But there are many other ways of adding structure to text that
are not only more readable but are simple enough that they don't
get in the way of the writing and thinking process, e.g.,
wikitext or wiki markup. You can write stuff using these
techniques, then convert the files to XML after you have finished
with the creative part.
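
As a rough illustration of that last step, here is a minimal Python
sketch that turns a tiny wiki-like source into XML. The markup
conventions (blank lines separate paragraphs, "== Title ==" marks a
subhead) and the element names doc, head and para are assumptions
made up for this example; they do not correspond to any particular
wiki dialect or DTD.

```python
import re
import xml.etree.ElementTree as ET


def wiki_to_xml(source):
    """Convert a tiny, hypothetical wiki-like markup to an XML string.

    Assumed conventions (illustration only):
      - blank lines separate blocks
      - a block of the form "== Title ==" is a subhead
      - every other block is a paragraph
    """
    root = ET.Element("doc")
    for chunk in re.split(r"\n\s*\n", source):
        # Collapse line breaks and runs of spaces inside the block.
        text = " ".join(chunk.split())
        if not text:
            continue
        if len(text) > 6 and text.startswith("== ") and text.endswith(" =="):
            ET.SubElement(root, "head").text = text[3:-3].strip()
        else:
            ET.SubElement(root, "para").text = text
    # ElementTree escapes characters such as "<" and "&" on output.
    return ET.tostring(root, encoding="unicode")
```

Because every block is written out as a complete element, the close
of each paragraph is explicit in the result even though the writer
never had to type it.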

The SoftQuad people in Toronto during the 1980s put a lot of
effort not only into rewriting ditroff (i.e., the code they
licensed from AT&T) but also in getting it to work with SGML.
They made most of their money by actually doing documentation
projects -- not only designing the SGML used for the source texts
but also the troff macros needed to format them. They didn't rely
on MS, ME, etc., but re-thought how macros could handle an SGML

Peter's examples from MOM show how that can be done (except for
one problem: you need to signal the close of a paragraph, and
none of the groff macros I've seen do that).

But also I think we need to forget about monolithic macro
packages. DTDs and schemas exist because of the huge variety of
documents that need to be typeset. Even half-a-dozen "monolithic"
tmac files won't cover that territory.

Inspired by what SoftQuad was doing (and using their software for
many years), I started using troff on structured documents around
1988. At first the tagging was a hybrid of SGML and troff
converted by awk scripts to correct troff source, then by the
late 90s I was using Python to parse and feed XML files to groff.
Working for a scholarly press, I have processed scores of books
and journals in the meantime.

I use a set of tmac "library" files, each of which handles a
particular function -- titles, subheads, footnotes and endnotes,
block quotations, lists, pagination, tables (which processes
CALS-style input, since I've never been able to reconcile XML
with tbl, unfortunately), floats, indexes, tables of contents,
etc. The actual design of a book or journal or document is
contained in a parent tmac file that calls up (using .so)
whatever library files are needed. The parent tmac sets variables
for the parameters of all the elements needed (similar to a CSS
file) and modifies (subclasses, in a sense) any of the library
macros if needed for a particular document's design needs.
So the library files can be consistent from project to project
but the main tmac file is always customised for the project.

Unfortunately, my macros have built up too much cruft over the
years, so over the last few months I've tried to completely
overhaul them and bring them as close as possible to a simple and
clean method of taking an XML document and utilizing all the
typographic power of groff to produce readable output. Since
groff is a filter, it does this much more elegantly (and quickly)
than TeX.

One of the biggest hassles I've always had to deal with in
typesetting SGML docs is the difference between a block element
and an in-line element. They really bear little relationship to
each other structurally. A block, e.g., a paragraph or a subhead,
can be easily parsed and separated from the stream for
processing. Any white space at the beginning or end of such a
block can be discarded and the proper spacing decisions turned
over to the macro definition. An in-line element (emphasis, small
caps, superior numbers) not only needs surrounding white space
(or lack of it) detected and preserved, it also breaks up the
enclosing block, leaving a tail (depending on the kind of parser
you're using). So far I have always needed to detect and define
separately whatever in-line elements a document uses, which
means that writing a general-purpose formatter for XML seems
virtually impossible.
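
To make the block/in-line difference concrete, here is a minimal
Python sketch using the standard xml.etree.ElementTree parser, which
exposes the "tail" mentioned above as an attribute on each in-line
element. The element names (para, emph, strong), the INLINE_FONTS
table and the use of .PP as the paragraph macro are assumptions for
the example only, not a description of my own macros or scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of in-line elements and the groff font each selects.
# Every in-line element a document uses has to be listed here by hand.
INLINE_FONTS = {"emph": "I", "strong": "B"}


def inline_to_groff(elem):
    """Flatten the mixed content of one element into a groff text string.

    White space in .text and .tail is kept exactly as parsed, because
    it decides whether an in-line element touches its neighbours.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_to_groff(child)
        font = INLINE_FONTS.get(child.tag)
        if font:
            # \fX switches font; \fP returns to the previous font.
            parts.append("\\f" + font + inner + "\\fP")
        else:
            # Unknown in-line element: keep its text, drop the markup.
            parts.append(inner)
        # The "tail": text of the enclosing block that follows the child.
        parts.append(child.tail or "")
    return "".join(parts)


def block_to_groff(elem):
    """Emit one block: discard edge white space, collapse the rest."""
    text = " ".join(inline_to_groff(elem).split())
    # .PP is only a stand-in for whatever paragraph macro is in use.
    return ".PP\n" + text + "\n"
```

Given the input `<para>  The <emph>quick</emph>, brown fox.  </para>`,
block_to_groff() drops the white space at the edges of the block but
keeps the space before the emphasis and the lack of one before the
comma. The sketch ignores text that happens to begin with a period or
apostrophe, which groff would read as a control line.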

Anyway, I don't think we'll get anywhere arguing that using troff
codes in source documents is forward-looking. At the same time I
see nothing being developed that comes close to groff's
typographical abilities. The so-called justification routines
used in ePubs are horrendous (they don't bother with hyphenation,
unless the broken word from the original print version has been
inadvertently left in).

        -- Steve

Steve Izma
Home: 35 Locust St., Kitchener N2H 1W6    p:519-745-1313
Work: Wilfrid Laurier University Press    p:519-884-0710 ext. 6125
E-mail: address@hidden or address@hidden

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
