Re: [Groff] mdoc considered harmful

From: Ingo Schwarze
Subject: Re: [Groff] mdoc considered harmful
Date: Fri, 7 Mar 2014 21:55:55 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Eric,

Eric S. Raymond wrote on Fri, Mar 07, 2014 at 02:14:24PM -0500:

> I've written an mdoc interpreter.  It's in doclifter. And I'm here 
> to tell you why mdoc is not the solution you're looking for.

Well, i already solved the problem with it, so you are somewhat
late in warning me...  :-D

> It's way, *way* overcomplicated.

Now you are exaggerating.

> Part of the reason is design bloat.

There is a small amount of design bloat, mostly regarding the
text production macros like .Ux.  But that's easy to not use
and it doesn't cause complexity.  Even the name space pollution
caused by that is minor.

The core features are not bloated at all, but instead quite simple.

> Part of the reason is attempts to paper over intrinsic problems with
> the line-oriented model of groff markup.  Attempts that don't quite
> work, inducing cascades of mdoc features that are in reality ugly
> workarounds (I'm thinking especially of the .O/.X macro families here).

Yes, .Xo/.Xc is slightly ugly, but you don't have to use it if
you don't like it.  You can do all you need with simple line
continuation (`\' right before EOL) if you prefer that.
Besides, .X* is not a family of macros, it's just one single block type.
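To illustrate why plain line continuation is cheap to support, here is a
minimal Python sketch of how a reader of roff input might fold continued
lines before looking at any macros.  It is not taken from any real parser;
the function name join_continuations is hypothetical, and the rule "an odd
number of trailing backslashes means continuation" is a simplifying
assumption.

```python
def join_continuations(lines):
    """Fold roff-style continuations: a backslash right before EOL
    joins the line with the following one.

    Assumption: an odd number of trailing backslashes marks a
    continuation; an even number is treated as escaped backslashes.
    """
    out = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            # Drop the continuation backslash and keep collecting.
            pending += line[:-1]
        else:
            out.append(pending + line)
            pending = ""
    if pending:
        # Input ended in the middle of a continued line.
        out.append(pending)
    return out


# A long .It line split in two with a trailing backslash.
source = [".It Fl a Ar file \\", "Fl b Ar other"]
print(join_continuations(source))
```

After folding, the macro parser sees one logical .It line, which is the
effect .Xo/.Xc would otherwise be used for.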

Regarding .O, i guess you mean the .[ABDPQS]o explicit block macros
(enclosures).  Those are quite useful and simple in practice,
certainly not something that "doesn't quite work".
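As a rough illustration of how little machinery the enclosure blocks
need, the following Python sketch checks that .Ao/.Bo/.Do/.Oo/.Po/.Qo/.So
blocks are closed by the matching .Ac/.Bc/.Dc/.Oc/.Pc/.Qc/.Sc.  It is a
toy under stated assumptions, not how mandoc(1) or doclifter work: it only
looks at the first macro on each macro line, it insists on strict nesting,
and the name unbalanced_enclosures is hypothetical.

```python
# Opening enclosure macro -> its closing macro.
ENCLOSURES = {"Ao": "Ac", "Bo": "Bc", "Do": "Dc", "Oo": "Oc",
              "Po": "Pc", "Qo": "Qc", "So": "Sc"}
CLOSERS = set(ENCLOSURES.values())


def unbalanced_enclosures(lines):
    """Return a list of problems; an empty list means every block matched.

    Assumption: only the first macro of a line starting with '.' is
    inspected, and blocks must nest strictly.
    """
    stack = []      # (expected closer, line number of the opener)
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith("."):
            continue
        words = line[1:].split()
        if not words:
            continue
        macro = words[0]
        if macro in ENCLOSURES:
            stack.append((ENCLOSURES[macro], lineno))
        elif macro in CLOSERS:
            if stack and stack[-1][0] == macro:
                stack.pop()
            else:
                problems.append(f"line {lineno}: unexpected .{macro}")
    for closer, opened_at in stack:
        problems.append(f"line {opened_at}: block never closed by .{closer}")
    return problems


print(unbalanced_enclosures([".Oo", ".Fl o Ar file", ".Oc"]))
print(unbalanced_enclosures([".Po", "some text", ".Oc"]))
```

A single stack is enough here, which is about as simple as block
structure gets.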

> The result is pure hell for anyone trying to interpret the mess with
> anything but groff itself.

It is not completely trivial to parse correctly,
but you are exaggerating massively here.

Besides, somebody writing a parser is expected to do a bit of
work; that is completely normal.  What matters is that the language
is easy to use and has considerable expressive power, without
unnecessary complication for the document authors.

> I believe I am the only person who has even tried this seriously.

Most definitely you are not.  Kristaps has written one, with
quite some contributions from Joerg Sonnenberger and myself,
and i'm actively maintaining it: mandoc(1).

That thing has been usable and stable for more than three years
now, and basically, all -current BSD systems are using it by now.

> I managed to handle almost all of it, because I am exceptionally good
> at the kind of hacking required for the job. But not in fact all of it;
> it's one of the major sources of the tiny percentage of pages that
> doclifter chokes on and that cannot be fix-patched.

We are handling *all* of it.

Maybe i should have a look at doclifter and provide hints
or send patches...  :)

The main challenge was *not* parsing, but getting the output
byte-by-byte bug-compatible with groff.  But even that works quite
well.  Right now, on a corpus of 3967 manuals, the output is
byte-by-byte identical to groff for 2636 (66%) of them.  Most of
the remaining differences are white space (one blank character or
blank line more or less here or there); on average, fewer than
two output lines differ per manual.  About 7000 output lines differ
out of about 670,000 lines of manual output, i.e. just
above 1%, most of which are whitespace differences.
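For anyone who wants to check the arithmetic, here is a short Python
sketch that recomputes the ratios from the figures quoted above.  The
variable names are made up for illustration, and the two "about" figures
are taken as exact round numbers, which is an assumption.

```python
manuals_total = 3967
manuals_identical = 2636
diff_lines = 7000        # "about 7000", taken as exact here
total_lines = 670_000    # "about 670,000", taken as exact here

# Share of manuals whose output matches groff byte for byte.
identical_share = manuals_identical / manuals_total
# Average number of differing output lines per manual.
diff_per_manual = diff_lines / manuals_total
# Share of all output lines that differ.
diff_line_share = diff_lines / total_lines

print(f"identical manuals:       {identical_share:.1%}")   # 66.4%
print(f"differing lines/manual:  {diff_per_manual:.2f}")   # 1.76
print(f"differing lines overall: {diff_line_share:.2%}")   # 1.04%
```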

Compare that to you being able to merely *parse* 93% of man(7)
pages, as opposed to parsing 100% and formatting 99% of the
lines identically to groff in mdoc(7).

Actually, we spent much more time on man(7) parsing than on mdoc(7)
parsing, because of the low-level roff(7) requests usually embedded
there.  Compared to man(7) parsing, mdoc(7) parsing is mostly a piece
of cake, with very few exceptions.

> The effort required to get this far with mdoc was extreme even for
> me. Thus I consider that effort very unlikely to be successfully
> replicated - I doubt anyone else will have the stamina required.

That is funny.  :-)

Including ASCII, UTF-8, HTML, XHTML, PostScript and PDF output,
including a man(7), an mdoc(7), a tbl(7) and a basic eqn(7)
parser and an mdoc(7) to man(7) converter, and including
a makewhatis(8)/apropos(1) style database/semantic search suite,
mandoc(1) right now totals 32,900 lines of code in .c and .h files.

> mdoc has overelaborated itself into a hole.
> It is an evolutionary dead end, not a solution. 

I see no indication of that in the real world, and i don't
understand your theoretical arguments about what might be bad about it.

But maybe we don't need to decide that in this forum.  mdoc(7)
is out there, and works, and is the system of choice for most
BSDs, with the exception of FreeBSD.  No matter how you judge
its quality, it is safe to assume that it will remain a major
contribution to the user base of roff macro formats, thus
increasing awareness that roff macro languages are still alive,
to come back to the point we were discussing.

DocBook, on the other hand, is not only likely to be more bloated
than mdoc(7)/mandoc(1), but also less likely to contribute to groff's
popularity...  ;-)

