Re: A few observations regarding tbl

From:	Oliver Corff
Subject:	Re: A few observations regarding tbl
Date:	Thu, 17 Jun 2021 20:31:38 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

Hi Branden,

thank you for your encouraging response!

I think it should be perfectly possible to put a handful of these tables
together with their sources into a tbl showcase file. The material is
taken from a series of government-published books between 1953 (no troff
yet) and 1985 (obviously employed troff to typeset the whole thing). A
single figure/illustration/table can always be regarded as a quote in
scientific terms if the source is properly attributed. For that
particular purpose, I am ready to supply my tbl solutions.

As far as tbl features (or elements of the language) are used: actually,
I was quite modest in exploiting tbl's offerings, and haven't really
scratched below the surface. I made wide use of horizontal and vertical
spans, as well as horizontal lines covering only portions of the
columns. I employed font changes, made ample use of table continuation
requests, but that's about it. I virtually did not use column width
specifications at all (with a few exceptions), but I always defined tab
as '|' because that, together with appropriate whitespace padding,
provides for a smooth editing experience and was of great help to spot
conceptual mistakes in my input files.

I also have to confess that not every single table I (re)produced is a
perfect match of the original. Sometimes, in some tables, the column
widths are slightly different (I attribute this to font metrics
differences), and some other tables ended up much wider than in the
original book, but still clearly conveying the same ideas; so I didn't
force myself to achieve visual identity, if everything else was just
what the original authors intended.

I identified at least one feature the use of which (or not) in the
original material cannot be determined with any certainty. Let's assume
you have cells with multi-line contents, then you can either build small
paragraphs with T{, T}, or go the lazy way and hard-code every single
line. In the beginning I went the T{ T} way but was not always satisfied
with the breakpoints chosen by the algorithms. These also depend on the
line length of the text surrounding the .TS/.TE material (some
parts of the book are in one-column, others in two-column mode; not
arbitrarily, though, but due to clear editorial decisions). Facing this
problem, I later opted for hard-coding line breaks and reserved as many
lines in the tables as needed.

Another example where tbl's suggested solution and the series of texts
are visibly different is the use of 'a' as a column identifier for
indented material, like in sub-categories to something. This, again,
cannot be conjectured from the sources with certainty. I later abandoned
'a' as a column identifier altogether and resorted to '\0 ' as indent
aid, producing pleasing results in many cases, and results which came
much closer to the original printed presentation than my attempts using 'a'.

Another area where I could not achieve total similarity with the
original was tables with extensive notes. I kept the notes within the
.TS/.TE environment, doing something like

.T&
l s s s s s s s
And then the text enclosed in a T{ T} area
.TE

but the width of the text block did not match the width of the table.
Sometimes the text below is much wider, stretching the table
artificially; sometimes the table is much wider than the text block,
which does not seem to see how much space it is offered.

Since this particular aspect of the reproduction is utterly
insignificant for my particular publication purpose (the material is
supposed to appear online), I did not care to fix it.

Another area where I failed to reproduce the exact appearance of the
text sources was huge curly braces spanning several rows or columns. I
finally considered it sufficient to achieve a similar result using math
braces (via inserted eqn code).

Finally, in some of the tables I could not discern whether the original
authors abused pic with insertions of tbl, or vice versa. I quickly
decided that it is much better to reproduce scans of these "tables", as
there are hundreds of illustrations, organizational diagrams etc.; only a
small portion from the later years seems to have been done with pic. When
discussing the original publication proposal, the consensus emerged that
direct scans of these materials convey best the "Zeitgeist", i.e. the
spirit of the era. Actually, the style of the graphical displays changes
a lot over the years of this book series, and direct scans are the best
testimonial of that.

My current publication roadmap is targeted at a www environment, and
prints on paper are mainly for proofs. Since I was made aware of the tbl
-> html capabilities of mandoc only quite recently (essentially after
everything had been done), my steps were as follows:
1. compile all tables into pdf, using a simple wrapper which can
accommodate different paper sizes (some tables spread over two to four
pages in the original books; I *do* know this makes ugly reading, but
that was the original editors' choice, not mine).
2. crop all pdfs down to the naked table, zero margin.
3. convert pdf to png via pdftocairo, using a fairly high resolution
(like -r 200 or even -r 300). Modern browsers can scale embedded
graphics automatically.
4. embed the png in the source -> html converted texts.
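
Steps 1 to 3 can be sketched as a small command builder. This is only
an illustration, not the wrapper actually used: the function names are
hypothetical, pdfcrop is an assumed choice for the cropping step (the
tool is not named above), and the groff options shown are one plausible
way to select tbl, eqn and a paper size.

```python
from pathlib import Path


def groff_cmd(src, paper="a4"):
    """Step 1: typeset one table file to PDF (groff writes the PDF to stdout)."""
    # -t runs tbl, -e runs eqn; -dpaper and -P-p are assumed here as the
    # way to pick the paper size for troff and for the pdf output driver.
    return ["groff", "-t", "-e", "-Tpdf", f"-dpaper={paper}", f"-P-p{paper}", str(src)]


def crop_cmd(pdf, cropped):
    """Step 2: crop to the bare table with zero margin (pdfcrop is an assumption)."""
    return ["pdfcrop", "--margins", "0", str(pdf), str(cropped)]


def png_cmd(cropped, out_prefix, dpi=300):
    """Step 3: rasterize with pdftocairo at the given resolution."""
    return ["pdftocairo", "-png", "-r", str(dpi), str(cropped), str(out_prefix)]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in (groff_cmd(Path("table.tbl"), "a3"),
                crop_cmd(Path("table.pdf"), Path("table-crop.pdf")),
                png_cmd(Path("table-crop.pdf"), Path("table"), 200)):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run(); for the groff step, stdout
would have to be redirected into the pdf file.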

At this point, you will ask why I didn't use groff -Thtml and leave the
png production to the backend. Simple answer. This combination has the
nasty tendency to split tables into several png slices in a manner not
always predictable. For simple tables, it works, not so for large tables
with one or more .T& continuation requests.

Only after I found out how useful mandoc is, and with a good portion of
assistance from Ingo Schwarze who wrote a few bugfixes on the fly, ---
thank you, Ingo! --- I opted for another route. I ran a few tests to see which
features of my tables were acceptable as input to mandoc, and then wrote
a wrapper which takes each table, greps it for forbidden contents and
feeds it to mandoc only if it is "clean". Then, there is some
post-processing; mainly removing everything like <div>...</div> and
other materials. I just need the bare
<html><table><tr><td>...</td></tr></table></html>. The compiler which
produces the text output now has an additional check for all table
requests. If there is an HTML table for a given page and name, it includes
that in the output; if not, it takes the png instead, and if no png is there, it complains
bitterly.
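
The two checks described above might look roughly like this. It is a
minimal sketch under stated assumptions: the forbidden patterns are
passed in by the caller (the real list came out of testing and is not
given here), and the "page_name" file naming scheme is hypothetical.

```python
import re
from pathlib import Path


def is_clean(table_source, forbidden):
    """Return True if none of the forbidden regex patterns occurs in the tbl source."""
    return not any(re.search(p, table_source, re.MULTILINE) for p in forbidden)


def pick_rendering(page, name, html_dir, png_dir):
    """Prefer the HTML table, fall back to the PNG, otherwise complain."""
    stem = f"{page}_{name}"  # hypothetical naming scheme
    html = Path(html_dir) / f"{stem}.html"
    png = Path(png_dir) / f"{stem}.png"
    if html.is_file():
        return html
    if png.is_file():
        return png
    raise FileNotFoundError(f"no HTML or PNG rendering for table {stem}")


if __name__ == "__main__":
    # r"^\.EQ" is a purely illustrative placeholder, not a statement
    # about what mandoc does or does not accept.
    source = ".TS\ntab(|);\nl l.\na|b\n.TE\n"
    print(is_clean(source, [r"^\.EQ"]))
```

Only tables for which is_clean() returns True would be fed to mandoc;
all others keep their png.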

An amusing gaffe of mine at the very beginning of this undertaking was to
produce footnotes within tables, for the purposes of annotations. I
struggled hard with the various footnote macros of different macro
packages, until I gave up on the idea of literal footnotes and employed
superscript numbers in the table body, with text in the table footer.

Altogether, off and on, with all experiments, reasoning and probing of
possible options, I spent about a year and a half on this issue, from
first successful table to finalizing the last huge table.

I include four examples of different degrees of complexity. The
similarity of these with the originals is striking, and even if there
are different column widths, etc., they clearly convey the spirit of the
original.

For a tbl showcase text, I could also provide scans of the original
material.

Best regards,

Oliver.




 On 17/06/2021 18:46, G. Branden Robinson wrote:

Hi, Oliver!

At 2021-06-15T12:39:02+0200, Oliver Corff wrote:

my huge text project which involved typesetting approx. 1,300 tables,
tiny, small, large and huge, demonstrated that tbl is a remarkably
powerful and reliable tool for this work, and I can say with
confidence that the question which type of table software to use
(LaTeX? (x)html?  others?) was best answered by tbl which helped me
recreate tables with a fidelity so close to the printed sources that
the uninitiated reader could not tell an image of the page from the
typeset reproduction.

That's excellent news!

What is the copyright licensing status of these 1,300 tables?  Is there
a chance we could get a small, potentially simplified subset of them
under a FLOSS license so that we could use them to illustrate GNU tbl's
feature set?  An excellent property of Lesk's tbl paper was the suite of
examples, but we don't have that document in our distribution and the
few examples in our tbl(1) man page compare poorly.

Speaking of the feature set, how much of GNU tbl's feature set do you
figure you ended up exercising by the end of this project?  Was there
anything that you expected to use but ended up not needing?

I came across a few very minor discrepancies between expected and
actual behaviour, though.

1) For the global option "tab(x)", the man page says:

     tab(x) Use the character x instead of a tab to separate items in a
line of input data.

This works as long as x is a 7-bit ascii character, it does not work
with utf-8 characters. E.g.: "tab(|)" (with the pipe symbol) works,
"tab(¦)" does not work and yields the message: "argument to `tab'
option must be a single character".

I suggest either specifying "7-bit ascii character" in the manpage
and/or make the tbl parser utf8-aware.

Hmmm, yes--since tbl parses the table for itself, *roff special
character escapes will not serve as a workaround.  And UTF-8 support
would be a significant undertaking.

I've filed this as <https://savannah.gnu.org/bugs/?60790>.
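
A quick way to see the difference between the two characters, assuming
(as the error message suggests, though nothing here confirms it) that
tbl looks at its input byte by byte: the pipe is a single byte in
UTF-8, the broken bar is two. The helper name utf8_len is made up for
this illustration.

```python
def utf8_len(ch):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in ("|", "\u00a6"):  # pipe and broken bar
        print("U+%04X" % ord(ch), utf8_len(ch), ch.encode("utf-8").hex())
```

So a byte-oriented parser receives two "characters" inside tab(...) for
the broken bar, which would explain the complaint.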

2) The global option "nospaces", according to the manpage, is
described as:

     Ignore leading and trailing spaces in data items (GNU tbl only).

The following point may be a question of correct interpretation of
this statement. Does the underbar "_" qualify as a data item in this
terminology? I positively think so, because the manpage states

     If  a  data  line  consists of only ‘_’ or ‘=’, a single or double
line, respectively, is drawn across the table at that point;

If my data line consists of a single '_', that line is drawn. However,
if that '_' is followed by spurious whitespace, then only the '_'
appears in the first cell, and no line is drawn, or a line spanning
the first cell only is drawn. From a logical point of view, this is
clear, as the statement says "consists of only ...", but the nospaces
option does not seem to work here as expected.

Doug's follow-up to this point seems reasonable.  For me, it reinforces
the principle I espouse that diligent management of one's lexicon is one
of the most important things you can do in a software project.

When revising the tbl(1) man page in the future, I will attend closely
to the uses of the terms "data line" and "data item", and try to make
sure they're correct and consistent.
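
Until the wording or the behaviour is settled, a pre-filter can remove
the spurious whitespace before tbl sees it. The sketch below is a
hypothetical workaround, not part of groff: inside .TS/.TE regions it
reduces any line made of a lone '_' or '=' plus trailing blanks to the
bare character, and leaves everything else alone.

```python
import re

# A lone "_" or "=" followed by at least one space or tab.
RULE_LINE = re.compile(r"^([_=])[ \t]+$")


def strip_rule_padding(text):
    """Drop trailing blanks from '_' and '=' rule lines inside tables."""
    out = []
    in_table = False
    for line in text.split("\n"):
        if line.startswith(".TS"):
            in_table = True
        elif line.startswith(".TE"):
            in_table = False
        elif in_table:
            line = RULE_LINE.sub(r"\1", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(strip_rule_padding(".TS\nl l.\na\tb\n_   \nc\td\n.TE\n"), end="")
```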

I once got partway through a rewrite of tbl(1) (the page), with
much terminological alteration around "global option", "column
specifier", and "column modifier".  I disfavor the term "global option",
because "global" options don't persist beyond a .TS/.TE table region,
not even in the same document.  I don't think novice users' concept of
something "global" stops anywhere short of the entire file they're
editing.

I ran out of steam on that project because there was just too damn much
I wanted to fix about the man page.  Not having a separate document (as
AT&T tbl had) to point the user to for practical examples was a major
problem, hence my request above.  Coming up with a good suite of
examples is itself a significant undertaking, and while I found the
examples contributed by Bernd to be contrived and meager, I couldn't
honestly say that they weren't better than nothing.

In my ideal world, tbl(1) would describe the syntax of the command and
its input (or the latter could be migrated to a tbl(7) page--I suspect
that would win Ingo's support and it wouldn't bother me at all), and
we'd have a separate tbl.ms document chock full of source alongside
rendered examples for users to emulate, experiment with, and build
their expertise with.

Regards,
Branden

1985_0743_Kooperationsverbaende.pdf
Description: Adobe PDF document

1985_1209_Sozialleistungen_Oeffentliche_Leistungen.pdf
Description: Adobe PDF document

1975_726_Reparationen.pdf
Description: Adobe PDF document

1975_943_Wirtschaft_Siedlungsstruktur.pdf
Description: Adobe PDF document
