[bug #64360] [PATCH] [gropdf] does not correctly handle white space afte
From: |
G. Branden Robinson |
Subject: |
[bug #64360] [PATCH] [gropdf] does not correctly handle white space after 'w' command |
Date: |
Wed, 28 Jun 2023 13:07:29 -0400 (EDT) |
Follow-up Comment #22, bug #64360 (project groff):
[comment #21 comment #21:]
> A fair point, but it only addresses my second clause about CSTR #54. The
> more relevant question is, do we claim CSTR #54 as a specification for groff,
> aside from such noted deviations?
Not per se. I reckon the following, from _roff_(7), is the closest thing we
make to a claim along those lines.
Ossanna documented the syntax of the input language to the nroff and
troff programs in the "Troff User's Manual", first published in
1976, with further revisions as late as 1992 by Kernighan. (The
original version was entitled "Nroff/Troff User's Manual", which may
partially explain why roff practitioners have tended to refer to it
by its AT&T document identifier, "CSTR #54".) Its final revision
serves as the de facto specification of AT&T troff, and all
subsequent implementors of roff systems have done so in its shadow.
While I don't think we're bound to CSTR #54 with adamantium shackles, at the
same time I don't think we should do things that it implicitly prohibits
without good reason. That would create support grief for us and our users.
> (Maybe we do; you know the groff docs better than I do.) Because if
> groff's _own_ docs don't address this whitespace in grout, and groff code in
> its history has never placed whitespace here, one could argue this constitutes
> an API change within the groff ecosystem.
For many years, _groff_ has documented this area of its behavior in the
_groff_out_(5) man page.
Unfortunately, I personally find that document (as of groff 1.22.4) less than
pellucid. The last time I complained about the quality of the writing there,
[https://lists.gnu.org/archive/html/groff/2023-06/msg00071.html Deri
interpreted it as a dig at a former contributor's non-native English], which
leaves me in something of a bind with respect to further critique.
(I did express myself sloppily. I said "non-standard" instead of
"sub-standard" [not in itself a change likely to improve matters], but more
importantly left unstated whether I was referring to the English per se or the
technical writing. As a rule, I find the contributor in question to write
English of satisfactory syntax and vocabulary. My frustrations arise from its
frequent failures, in my opinion, to adequately and economically communicate
technical information. But this, too, could be interpreted as a personal
derogation, and if so, I remain constrained from conducting peer review to my
own standards. Our former maintainer Werner Lemberg on multiple occasions
expressed fairly strong critiques of the same contributor's work [citations
available upon request], so I am frustrated to receive scolding for doing the
same.)
To get back to concrete matters, here's what the page says in 1.22.4. I have
not yet undertaken a deep revision of this material.
Separation
Classical troff output had strange requirements on whitespace. The
groff output parser, however, is smart about whitespace by making it
maximally optional. The whitespace characters, i.e., the tab, space,
and newline characters, always have a syntactical meaning. They are
never printable because spacing within the output is always done by
positioning commands.
Any sequence of space or tab characters is treated as a single
syntactical space. It separates commands and arguments, but is only
required when there would occur a clashing between the command code and
the arguments without the space. Most often, this happens when
variable length command names, arguments, argument lists, or command
clusters meet. Commands and arguments with a known, fixed length need
not be separated by syntactical space.
A line break is a syntactical element, too. Every command argument can
be followed by whitespace, a comment, or a newline character. Thus a
syntactical line break is defined to consist of optional syntactical
space that is optionally followed by a comment, and a newline
character.
The normal commands, those for positioning and text, consist of a
single letter taking a fixed number of arguments. For historical
reasons, the parser allows stacking of such commands on the same line,
but fortunately, in groff intermediate output, every command with at
least one argument is followed by a line break, thus providing
excellent readability.
The other commands -- those for drawing and device controlling -- have
a more complicated structure; some recognize long command names, and
some take a variable number of arguments. So all D and x commands were
designed to request a syntactical line break after their last
argument.
Only one command, `x X' has an argument that can stretch over several
lines, all other commands must have all of their arguments on the same
line as the command, i.e., the arguments may not be split by a line
break.
Empty lines, i.e., lines containing only space and/or a comment, can
occur everywhere. They are just ignored.
I'm attaching a screenshot of CSTR #54. In it you will see, after a listing
of the output commands, the statement by Kernighan that "Blanks, tabs, and
newlines may occur as separators in the input, and are mandatory to separate
constructions that would otherwise be confused."
That's not as rigorous as it could be, but I find it clearer and certainly
more economical than our own documentation.
The question for the purpose of this ticket is whether a newline is
permissible as a "separator[] in the input" with respect to the 'w' command as
with any other, or if it is not.
Deri appears to be claiming not simply the latter, but that such separation is
_prohibited_ when following the 'w' command, and I'm damned if I can find any
support for that view in either of the above.
I find it difficult to infer a prohibition from groff_out(5)'s phrase
"maximally optional".
> For the record, I'm not making an argument about this either way, just
> trying to see it from multiple sides.
Fair. Maybe you can locate a perspective that un-damns me.
> > I think our output drivers should behave consistently,
>
> True, and they do behave consistently given the grout that groff has
> produced to date.
This statement is not correct. _gropdf_ does not behave consistently with
_grops_ (and our other output drivers) with respect to inputs involving white
space after 'w' commands, and that is why I titled this ticket as I did.
GNU or macOS sed is required for the following.
$ echo 'A B' | groff -Tps -Z | sed 's/^w/&\n/' | grops | okular -
$ echo 'A B' | groff -Tps -Z | sed 's/^w/& /' | grops | okular -
$ echo 'A B' | groff -Tps -Z | sed 's/^w/&#comment\n/' | grops | okular -
$ echo 'A B' | groff -Tpdf -Z | sed 's/^w/&\n/' | gropdf | okular -
substr outside of string at /usr/bin/gropdf line 321, <STDIN> line 13.
Use of uninitialized value $lin in substitution (s///) at /usr/bin/gropdf line 325, <STDIN> line 13.
$ echo 'A B' | groff -Tpdf -Z | sed 's/^w/& /' | gropdf | okular -
$ echo 'A B' | groff -Tpdf -Z | sed 's/^w/&#comment\n/' | gropdf | okular -
There are three use cases above, each run through _grops_ and _gropdf_ (from
_groff_ 1.22.4) for a total of six commands.
Case 1 puts a newline after 'w'. _gropdf_ renders the output correctly but
provokes Perl diagnostics.
Case 2 puts a space after 'w'. _gropdf_ renders the output incorrectly,
ignoring the subsequent 'h' command and formatting 'AB' with no space between
'A' and 'B'.
Case 3 puts a comment after 'w'. A comment in _troff_ output _must_ be
terminated with a newline to prevent misinterpretation. _gropdf_ handles this
correctly and I have no complaint about its behavior in this case.
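For readers without an output driver handy, the three input variants can be
inspected on their own. The following is a minimal sketch, not captured
_groff_ output: the three-line fragment and the motion amount 2500 are
hand-written assumptions, and the helper names fragment, case1, case2, and
case3 are hypothetical. It reuses the sed expressions shown above, so it
likewise assumes a sed that expands \n in the replacement text.

```shell
#!/bin/sh
# Hypothetical grout fragment (hand-written, NOT real groff output):
# glyph 'A', a word space 'w' stacked with a horizontal motion 'h',
# then glyph 'B'.
fragment() {
    printf '%s\n' 'tA' 'wh2500' 'tB'
}

# Case 1: a newline after 'w'.
case1() { fragment | sed 's/^w/&\n/'; }

# Case 2: a space after 'w'.
case2() { fragment | sed 's/^w/& /'; }

# Case 3: a comment after 'w', terminated by a newline.
case3() { fragment | sed 's/^w/&#comment\n/'; }

# Print each variant so the differences can be compared by eye.
for c in case1 case2 case3; do
    echo "== $c =="
    "$c"
done
```

The fragment deliberately omits the prologue and trailer of a real document,
so it is meant for reading, not for feeding to _grops_ or _gropdf_.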
> My reference to Postel's law was intended to point out that libgroff being
> coded to liberally accept "grout" from other troffs (generically, "rout"?)
> doesn't necessarily indicate an intention on groff's part to support
> _producing_ such grout.
I think that begins to wander afield from the point of this bug. While I
don't take Postel's Law as the categorical imperative that some do, and in
fact my inclinations lean against it, the point at issue here is a divergence
of _troff_ output stream interpretation between output drivers that _groff_
*provides*. This is not a question of dealing with Joe's Fly-by-Night
Half-Assed Formatter.
I think following Postel in this situation is sound advice because we have all
learned from the many people aggrieved by Stuart Feldman's make(1) that
applying syntactically important distinctions to different kinds of
white space is a hazardous practice, _particularly_ for input formats that do
not closely resemble discursive human communication.
> > and if someone were to convert PostScript generated by _grops_
> > to PDF and PDF generated by _gropdf_ to PostScript, one should
> > not be able to tell which output driver originally produced the
> > document (ignoring comments within the files announcing the fact).
>
> I agree with that, but I'm not sure I follow your logic, as the above seems
> to be about the _output_ of the two postprocessors, whereas the API question
> is about their _input_.
I hope that I have illuminated the subject adequately.
> Anyway, this debate is academic: ultimately, I favor making grout more
> usable by other tools (and more readable to humans), and whitespace is
> important for that, regardless of whether one characterizes it as an API
> change.
I appreciate that, but I am alarmed at the charge of an "API change" because a
code alteration receiving that label is generally regarded as highly
disruptive and demanding of a far higher level of scrutiny.
I hope that the use cases illustrated above (and in the tar attachment from
comment #19, as it happens) constitute decisive evidence against such a claim.
(file #54894)
_______________________________________________________
Additional Item Attachment:
File name: CSTR_54_1992_section_22.png Size:263 KB
<https://file.savannah.gnu.org/file/CSTR_54_1992_section_22.png?file_id=54894>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?64360>