From: G. Branden Robinson
Subject: Re: Proposed: stop subjecting right-hand sides of `char` family requests to character translation
Date: Sat, 1 Apr 2023 20:22:23 -0500

Hi Doug,

At 2023-04-01T19:45:19-0400, Douglas McIlroy wrote:
> I went to see what this proposal meant and ran into undefined jargon
> in groff_char.7.

This, and phrases like "in the actual version", are regrettable defects
in the groff 1.22.4 version of this man page.

The one in the groff 1.23.0.rc2 and .rc3 release candidates does not
have them.  This page is one that I've heavily revised.  I'm attaching a
copy for your consideration.  I'd particularly welcome your comments on
the new "History" section.

> Yes, info groff probably tells me more than I want to know. Still, I
> expect the man page to be terse, but intelligible.

Fair.  I hope the intelligibility of the present form is improved.

> What's an "entity"?

Suggestive of conceptual fuzziness on the part of the writer, I would
propose.  But I can't blame them; the difficulty of comprehending
groff's flexible and complex character to glyph transformation process
is the main reason I have not yet revised that part of our Texinfo
manual.

> Fortunately, Dave Kemper's post shed light on this question.
> The first use of .char that came to mind was
>         .char \[ntilde] \o'n~'
> which would collide badly with the following ancient trick for
> unbreakable, unpaddable space. (Ignore the question of whether the
> tilde at hand is usable as a diacritical.)
>         .tr ~
>         a~b~c

You may be one of a dwindling number of people for whom that ancient
trick comes to mind.  :)  But we do continue to support it, and I see no
reason to withdraw it.
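
To make the collision concrete, here is a toy model in Python.  It is
only a sketch and not how groff is implemented: the `render` function,
its `translate_rhs` switch, and the flattening of the overstrike to the
plain string "n~" are all assumptions made for illustration.

```python
# Toy model only: this is not how groff is implemented.
def render(text, chars, tr, translate_rhs):
    """Expand bracketed special characters and translate ordinary input.

    translate_rhs=True models the behavior the thread complains about;
    translate_rhs=False models the proposal under discussion.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\[", i):
            end = text.index("]", i)
            rhs = chars[text[i + 2:end]]
            # The only difference between the two modes is whether the
            # definition's right-hand side passes through the tr table.
            out.append(rhs.translate(tr) if translate_rhs else rhs)
            i = end + 1
        else:
            out.append(text[i].translate(tr))
            i += 1
    return "".join(out)


tr = str.maketrans({"~": " "})  # stands in for `.tr ~` (tilde to space)
chars = {"ntilde": "n~"}        # stands in for .char \[ntilde] \o'n~'
line = "a~b \\[ntilde]"

current = render(line, chars, tr, translate_rhs=True)
proposed = render(line, chars, tr, translate_rhs=False)
print(repr(current))   # 'a b n '  (the tilde in the definition is lost)
print(repr(proposed))  # 'a b n~'  (the definition survives the translation)
```

In both modes the `a~b` input still gets its tilde turned into a space;
only the fate of the tilde inside the definition changes.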

> This, I guess, is typical of the motivation for the change.

I was spurred into this by noticing a problem last July with what I
think was a historical troff document.  I can't lay my hands on it now, but
the following short example suggests the issue.

$ cat EXPERIMENTS/tr-in-env.roff
.tr ab
.ev 1
.pl \n(nlu

This produces 3 lines of "bbb".

The problem I observed, as best I can recall, was that a document
temporarily used `tr` to make input more convenient.

The trouble was, the same character they were translating turned up in
one of their page headers or footers.

So, depending on how the document got modified and the resulting
placement of the `tr`-ed material, the headers/footers might get
corrupted or might not.

A lengthier, but contrived, example of this is at Savannah #62691.

I suppose there are workarounds one could coach the user to undertake in
such a situation, but once I got to thinking about it, it struck me that
there should be a cleaner division of responsibility between `tr` and
`char`.

My suggestion is twofold: (1) that `tr` should be used for permuting
what we can term groff's internal character set; meaning the 94
printable characters of ASCII/Basic Latin, and whatever special
characters happen to be defined; and (2) `char` and `rchar` are for
adding and removing members of the set of special characters.  (You can
try to `rchar` an ordinary Basic Latin character; it will silently fail.
I mean to make that no longer silent.[1])

It is necessary to consider the impact of these processes on diversions.
I don't presently think my proposal is disruptive to the status quo in
that respect.  When a diversion is populated, special character
definitions are already resolved, and just as with string
interpolations, using the `unformat` request does not recover their
original forms.

Illustration (with groff 1.22.4):

$ cat EXPERIMENTS/char-in-a-diversion.groff
.char \[zz] FNORD
.di XX
You didn't \[zz] this.
Hello, world.
diverted XX: \c
.unformat XX
unformatted XX: \*[XX]
.pl \n[nl]u
$ nroff -Tascii EXPERIMENTS/char-in-a-diversion.groff
Hello, world.
diverted XX: You didn't FNORD this.
unformatted XX: You didn't FNORD this.


> Suppose the change isn't made? What does .char do for you that .ds
> doesn't? Certainly nothing essential in the example above. However, it
> can avoid the ugliness of string invocations.

I don't remember where I saw this trick, but you can use a
`char`-defined object as a margin character, and I suppose just about
anywhere else the language syntax is accepting of an atomic character.
The utility of this comes in when realizing that someone might
reasonably want to set a margin character in a particular typeface
(maybe it's a dingbat--most of these don't have special character names)
and/or in a certain color.

Recasting the language of the 1.22.4 Texinfo manual, `char` is described
as doing this to the RHS of its definition: "[the RHS] is processed in a
temporary environment and the result is wrapped up into a single object.
Compatibility mode is turned off and the escape character is set to '\'
while [it] is being processed.  Any emboldening, constant spacing or
track kerning is applied to this object rather than to individual
characters in [it]."

> I regard the potential benefit mentioned in the last sentence as
> unpersuasive, but the potential catastrophe of the initial example as
> tilting the scales toward the proposal.

I think it would help distinguish and orthogonalize the language if
`char` character definitions remained global to formatter state, and
translations/transliterations with `tr` became properties of the
environment.

I suppose roff veterans are used to it, but my mind twists even when
looking at my own example in Savannah #62691 (linked above).


.tr @--@

is not a no-op!  In fact, it works a lot like file descriptor
redirections in the shell.

foo 2>&1 >/dev/null | grep error

Each left-hand member of a `tr` translation pair identifies a place in
the translation "from" space, and each right-hand member a place in the
"to" space.  The transform is then done atomically.  On occasions when I
want to throw standard output away but grep the standard error
stream, I haltingly think through this same issue.
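
The "from" and "to" spaces can be sketched with Python's `str.maketrans`,
which also applies all of its pairs in one pass.  This is only an analogy
under stated assumptions: the helper `tr_table` is hypothetical, and it
handles ordinary characters only, not special characters or escape
sequences.

```python
def tr_table(pairs):
    """Build a translation table from a troff-style string of from/to pairs."""
    if len(pairs) % 2:
        pairs += " "  # an unpaired final character maps to a space
    # Even positions are the "from" characters, odd positions the "to" ones.
    return str.maketrans(pairs[0::2], pairs[1::2])


swap = tr_table("@--@")  # stands in for `.tr @--@`
text = "a@b-c"

# All pairs are applied at once, so the two characters trade places.
swapped = text.translate(swap)

# Applying the pairs one after another instead would collapse both
# characters into one; this is the reading that makes it look like a no-op
# or worse.
sequential = text.replace("@", "-").replace("-", "@")

print(swapped)     # a-b@c
print(sequential)  # a@b@c
```

The first result is the swap; the second shows what a sequential reading
of the same pairs would give.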



Attachment: groff_char.7
Description: Text document
