Re: Proposed: stop subjecting right-hand sides of `char` family requests
From: Douglas McIlroy
Subject: Re: Proposed: stop subjecting right-hand sides of `char` family requests to character translation
Date: Sun, 2 Apr 2023 13:40:41 -0400
My mistake. I found the word "entity" in groff.7, not groff_char.7.
Nevertheless, thanks for the revised History section of groff_char.7.
It's a far more definitive account than I could give from my own
memory.
Doug
On Sat, Apr 1, 2023 at 9:22 PM G. Branden Robinson
<g.branden.robinson@gmail.com> wrote:
>
> Hi Doug,
>
> At 2023-04-01T19:45:19-0400, Douglas McIlroy wrote:
> > I went to see what this proposal meant and ran into undefined jargon
> > in groff_char.7.
>
> This, and phrases like "in the actual version", are regrettable defects
> in the groff 1.22.4 version of this man page.
>
> The one in the groff 1.23.0.rc2 and .rc3 release candidates does not
> have them. This page is one that I've heavily revised. I'm attaching a
> copy for your consideration. I'd particularly welcome your comments on
> the new "History" section.
>
> > Yes, info groff probably tells me more than I want to know. Still, I
> > expect the man page to be terse, but intelligible.
>
> Fair. I hope the intelligibility of the present form is improved.
>
> > What's an "entity"?
>
> Suggestive of conceptual fuzziness on the part of the writer, I would
> propose. But I can't blame them; the difficulty of comprehending
> groff's flexible and complex character to glyph transformation process
> is the main reason I have not yet revised that part of our Texinfo
> manual.
>
> > Fortunately, Dave Kemper's post shed light on this question.
> >
> > The first use of .char that came to mind was
> > .char \[ntilde] \o'n~'
> > which would collide badly with the following ancient trick for
> > unbreakable, unpaddable space. (Ignore the question of whether the
> > tilde at hand is usable as a diacritical.)
> > .tr ~
> > a~b~c
>
> You may be one of a dwindling number of people for whom that ancient
> trick comes to mind. :) But we do continue to support it, and I see no
> reason to withdraw it.
>
> > This, I guess, is typical of the motivation for the change.
>
> I was spurred into this by noticing a problem last July with what I
> think was a historical troff document. I can't lay my hands on it now, but
> the following short example suggests the issue.
>
> $ cat EXPERIMENTS/tr-in-env.roff
> .nf
> .tr ab
> bab
> .ev 1
> bab
> .br
> .ev
> bab
> .pl \n(nlu
>
> This produces 3 lines of "bbb".
>
> The problem I observed, as best I can recall, was that a document
> temporarily used `tr` to make input more convenient.
>
> The trouble was, the same character they were translating turned up in
> one of their page headers or footers.
>
> So, depending on how the document got modified and the resulting
> placement of the `tr`-ed material, the headers/footers might get
> corrupted or might not.
>
> A lengthier, but contrived, example of this is at
> <https://savannah.gnu.org/bugs/?62691>.
>
> I suppose there are workarounds one could coach the user to undertake in
> such a situation, but once I got to thinking about it, it struck me that
> there should be a cleaner division of responsibility between `tr` and
> `char`.
>
> My suggestion is twofold: (1) that `tr` should be used for permuting
> what we can term groff's internal character set; meaning the 94
> printable characters of ASCII/Basic Latin, and whatever special
> characters happen to be defined; and (2) `char` and `rchar` are for
> adding and removing members of the set of special characters. (You can
> try to `rchar` an ordinary Basic Latin character; it will silently fail.
> I mean to make that no longer silent.[1])
>
> It is necessary to consider the impact of these processes on diversions.
> I don't presently think my proposal is disruptive to the status quo in
> that respect. When a diversion is populated, special character
> definitions are already resolved, and just as with string
> interpolations, using the `unformat` request does not recover their
> original forms.
>
> Illustration (with groff 1.22.4):
>
> $ cat EXPERIMENTS/char-in-a-diversion.groff
> .nf
> .char \[zz] FNORD
> .di XX
> You didn't \[zz] this.
> .di
> Hello, world.
> diverted XX: \c
> .XX
> .unformat XX
> unformatted XX: \*[XX]
> .pl \n[nl]u
> $ nroff -Tascii EXPERIMENTS/char-in-a-diversion.groff
> Hello, world.
> diverted XX: You didn't FNORD this.
> unformatted XX: You didn't FNORD this.
>
> $
>
> > Suppose the change isn't made? What does .char do for you that .ds
> > doesn't? Certainly nothing essential in the example above. However, it
> > can avoid the ugliness of string invocations.
>
> I don't remember where I saw this trick, but you can use a
> `char`-defined object as a margin character, and I suppose just about
> anywhere else the language syntax is accepting of an atomic character.
> The utility of this comes in when realizing that someone might
> reasonably want to set a margin character in a particular typeface
> (maybe it's a dingbat--most of these don't have special character names)
> and/or in a certain color.
>
> Recasting the language of the 1.22.4 Texinfo manual, `char` is described
> as doing this to the RHS of its definition: "[the RHS] is processed in a
> temporary environment and the result is wrapped up into a single object.
> Compatibility mode is turned off and the escape character is set to '\'
> while [it] is being processed. Any emboldening, constant spacing or
> track kerning is applied to this object rather than to individual
> characters in [it]."
>
> > I regard the potential benefit mentioned in the last sentence as
> > unpersuasive, but the potential catastrophe of the initial example as
> > tilting the scales toward the proposal.
>
> I think it would help distinguish and orthogonalize the language if
> `char` character definitions remained global to formatter state, and
> translations/transliterations with `tr` became properties of the
> environment.
>
> I suppose roff veterans are used to it, but my mind twists even when
> looking at my own example in Savannah #62691 (linked above).
>
> Namely,
>
> .tr @--@
>
> is not a no-op! In fact, it works a lot like file descriptor
> redirections in the shell.
>
> foo 2>&1 >/dev/null | grep error
>
> Each left-hand member of a `tr` translation pair identifies a place in
> the translation "from" space, and each right-hand member a place in the
> "to" space. The transform is then done atomically. On occasions when I
> want to throw standard output away but grep the standard error
> stream, I haltingly think through this same issue.
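
The order-dependence of shell redirections that the analogy above relies on can be checked with a small experiment. This is only a sketch, assuming a POSIX shell; `foo` is a hypothetical stand-in function that writes one line to each stream, and the variable names `kept` and `lost` are invented for illustration.

```shell
# Hypothetical stand-in for "foo": one line on stdout, one on stderr.
foo() {
    echo "stdout: normal output"
    echo "stderr: error message" >&2
}

# Redirections are applied left to right. Here stderr is first pointed at
# the pipe (where stdout currently goes), and only then is stdout sent to
# /dev/null, so grep receives the standard error stream alone.
kept=$(foo 2>&1 >/dev/null | grep error)

# Reversed order: stdout goes to /dev/null first, and stderr then follows
# it there, so nothing reaches grep. "|| true" absorbs grep's nonzero
# exit status when it finds no match.
lost=$(foo >/dev/null 2>&1 | grep error || true)

echo "kept: $kept"
echo "lost: $lost"
```

Each redirection names a destination as it stands at that moment, which is why swapping the two operators changes the result rather than cancelling out.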
>
> Regards,
> Branden
>
> [1] https://savannah.gnu.org/bugs/?63985