groff

Re: *roff hyphenation trivia challenge


From: G. Branden Robinson
Subject: Re: *roff hyphenation trivia challenge
Date: Tue, 2 Apr 2024 13:29:05 -0500

Hi Steve,

At 2024-04-02T13:42:59-0400, Steve Izma wrote:
> On Tue, Apr 02, 2024 at 06:51:51PM +0200, Tadziu Hoffmann wrote:
> > Subject: Re: *roff hyphenation trivia challenge
> 
> > For "antidisestablishmen\%tarianism", groff prints
> > 
> >   antidisestablishmen-
> >   tar-
> >   i-
> >   an-
> >   ism
> > 
> > (which I think is strange), while TeX and Heirloom troff print
> > 
> >   antidisestablishmen-
> >   tarianism
> > 
> > which I think is the only reasonable way of handling this case.
> 
> I disagree.

Oops. I misread Tadziu's example, and hallucinated a leading `\%` in it.

If there is no _leading_ `\%`, then infixed `\%` escape sequences can
only add hyphenation points; they cannot remove them.  AIUI.

> I prefer groff's behaviour because I don't ever want correct
> hyphenation points to be ignored.

...unless you use a leading `\%` on the word, I assume.
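A toy model may make the distinction concrete. The Python sketch below contrasts two policies. The first is additive: infixed `\%` marks only add break points, and a leading `\%` drops the automatic ones, which is how groff is understood to behave above. The second is exclusive: any explicit mark replaces the automatic points, which is what Tadziu reports for TeX and Heirloom troff. It is not code from any of those programs, and the function names are made up. The "automatic" points are supplied by hand, read off the non-`\%` breaks in groff's output quoted above. The model also assumes that infixed marks still count after a leading `\%`.

```python
# Toy model of explicit hyphenation marks (\%) inside a word.
# Hypothetical helper names; not groff, TeX, or Heirloom source code.

MARK = "\\%"  # the two characters backslash and percent


def parse_marked(word):
    """Return (letters, explicit_points, leading) for a word with marks.

    A point p means "a break is allowed before letters[p]".
    """
    letters, explicit, leading = [], set(), False
    i = 0
    while i < len(word):
        if word.startswith(MARK, i):
            if letters:
                explicit.add(len(letters))
            else:
                leading = True
            i += len(MARK)
        else:
            letters.append(word[i])
            i += 1
    return "".join(letters), explicit, leading


def additive(word, automatic):
    # groff as described in this thread: infixed marks only add points,
    # while a leading mark drops the automatic ones (assumption: any
    # infixed marks are still honored in that case).
    letters, explicit, leading = parse_marked(word)
    if leading:
        return letters, explicit
    return letters, explicit | set(automatic)


def exclusive(word, automatic):
    # TeX/Heirloom as reported in this thread: once any explicit mark is
    # present, only the marked positions remain.
    letters, explicit, leading = parse_marked(word)
    if leading or explicit:
        return letters, explicit
    return letters, set(automatic)


def render(letters, points):
    """Spell the word with '-' at every permitted break."""
    return "".join(("-" if i in points else "") + ch
                   for i, ch in enumerate(letters))


# Hand-supplied "automatic" points for antidisestablishmentarianism,
# read off groff's output quoted above (tar-i-an-ism).
AUTO = {22, 23, 25}
WORD = "antidisestablishmen\\%tarianism"

print(render(*additive(WORD, AUTO)))   # antidisestablishmen-tar-i-an-ism
print(render(*exclusive(WORD, AUTO)))  # antidisestablishmen-tarianism
```

Because the automatic points were taken from groff's output, the two policies reproduce the two renderings quoted at the top of this message by construction. The model says nothing about why no automatic breaks appear before the `\%`.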

> Also for \% at the beginning of a word, I rarely use this.

I use it frequently in man(7) documents, because the `hw` request is not
portable/reliable (in theory).  Also, there's no mechanism for removing
hyphenation exceptions once `hw` declares them, so tolerating or
encouraging its use in man pages deals a blow to reliable/predictable
batch rendering.[1]

> If I don't want a word hyphenated at all, then it's likely that I
> don't want it hyphenated anywhere in the document. And in such cases I
> would add
> 
> .hw antidisestablishmentarianism
> 
> to the document once (or, preferably, to a local tmac file used
> for the project).

Right.

> This may not be important for man page authors, but it's very
> important in a production environment.

So let me amend my claim.

I think it's weird that

> > [f]or "antidisestablishmen\%tarianism", groff prints
> > 
> >   antidisestablishmen-
> >   tar-
> >   i-
> >   an-
> >   ism

whereas

$ printf '.ll 1n\nantidisestablishment\n' | nroff -Wbreak | cat -s
an‐
tidis‐
es‐
tab‐
lish‐
ment

seems like well-behaved formatting to me.

...except for the lack of a break point after "ti", of course.  But I'm
comfortable assuming that the discrepancy here is a limitation of the
TeX hyphenation system aggravated by English's polyglot morphology.

Is TeX's hyphenation algorithm defeated by the pathological case of
"antidisestablishmentarianism", and groff's implementation of it
"recovers" differently?
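For readers following that question: both TeX and groff hyphenate using Frank Liang's pattern algorithm, and groff reads TeX-format pattern files. The minimal Python sketch below shows how it works. Every substring of the word, padded with `.` boundary markers, is looked up in a pattern table. Each pattern carries digits between its letters, and at each inter-letter gap the largest digit seen wins. An odd value permits a break and an even value forbids one. The five patterns are made up for illustration and are not taken from any real pattern file, so the sketch says nothing about how the real English patterns treat "antidisestablishmentarianism". The `left_min`/`right_min` parameters mimic TeX's `\lefthyphenmin` and `\righthyphenmin` (plain TeX sets 2 and 3); the defaults of 2 here are arbitrary.

```python
# Minimal sketch of Liang-style pattern hyphenation.
# The patterns are toy examples, not from a real hyphenation file.


def parse_pattern(pattern):
    """Split e.g. 'i2sh' into ('ish', [0, 2, 0, 0]).

    values[k] is the score for the gap just before letters[k];
    the final entry is the gap after the last letter.
    """
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def build_table(patterns):
    return dict(parse_pattern(p) for p in patterns)


def hyphenation_points(word, table, left_min=2, right_min=2):
    """Return offsets m such that a break is allowed before word[m]."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)
    # Apply every pattern matching any substring, keeping the maximum.
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            values = table.get(padded[i:j])
            if values is None:
                continue
            for k, v in enumerate(values):
                scores[i + k] = max(scores[i + k], v)
    # The gap before word[m] is the gap before padded[m + 1].
    return [m for m in range(left_min, len(word) - right_min + 1)
            if scores[m + 1] % 2 == 1]


def hyphenate(word, table, **limits):
    points = [0] + hyphenation_points(word, table, **limits) + [len(word)]
    return "-".join(word[a:b] for a, b in zip(points, points[1:]))


TOY = build_table(["s1ta", "b1l", "1sh", "i2sh", "h1m"])
print(hyphenate("establishment", TOY))  # es-tab-lish-ment
```

Dropping `"i2sh"` from the list yields `es-tab-li-sh-ment`, which shows how an even digit vetoes an odd one at the same gap.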

Regards,
Branden

[1] ...because a man(7) document from one source can declare a
    hyphenation exception that then applies to "remote" documents
    formatted subsequently.

    https://savannah.gnu.org/bugs/index.php?64478

    I've contemplated adding a `rhw` request that would remove hyphenation
    exceptions.  Called without arguments, it would remove them all.
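The problem in this footnote is one of shared state. Exceptions declared with `hw` live in the formatter, not in the document, so they persist into whatever it formats next. The toy Python model below illustrates that, along with the contemplated `rhw`. `rhw` does not exist in groff, the class and method names are hypothetical, and the model ignores all other details of groff's implementation. As with `hw`, a hyphen in an argument marks a permitted break. A word given with no hyphens, like Steve's `.hw antidisestablishmentarianism` above, is recorded with no break points, so it is never hyphenated.

```python
# Toy model of a formatter-wide hyphenation exception table.
# HyphenationExceptions, hw(), rhw(), and lookup() are hypothetical names;
# rhw models a request that was only contemplated, not an existing one.


class HyphenationExceptions:
    def __init__(self):
        self._table = {}  # plain word -> list of allowed break offsets

    def hw(self, *spellings):
        """Record exceptions; each '-' marks a permitted break."""
        for spelled in spellings:
            points, count = [], 0
            for ch in spelled:
                if ch == "-":
                    points.append(count)
                else:
                    count += 1
            self._table[spelled.replace("-", "")] = points

    def rhw(self, *words):
        """Remove the named exceptions, or all of them if none are named."""
        if not words:
            self._table.clear()
        for word in words:
            self._table.pop(word.replace("-", ""), None)

    def lookup(self, word):
        """Return the exception's break offsets, or None if there is none."""
        return self._table.get(word)


# One formatter process, two man pages formatted in sequence.
table = HyphenationExceptions()

# Page A declares an exception: no hyphens, so never hyphenate the word.
table.hw("antidisestablishmentarianism")

# Page B, formatted later, silently inherits page A's exception.
print(table.lookup("antidisestablishmentarianism"))  # []

# With the hypothetical rhw, the exception could be reset between pages.
table.rhw()
print(table.lookup("antidisestablishmentarianism"))  # None
```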
