groff

Re: 1.23: UTF-8 device produces mysterious characters


From: Steffen Nurpmeso
Subject: Re: 1.23: UTF-8 device produces mysterious characters
Date: Mon, 12 Sep 2022 23:41:34 +0200
User-agent: s-nail v14.9.24-297-g9844dfc386

Hello Branden.

G. Branden Robinson wrote in
 <20220912144641.q2r65kkfpiej4u2u@illithid>:
 |At 2022-09-12T15:43:00+0200, Steffen Nurpmeso wrote:
 |> I have problems with the UTF-8 device, it shows
 |> 
 |>   on‐main‐loop‐tick
 |> instead of
 |>   on-main-loop-tick
 |> 
 |> i.e., U+2010 HYPHEN instead of U+002D HYPHEN-MINUS.
 |> 
 |> The above does not feel right, and searching is impossible!
 |> I would expect U+2010 HYPHEN at hyphenation points, but not as the
 |> regular joiner of compound words, which are used very often in
 |> German, for example.
 |
 |There are a few points to raise about this.  The first is a question.
 |
 |1.  You don't expect a hyphenated word to use a hyphen?

This is not a hyphenated word.  In Germany, and not only because
of the feminist (suffragette) movement, we even have lots of names
like that.  For example Annette von Droste-Hülshoff, 12 January
1797 to 24 May 1848.  You could hyphenate that, but then, at
some point, feminism comes to an end!  (For her it has different
roots, though.)

 |2.  This is not a "1.23"-specific issue as your subject line suggests.
 |
 |$ groff --version | head -n 1
 |GNU groff version 1.22.4

OK, I did not know this.  Until last week I was solely using
1.22.3, even though the system has 1.22.4 (just not for me).

  ...
 |3.  If you're secretly in a man page context but didn't disclose that,
 |    then, yes, this is a change from groff 1.22.4.  The hyphen-minus,
 |    neutral apostrophe, and grave accent no longer map differently for
 |    man(7) and mdoc(7) than for any other macro package.  (\- still does

Oh.  While cycling I dimly recalled that there was a discussion
here, but I did not truly follow it(?).

 |    and there is no prospect of that changing, since there is no *roff
 |    special character defined for the "ASCII hyphen-minus", and it is
 |    essential to express this precise character in man pages.  These
 |    issues have been discussed at some length on this mailing list over
 |    the past three years.)

Really.  The above is just wrong, Branden.  Who said so?
You cannot use HYPHEN for the above.  Take the Unicode names of
hyphen-minus itself, less-than, greater-than, no-break space,
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, and that only goes up
to U+00AB.  Or standard names like IEEE Std 1003.1™-2017, IEEE Std
1003.1-2008, C-language, code-level, POSIX.1-2017, built-in, and
this is only the first page of that standard.  Or take the ISO C17
standard: search for "-" in the official PDF and you find it in
Storage-class, absolute-value, floating-point, type-generic,
thread-specific, and more, and we are still in the TOC.  No no --
no HYPHEN here!
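The point about the Unicode character names can be checked
mechanically.  A minimal Python sketch, using only the standard
`unicodedata` module, that prints the official names of the
characters listed above and confirms each one is spelled with
U+002D HYPHEN-MINUS and never with U+2010 HYPHEN.  The helper name
`names_of` is made up for illustration.

```python
import unicodedata

HYPHEN = "\u2010"  # U+2010 HYPHEN

def names_of(chars: str) -> dict[str, str]:
    """Map each character to its official Unicode name."""
    return {ch: unicodedata.name(ch) for ch in chars}

# Hyphen-minus, less-than, greater-than, no-break space, and U+00AB.
listed = "-<>\u00a0\u00ab"

for ch, name in names_of(listed).items():
    # Every name joins its parts with U+002D, none with U+2010.
    assert "-" in name and HYPHEN not in name
    print(f"U+{ord(ch):04X} {name}")
# U+002D HYPHEN-MINUS
# U+003C LESS-THAN SIGN
# U+003E GREATER-THAN SIGN
# U+00A0 NO-BREAK SPACE
# U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
```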

These are _not_ hyphenated words.

If roff can tell true hyphenation points apart (I had to take a
loooong look), then it could replace a hyphen-minus on the input
side with a hyphen on the output side only when it really breaks
a line at that point.  Otherwise hyphen-minus is the only viable
choice.
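That suggestion can be sketched outside of roff.  The following
Python is a toy greedy line filler, not groff's algorithm: it
assumes compound words are typed with U+002D, allows a break after
any such hyphen-minus, and turns that hyphen into U+2010 only on a
line that actually breaks there.  The name `fill` and measuring
width in code points are assumptions for illustration.

```python
import re

HYPHEN_MINUS = "-"   # U+002D, what the author typed
HYPHEN = "\u2010"    # U+2010, used only where a line really breaks

def fill(text: str, width: int) -> list[str]:
    """Greedy fill that may break after a hyphen-minus in a word."""
    lines: list[str] = []
    line = ""
    for word in text.split():
        # Pieces end just after each hyphen-minus:
        # "on-main-loop-tick" -> ["on-", "main-", "loop-", "tick"]
        pieces = re.findall(r"[^-]*-|[^-]+", word)
        for i, piece in enumerate(pieces):
            # A space separates words; pieces of one word are glued.
            sep = " " if line and i == 0 else ""
            if not line or len(line) + len(sep) + len(piece) <= width:
                line += sep + piece
                continue
            # The piece does not fit, so the line breaks here.
            if i > 0 and line.endswith(HYPHEN_MINUS):
                # Breaking inside a compound word: show a real HYPHEN.
                line = line[:-1] + HYPHEN
            lines.append(line)
            line = piece
    if line:
        lines.append(line)
    return lines
```

For example, `fill("call on-main-loop-tick now", 12)` gives
`["call on\u2010", "main-loop\u2010", "tick now"]`: the hyphen that
stays inside `main-loop` remains a searchable U+002D.  A leading
dash, as in an option like `-c`, would still need special care.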

Or look at the Unicode standard, which involves truly great minds
with incredible multinational careers: get the official PDF
(hr-hrm, I have not updated since Unicode 13..) and you will see
that combined words are separated with hyphen-minus, _not_
hyphen.

This is really wrong.

 |4. "on-main-loop-tick" doesn't look like a natural language word to me--it
 |   looks like an identifier in a programming language (maybe some
 |   dialect of Lisp).  If that is the case, those hyphens need to be
 |   spelled "\-" in the source code.  This has always been true in man

Well, yes and no.  HYPHEN (U+2010) is just everywhere in 1.23.

 |   pages, going back to 1979.
 |
 |   Take
 |     $ grep '\\-[A-Za-z]' ~/src/unix/v7/usr/man/man1/bc.1
 |.B \-c
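Item 4 comes down to escaping each hyphen-minus in literal text as
`\-`.  A minimal Python sketch of such a helper for generating man
page source.  The name `roff_literal` is hypothetical, and it only
handles backslashes, hyphens, and a leading control character, not
every roff special case.

```python
def roff_literal(text: str) -> str:
    """Escape a literal so man(7) renders its hyphens as U+002D."""
    # Escape backslashes first so later escapes are not doubled;
    # \e prints the escape character (normally a backslash).
    out = text.replace("\\", r"\e")
    # Per the discussion above, groff's man(7) and mdoc(7) render
    # \- as the ASCII hyphen-minus.
    out = out.replace("-", r"\-")
    # A leading '.' or "'" would be read as a control line; \& guards it.
    if out.startswith((".", "'")):
        out = r"\&" + out
    return out

print(".B " + roff_literal("on-main-loop-tick"))
# .B on\-main\-loop\-tick
```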

Yeees, well, I really had to look, you know.  This is a language,
and it developed over time, and there was a lot of woolding.

  -.th MAIL I 10/25/72
  -.sh NAME
  -mail  \*-  send mail to another user

Who says it is not an evolution of the above?
Doug McIlroy is on this list; maybe he reads this and knows.
Though he said something about NATO today, and that lying,
aggressive "Endsieg" ("final victory") beast is definitely on the
other side of the road.

And by the way, you mention flags in the above.  Flags are
different, because often you want their dash to be a U+2013 EN
DASH.  I.e., you want to make it _longer_ than a hyphen-minus, not
super short like a hyphen.  IMHO.

  ...
 |5.  Searching is not impossible.
 |    5a. Searching for a word that is broken and hyphenated across lines
 |        is no more impossible than it always was.  On occasions when I
 |        have to do this, I break out sed(1) or perl(1).

It is not hyphenated, Branden.

 |    5b. Literals that might be of interest in man pages should be
 |        entered with hyphenation suppressed in the input.  The groff man

Hey!  This is not rocket science or something.
I am happy if people at least do _write_ manuals _at_all_.

 |        pages in 1.23 do this much more conscientiously than in past
 |        releases.  This is to avoid confusing users who might wonder if
 |        a hyphen is to be interpreted literally or not.
 |
 |    5c. You can disable automatic hyphenation altogether when rendering
 |        man pages.  See the '-rHY' option in groff_man(7).  This feature
 |        has been around for many years.
 |
 |    5d. groff's mdoc(7) implementation did not recognize the `HY`
 |        register in groff 1.22.4 and earlier.  It does now, though.
 |
 |    5e. For me, anyway, searching within less(1) using the pattern with
 |        a dot where the hyphen goes works fine, even though there are 3
 |        bytes in the input stream instead of one.  Evidently less(1) is

Fuzzy-search code-wise? ;)

 |        smart enough.  For instance, I can match "line-ending" in the
 |        roff(7) page while paging it with "groff -Tutf8 -man | less -R"
 |        by entering "/line.ending" within less(1).
 |
 |I hope this clears some things up.
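The "3 bytes instead of one" detail in item 5e can be shown in a
few lines of Python.  This is a sketch, not less(1)'s code: it only
assumes that a pager matching decoded characters in a UTF-8 locale
behaves like `re` on `str`, while a byte-oriented matcher behaves
like `re` on `bytes`.

```python
import re

rendered = "line\u2010ending"  # as -Tutf8 output with U+2010 HYPHEN
typed = "line-ending"          # as typed at a search prompt (U+002D)

# A literal search fails: U+002D and U+2010 are different characters.
literal_hit = typed in rendered

# On decoded text, '.' matches the single code point U+2010.
char_hit = re.search("line.ending", rendered) is not None

# In UTF-8, U+2010 is three bytes, so one '.' no longer suffices.
raw = rendered.encode("utf-8")
byte_hit = re.search(rb"line.ending", raw) is not None
byte_hit_3 = re.search(rb"line.{1,3}ending", raw) is not None

print(literal_hit, char_hit, byte_hit, byte_hit_3)  # False True False True
print(raw[4:7])                                     # b'\xe2\x80\x90'
```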

Certainly not for me.  HYPHEN is good at the end of a line when
a word is hyphenated; otherwise it is misplaced.
And using HYPHEN to combine words is wrong.  An en dash would look
nice, I could imagine.

Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


