Re: man(7), hyphen, and minus
From: G. Branden Robinson
Subject: Re: man(7), hyphen, and minus
Date: Tue, 27 Dec 2022 04:25:41 -0600
At 2022-12-24T14:43:44-0800, Russ Allbery wrote:
> I probably should have assumed. One of the things that I've noticed
> over and over about free software is that nothing new ever truly
> replaces something old in a comprehensive sense. I can think of very
> few programs that truly no one is using any more, because once the
> source code is available to keep them alive, someone will keep them
> alive. It makes for a rather interesting diversity of software (and
> other things; for instance, I still use Usenet).
I'd happily get back on USENET if someone has solved the spam problem.
I'm old enough to remember those green-card hawking lawyers who were the
harbingers of death.
> Oh, so I was going to mention: currently, Pod::Man rolls its own
> macros for verbatim text:
>
> .de Vb \" Begin verbatim text
> .ft CW
> .nf
> .ne \\$1
> ..
> .de Ve \" End verbatim text
> .ft R
> .fi
> ..
>
> This looks basically equivalent to .EX/.EE,
Yup. Except for the detail of the name of the constant-width font,
which is not consistently defined across implementations or even output
devices within an implementation (as already discussed).
groff's tmac/an-ext.tmac says these days:
    .\" Define this to your implementation's constant-width typeface.
    .ds mC CW
    .if n .ds mC R
> so I thought about using those macros (and defining my own if they're
> not available, at least until no one is using older implementations
> that don't have them). But the main thing that .EX doesn't support
> that the long-standing Pod::Man behavior does is the .ne invocation,
> which is used like this:
>
>     # Get a count of the number of lines before the first blank line, which
>     # we'll pass to .Vb as its parameter. This tells *roff to keep that many
>     # lines together. We don't want to tell *roff to keep huge blocks
>     # together.
>     my @lines = split (m{ \n }xms, $text);
>     my $unbroken = 0;
>     for my $line (@lines) {
>         last if $line =~ m{ \A \s* \z }xms;
>         $unbroken++;
>     }
>     if ($unbroken > 12) {
>         $unbroken = 10;
>     }
>
> This logic is very long-standing and was designed for troff printing of a
> manual page (and older nroff setups that still did pagination) to avoid
> unnecessary page breaks in the middle of a verbatim block. I'm not sure
> how much this matters given how people use man pages these days, but I
> hate to break it for no reason.
You've managed to wangle a display, and once people get that religion
they're loath to give it up. Despite my commitment to a limited man(7)
dialect I have proven unable to stop myself from adding `ne` requests to
groff's own man pages to keep our PDF compilation from looking ugly.
> So I think I'd need to add an .ne line after (before?) the .EE macro
> if I switched to it?
Well, you can throw away that line counting logic in Perl altogether and
simply use `ne` _before_ EX (not EE).
Another point of detail is that you should break with `br` _before_ the
`ne` request. `ne` won't always do what you want if there is a pending
output line.
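A minimal sketch of that ordering in man(7) source follows. The line count given to `ne` and the sample text are made up for illustration; this is not Pod::Man's actual output, and it assumes an implementation that provides `EX`/`EE`.

```roff
.\" Illustrative fragment only; the "3" and the sample lines are assumptions.
.\" Flush any pending output line first, so that ne measures from a
.\" fresh line.
.br
.\" Ask for room for 3 lines; if less remains on the page, break the page.
.ne 3
.\" Begin the example (unfilled text, constant-width face where available).
.EX
first line of the example
second line
third line
.EE
```

With continuous rendering on a terminal the `ne` has no visible effect; it only matters for paginated output such as PDF.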
I have plans to add keep macros `KS`/`KE` to groff man(7) in the near
future; they are probably the least controversial extensions I can
possibly add because it will always be okay for an implementation to
totally ignore them. No text will be lost or misformatted; page breaks
will just happen in dumb places, and for the overwhelming majority of
terminal users who experience the continuous rendering default, even
that won't apply.
> Okay, fair. :) Although historically people sometimes did, and of
> course once upon a time people would sometimes typeset the full manual
> for something with troff.
They still do. Alex Colomar, the new linux-man maintainer, is shy of
learning ms(7) or any other macro package.
If a "full manual" doesn't need features that man(7) doesn't provide, I
see no real harm in using it for non-man-page documents. Colin Watson's
"-l" extension to man(1) has made this extremely straightforward.
> That output probably isn't as nice as it used to be, since I have
> subsequently dropped a lot of the attempted magic that only applied to
> troff output (replacing paired " quotes with `` '', adding small caps
> to long strings of all capital letters, and things like that) because
> they were all using scary regexes and occasionally broke things and
> mangled things in weird ways, causing lots of maintenance issues.
Yes, and there are concerns I would raise with both of those helpful
bits of automagic anyway.
> > Yes. But there are two problems to solve: (1) acceptance of Unicode
> > (probably just UTF-8) input
>
> I was pleasantly surprised at how well this just worked with the
> man-db setup on a Debian system, although I think that may involve a
> fair amount of preprocessing.
Mainly just running preconv(1), I think, which groff has supplied since
1.20, so for about 14 years I guess.
> Just to provide additional detail for the record (and this is almost
> certainly the sort of thing you mean by "acceptance of Unicode input")
> here's the simple document I was using for some testing.
>
> https://raw.githubusercontent.com/rra/podlators/main/t/data/man/encoding.utf8
>
> % groff -man -Tpdf -k encoding.utf8 > encoding.pdf
> troff: encoding.utf8:72: warning: can't find special character 'u0308'
> troff: encoding.utf8:74: warning: can't find special character 'u1F600'
>
> u1F600 is presumably a problem with the output font,
Yes. Try that to the terminal (-Tutf8) and it should work.
> but u0308 is a combining accent mark that groff does definitely
> support, just not as a separate character.
Right. It's \[ad].
> (Without preconv, one instead gets mojibake, as I expected.)
I got warnings, too (using -ww):
    troff:EXPERIMENTS/encoding.utf8:72: warning: invalid input character code 136
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 159
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 152
    troff:EXPERIMENTS/encoding.utf8:74: warning: invalid input character code 128
There is a whole universe of validity problems to cope with even if we
had support for direct input of valid UTF-8. :(
> My theory was that combining accent marks pose a bit of an interesting
> issue for groff because groff probably shouldn't think of them as a
> separate output character that can be mapped in an output font, but
> instead needs to essentially transform them into something like
> \[u0069_0308] during the input processing. (This may therefore
> essentially be a preconv bug as opposed to a troff bug, and maybe
> nroff gets away with it because it can just copy combining accent
> marks to the output device and let xterm take care of rendering.)
I don't actually know if xterm performs combinations like this or it
expects precomposed characters.
The groff_char(7) man page from groff Git covers some of this stuff in
increased detail, such as the `composite` request and the Normalization Form
D requirement. But the discussion still may not be complete, as I
haven't tried to solve the Unicode input problem myself. Fortunately we
have a patch pending for CJK/UTF-16 font support which promises to give
me an excuse to widen groff's internal character type.
Here's hoping I haven't worn out the submitter's patience while I tried
to get 1.23.0 ready...
> It all makes sense when viewed through the lens of the *roff language,
> but of course in the Unicode world one expects to be able to just
> produce a stream of code points and have everything cope.
Yes..."just coping" is achieved with a massive pile of standards
documents that augment the ISO 10646 character encoding. :D
> I am sad that currently Pod::Man is one of the impediments to good
> rendering of manual pages in other formats, since I make use of more
> of the *roff language (mostly to work around bugs) than those tools
> often understand. So I have an incentive to want to simplify the
> output as much as I can, consistent with remaining portable.
Consider me a resource for this effort.
Regards,
Branden