
Re: [Groff] pdfmom grep (was parallel text processing)


From: Ralph Corderoy
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Sun, 10 Sep 2017 11:12:21 +0100

Hi Peter,

> The pipeline in the current pdfmom is actually
>
>   groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 | 
>   grep '^\\. *ds' |
>   groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1 |
>   grep '^\\. *ds' |
>   groff -Tpdf -mom $preconv - $cmdstring
...
> ***pdfmom pipeline entered literally at the command line
>   groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom 2>&1 | \
>   grep '^\.  *ds' | \
>   groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z -k - camus.mom 2>&1 | \
>   grep '^\. *ds' | \
>   groff -Tpdf -mom -k - camus.mom > camus.pdf
> - grep does not report a binary file hit

The middle groff's `-k -' are swapped, but I don't think that affects
anything.  (BTW, the backslashes aren't needed after a pipe;  by design,
that indicates the line continues.)
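
To illustrate the point about the pipe, here is a minimal sketch assuming a POSIX shell.  The function name count_ds is made up for the example; the grep pattern is the one from the pipeline above.

```shell
# A trailing `|' already tells the shell the command is incomplete,
# so no backslash is needed before the newline.
count_ds() {
    grep '^\. *ds' |
    wc -l
}

# Two of these three lines start with `.', optional spaces, then `ds'.
printf '.ds a b\n. ds c d\nfoo\n' | count_ds
```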

> ***pdfmom itself at the command line
>   pdfmom -k camus.mom > camus.pdf
> - grep reports a binary file hit
>
> strace on 'pdfmom -k camus.mom > camus.pdf' produces

I've neatened this up a bit, and show non-zero exits, and a SIGPIPE.

>             pdfmom -k camus.mom
>             sh -c groff -Tpdf -dLABEL.REFS=1 -mom ...
>             groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom
>     exit(1) grep ^\\. *ds
>             groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - -k camus.mom
>             grep ^\\. *ds
>             groff -Tpdf -mom -k - camus.mom
>             preconv - camus.mom
>     PIPE    troff -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z -Tpdf
>             preconv camus.mom
>             troff -dLABEL.REFS=1 -mom -z -Tpdf
>             troff -mom -Tpdf
>             preconv - camus.mom
>             gropdf

Let's go through it a step at a time to see if I can get across the
problem.  Back to pdfmom...

>   groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 | 
>   grep '^\\. *ds' |
>   groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1 |
>   grep '^\\. *ds' |
>   groff -Tpdf -mom $preconv - $cmdstring

I run that manually.

    $ preconv=-k
    $ cmdstring=camus.mom
    $ groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1
    camus.mom:18: can't translate character code 233 to special
        character `'e' in transparent throughput
    $

There's no /^\.ds/ in that output, explaining why the first grep in
strace's output exited with status 1, so the stdin to the second groff
is empty and it's as if the first groff and grep didn't exist in this
case.  On to the second groff.
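
The exit status can be checked without groff.  This is only a sketch: first_groff_output is a made-up stand-in that prints the warning shown above, and the pattern is the one strace shows grep receiving.

```shell
# Stand-in for the first groff's output: only the warning, no `.ds' line.
first_groff_output() {
    printf '%s\n' \
        "camus.mom:18: can't translate character code 233 to special" \
        "    character \`'e' in transparent throughput"
}

# grep exits 0 if some line matched, 1 if none did, and >1 on error.
status=0
first_groff_output | grep '^\. *ds' || status=$?
echo "grep exit status: $status"    # prints: grep exit status: 1
```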

    $ groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1
    ^D^D^D^D
    .ds pdf:look(pdf:bm1) L'�tranger
    camus.mom:18: can't translate character code 233 to special
        character `'e' in transparent throughput
    $

(Bizarre I had to type the TTY's eof four times before groff stopped
trying to read.)

Here's the problem.  Whatever is producing that `.ds' line is writing
ISO 8859-1, and my UTF-8 terminal rightly replaces it with `�', U+FFFD.
We're being told there was a problem too, in both this groff and the
previous one, with the `can't translate' warning.  Decimal 233 is
U+00E9; that's the `é' in

    $ grep TITLE camus.mom
    .TITLE      "L'étranger
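
The difference between the two encodings of `é' can be seen with od(1).  A small sketch, assuming a POSIX shell; byte_values is a helper name invented for the example.

```shell
# Print the bytes on stdin as decimal numbers separated by single spaces.
byte_values() {
    # Word-split od's padded output, then rejoin it with single spaces.
    set -- $(od -An -tu1)
    echo "$*"
}

printf '\351' | byte_values        # ISO 8859-1 `é', one byte:  233
printf '\303\251' | byte_values    # UTF-8 `é', two bytes:      195 169
```

A lone byte 233 is not a valid UTF-8 sequence, which is why a UTF-8 terminal shows U+FFFD in its place.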

This non-UTF-8 is fed into the second grep for /^\.ds/.  That grep runs
in your UTF-8 locale and correctly reports that standard input, which
contains binary data, matches, rather than passing on the `.ds' line.
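
For comparison, here is a sketch of a grep that passes the line through whatever the caller's locale.  It assumes a grep that supports -a, as GNU and BSD grep do (it is not in POSIX); latin1_ds_line and ds_lines are names invented for the example, and this is not what pdfmom itself does.

```shell
# One `.ds' line as the second groff emitted it: `é' is the single
# ISO 8859-1 byte 233 (octal 351), which is not valid UTF-8.
latin1_ds_line() {
    printf '%s\351%s\n' ".ds pdf:look(pdf:bm1) L'" "tranger"
}

# Hypothetical helper: select the `.ds' lines bytewise.  LC_ALL=C makes
# every byte a character in its own right, and -a tells grep to treat
# its input as text even if it looks binary.
ds_lines() {
    LC_ALL=C grep -a '^\. *ds'
}

# Dump the result with od so the terminal's encoding doesn't matter.
latin1_ds_line | ds_lines | od -c
```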

To investigate why it doesn't occur when you run the pipeline manually,
insert tee(1)s to snaffle the mid-pipeline data, or simply start with
the first command and tack on one more command on each subsequent run.
Have you a ~/bin/grep that alters the locale?
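
The tee(1) approach might look like the sketch below.  The names tap, TAPDIR, debug_pdfmom, and the tap file names are all invented for the example; the groff and grep commands are the ones from pdfmom's pipeline quoted above, with $preconv and $cmdstring set as in the manual run.

```shell
# Directory for the captured data; /tmp is just a default.
TAPDIR=${TAPDIR:-/tmp}

# tap NAME: pass stdin through to stdout unchanged, keeping a copy
# in $TAPDIR/NAME.
tap() {
    tee "$TAPDIR/$1"
}

# The pdfmom pipeline with a tap after each of the first four stages.
debug_pdfmom() {
    groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 |
    tap 1-groff |
    grep '^\. *ds' |
    tap 2-grep |
    groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1 |
    tap 3-groff |
    grep '^\. *ds' |
    tap 4-grep |
    groff -Tpdf -mom $preconv - $cmdstring
}
```

After something like `debug_pdfmom > camus.pdf', running `od -c' on each tap file shows exactly which bytes each stage received, independent of the terminal's encoding.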

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


