Re: [Groff] pdfmom grep (was parallel text processing)

From: Steffen Nurpmeso
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Fri, 08 Sep 2017 19:46:36 +0200
User-agent: s-nail v14.9.3-64-gad47883e

Peter Schaffter <address@hidden> wrote:
 |On Fri, Sep 08, 2017, Ralph Corderoy wrote:
 |>> You'll notice that the top of the pdf file has a line of text spit out
 |>> by grep(1) that obviously shouldn't be there.
 |> I couldn't come up with the groff 1.22.3-7 command line required to
 |> build the PDF correctly, nor get grep's unwanted output.  Deri suggested
 |> pdfmom's grep might be the culprit, but its stderr should end up on
 |> pdfmom's stderr?
 |Problem solved.
 |The superfluous line at the top of the file ["Binary file (standard
 |input) matches"] isn't stderr, it's stdout, so it becomes part of
 |the pipeline.  The grep in pdfmom is returning a binary file hit when
 |it encounters the diacritic in 
 |  .ds pdf:look(pdf:bm1) L'étranger
 |Since the binary file hit doesn't begin with .ds, it prints literally
 |at the top of the file.
 |The solution is to pass the -a flag to grep.
 |Deri: do you want me to fix this in pdfmom and push the change, or
 |would you prefer to do it yourself?
 |Question: why does grep treat the presence of the diacritic as cause
 |for saying "Binary file (standard input) matches"?

Likely because that is true in your locale?  It is very likely
that this cannot work reliably (I see -k could possibly be involved):
suppose you are in a LATIN1 locale and process UTF-8, and it is even
worse when your own locale is pickier than LATIN1.  It strikes me
that this should be split up so that perl itself performs the grep,
in charset-agnostic mode.  Even very large documents should not run
into a limit here; otherwise it is no problem to create the two
pipelines concurrently ...
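
To illustrate what "charset-agnostic" could mean here, below is a
minimal sketch in Python rather than perl, and not pdfmom's actual
code.  It assumes the filter only needs to keep lines that begin with
.ds, as described in the quoted text; the function name keep_ds_lines
and the sample data are made up for the example.  Because the
comparison is done on raw bytes, nothing is ever decoded, so neither
the locale nor the document's encoding can produce a "binary file"
verdict.

```python
import io


def keep_ds_lines(stream_in, stream_out, prefix=b".ds"):
    """Copy only the lines that start with `prefix`, comparing raw bytes.

    Nothing is decoded, so the result does not depend on the locale
    or on the encoding of the input.  Returns the number of lines kept.
    """
    kept = 0
    for line in stream_in:  # binary streams yield lines split on b"\n"
        if line.startswith(prefix):
            stream_out.write(line)
            kept += 1
    return kept


# Hypothetical sample input: one diagnostic line, one .ds line holding
# UTF-8 bytes, and one .ds line holding Latin-1 bytes.
sample = (
    b"some diagnostic line\n"
    b".ds pdf:look(pdf:bm1) L'\xc3\xa9tranger\n"
    b".ds pdf:look(pdf:bm2) \xe9t\xe9\n"
)

out = io.BytesIO()
count = keep_ds_lines(io.BytesIO(sample), out)
```

In a real filter the two streams would be sys.stdin.buffer and
sys.stdout.buffer, so the bytes pass through unchanged in both
directions.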

|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
