Re: [Groff] pdfmom grep (was parallel text processing)


From: Steffen Nurpmeso
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Fri, 08 Sep 2017 22:37:58 +0200
User-agent: s-nail v14.9.3-64-gad47883e

i wrote:
 |Peter Schaffter <address@hidden> wrote:
 ||On Fri, Sep 08, 2017, Ralph Corderoy wrote:
 ||>> You'll notice that the top of the pdf file has a line of text spit out
 ||>> by grep(1) that obviously shouldn't be there.
 ...
 ||Problem solved.
 ...
 ||The solution is to pass the -a flag to grep.

This flag is not standardized (POSIX does not specify it), though
at a shallow glance i failed to find a system that lacks it
(*BSD, Linux).
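
A rough sketch of a filter that depends neither on the -a flag nor
on the locale, because it only ever looks at raw bytes.  Python is
used purely for illustration (pdfmom itself is perl); the name
grep_bytes and the pattern ^\.ds are assumptions made up for this
example, not taken from pdfmom.

```python
import io
import re


def grep_bytes(lines, pattern):
    """Yield each line (as bytes) that matches `pattern`, a bytes regex.

    No decoding takes place, so input in any charset, or in a mix of
    charsets, can never be classified as "binary".
    """
    rx = re.compile(pattern)
    for line in lines:
        if rx.search(line):
            yield line


# Hypothetical input: one line with a LATIN1 e-acute (0xE9), one line
# with the UTF-8 encoding of the same character, one plain ASCII line.
sample = io.BytesIO(
    b".ds pdfcl caf\xe9\n"
    b"caf\xc3\xa9 body text\n"
    b".ds other\n"
)

# The pattern is an assumption for this sketch.
matches = list(grep_bytes(sample, rb"^\.ds"))
```

Both .ds lines are kept, including the one whose 0xE9 byte is not
valid UTF-8, and no locale setting is consulted anywhere.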

  ...
 ||Question: why does grep treat the presence of the diacritic as cause
 ||for saying "Binary file (standard input) matches"?
 |
 |Likely because that is true in your locale?  It is very likely
 |that this cannot work (i see -k could possibly happen), suppose
 |you are in a LATIN1 locale and process UTF-8, and it is even worse
 |when your own locale is more picky than LATIN1.  Strikes me this
 |should be split up so that perl itself performs the grep, in
 |charset-agnostic mode.  Even very large documents should pose
 |no limit here, otherwise there is no problem to create the two
 |pipelines concurrently ...

Yes, but a very simple implementation is appended that simply
converts the thing into a three-step approach.  This requires one
temporary file.  I am also a bit rusty regarding perl, yet
mom-pdf.mom and camus.mom both work out fine.  (The thing is that
groff is supposed to work on Windows, and as far as i know Windows
cannot really fork(2), thus i refrained from spending time on
something that avoids the temporary file.  Maybe good enough for
a draft on late Friday evening.)
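
For readers without the attachment, here is one possible reading of
such a three-step approach, sketched in Python rather than perl.
Everything in it is an assumption for illustration: the function
name three_step, the pattern, and the two stand-in commands, which
take the place of the real groff invocations.  It is not the
appended patch.

```python
import os
import re
import subprocess
import sys
import tempfile


def three_step(producer_cmd, pattern, consumer_cmd):
    """Run producer, filter its output byte-wise, feed it to consumer."""
    rx = re.compile(pattern)

    # Step 1: run the first command to completion and collect its
    # stdout and stderr in a temporary file (no fork(2), no shell pipe).
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            subprocess.run(producer_cmd, stdout=tmp,
                           stderr=subprocess.STDOUT, check=False)

        # Step 2: keep only the matching lines, comparing raw bytes so
        # that neither locale nor charset plays any role.
        with open(path, "rb") as tmp:
            kept = b"".join(line for line in tmp if rx.search(line))
    finally:
        os.unlink(path)

    # Step 3: hand the kept lines to the second command on its stdin.
    done = subprocess.run(consumer_cmd, input=kept,
                          stdout=subprocess.PIPE, check=True)
    return done.stdout


# Stand-in commands (hypothetical): the producer prints three lines,
# one of which contains a LATIN1 byte; the consumer copies stdin to
# stdout unchanged.
producer = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write("
            "b'.ds a caf\\xe9\\nnoise\\n.ds b\\n')"]
consumer = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

result = three_step(producer, rb"^\.ds", consumer)
```

The price is the temporary file and the loss of concurrency between
the two commands; in exchange nothing relies on fork(2) or on how
grep classifies its input.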

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


