[bug #64061] pdfpic.tmac requires non-standard sed feature


From: G. Branden Robinson
Subject: [bug #64061] pdfpic.tmac requires non-standard sed feature
Date: Sat, 6 May 2023 17:45:28 -0400 (EDT)

Follow-up Comment #17, bug #64061 (project groff):

Hi Deri,

> The switch from using grep to sed, which seems to have caused issues,

Not exactly; it's more like it caused us to pay a lot more attention to
something that was a bit jinky in the first place.

Here's what the code looked like when Bernd added it in 2015 (after groff
1.22.3).


+.\" get image dimensions
+.  ec @
+.  sy pdfinfo @$1 | \
+grep "Page *size" | \
+sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
+.nr pdf-wid (p;\\1)\\n\
+.nr pdf-ht  (p;\\2)/' \
+> /tmp/pdfpic\n[$$]
+.  so /tmp/pdfpic\n[$$]
+.  sy rm /tmp/pdfpic\n[$$]
+.  ec


The piece that isn't portable sed(1) is this bit:


.nr pdf-wid (p;\\1)\\n\


That '\\n' does not go into the shell command as an escaped newline, but as
the character sequence '\', 'n' (or, in C, '\\', 'n').  Interpreting that
sequence as a newline in the replacement text is a GNU sed extension that the
sed in macOS 12 also supports, but other seds don't.

> given that the example pdf provided in bug #58206 was in fact an invalid
> pdf, which is why pdfinfo did not handle it correctly.

That may be true, but I was able to reproduce the problem using conventional
Linux tools, and did so in my regression test.

https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/tests/pdfpic_does-not-choke-on-bad-pdfinfo-output.sh?h=1.23.0.rc4#n68

...unless you're saying that my contrived /Title annotation is similarly
defective in lacking a byte order mark, which you might be.

Maybe we should test both, since evidently a lot of PDFs have been produced by
faulty tools that omitted the BOM.

> So, if it makes things any easier we could go back to a simple grep,

I'm afraid we can't.  All this byte order mark business is a side issue from
the mission of the PDFPIC macro's call to `sy`: to scrape the image dimensions
out of pdfinfo(1)'s output into two separate *roff registers.

Even if nobody ever produced PDFs with BOMs missing from their encoded text
strings, or pdfinfo(1) never behaved badly (from our perspective) if they did,
we'd still need to turn one line of pdfinfo output like this:


Page size:      612 x 792 pts


into valid (g)roff syntax--and that's going to take more than grep(1).
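
As a rough illustration of that conversion, here is a minimal sketch in
portable shell.  It is not the code in pdfpic.tmac; the function name
`to_roff_registers` is hypothetical, and the sample input is just the line
quoted above rather than real pdfinfo(1) output.  It puts the line break into
the sed replacement text as a backslash followed by a literal newline, the
form POSIX specifies, instead of relying on '\n'.

```shell
# Hypothetical helper: read pdfinfo-style text on standard input and
# print two *roff register definitions for the page dimensions.
to_roff_registers() {
  # -n plus the trailing 'p' prints only the line that matched.
  # The backslash at the end of the first replacement line is followed
  # by a real newline; that is the portable way to emit a line break.
  sed -n 's/^Page *size: *\([0-9.]*\) *x *\([0-9.]*\).*$/.nr pdf-wid (p;\1)\
.nr pdf-ht (p;\2)/p'
}

# Sample input copied from the message; a real caller would pipe in
# the output of pdfinfo(1) instead.
printf '%s\n' 'Page size:      612 x 792 pts' | to_roff_registers
```

The two lines this prints could then be sourced back into the document, as
the 2015 code did with `so`.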

Does this clear things up?




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64061>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



