groff-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[groff] 46/62: [pdfpic]: Fix Savannah #64061.


From: G. Branden Robinson
Subject: [groff] 46/62: [pdfpic]: Fix Savannah #64061.
Date: Thu, 20 Apr 2023 06:14:39 -0400 (EDT)

gbranden pushed a commit to branch branden-2023-04-20
in repository groff.

commit 4dc2c23f0c2b72517be62e60090f30b214299650
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Apr 17 16:41:33 2023 -0500

    [pdfpic]: Fix Savannah #64061.
    
    * tmac/pdfpic.tmac: Refactor to make comprehensible some woefully
      undocumented cleverness and improve efficiency.
    
      (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy` usage
      into its own macro, calling from here and relocating its requests from
      here...
    
      (pdfpic*system): ...to here.  When using `sy` request to collect and
      munge output of pdfinfo(1), (a) disable the escape character while
      defining the macro; (b) construct the command in a roff string,
      appending to it in discrete, hopefully comprehensible chunks; (c)
      disable the escape character during macro interpretation wherever
      possible (most of it); (d) retain doubled backslashes so that they
      survive subsequent string interpolation; (e) stop using grep(1) in the
      pipeline when sed(1) is perfectly capable of performing its own input
      filtering; (f) invoke sed with '-n' option and emit output only upon a
      successful substitution; and (g) replace unportable POSIX character
      class '[:digit:]' in substitution matching text with '[0-9]'.
      Annotate portability and escaping challenges.  Tested on GNU/Linux,
      macOS 12, and (with simulated pdfinfo(1) output), on Solaris 11.
      (Solaris still requires GNU sed; keep reading.)
    
    Even with all of that, there is _still_ a problem; the C++ function that
    GNU troff uses to assemble the command string (character by character)
    _does not recognize C/C++ string literal escape sequences_.  This means
    that you _cannot_ embed "\n" in `sy`'s arguments and have it survive, as
    a newline character, into the command string passed to the standard C
    library's system(3) function.  ("A\nB" gets encoded as 'A', '\', 'n',
    'B', not 'A', '\n', 'B'.)  Unfortunately, this appears to be AT&T
    troff-compatible behavior.  But it means that you _cannot_ construct a
    portable multi-line replacement text for sed's 's' command.  (Other sed
    commands like 'a', 'c', and 'i' will be similarly affected.)  We
    therefore (continue to) rely upon a non-standard feature of GNU and
    macOS sed, such that the sequence "\n" in replacement text becomes a
    newline in sed's pattern space.
    
    Fixes <https://savannah.gnu.org/bugs/?64061>.  Thanks to Bruno Haible
    for the report, and to him and Ralph Corderoy for the discussion of
    portable and efficient sed constructs.
---
 ChangeLog        | 43 +++++++++++++++++++++++++++++++++++++++++++
 tmac/pdfpic.tmac | 48 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 5a1fa6a5b..35defb505 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2023-04-19  G. Branden Robinson <g.branden.robinson@gmail.com>
+
+       * tmac/pdfpic.tmac: Refactor to make comprehensible some
+       woefully undocumented cleverness and improve efficiency.
+       (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy`
+       usage into its own macro, calling from here and relocating its
+       requests from here...
+       (pdfpic*system): ...to here.  When using `sy` request to collect
+       and munge output of pdfinfo(1), (a) disable the escape character
+       while defining the macro; (b) construct the command in a roff
+       string, appending to it in discrete, hopefully comprehensible
+       chunks; (c) disable the escape character during macro
+       interpretation wherever possible (most of it); (d) retain
+       doubled backslashes so that they survive subsequent string
+       interpolation; (e) stop using grep(1) in the pipeline when
+       sed(1) is perfectly capable of performing its own input
+       filtering; (f) invoke sed with '-n' option and emit output only
+       upon a successful substitution; and (g) replace unportable POSIX
+       character class '[:digit:]' in substitution matching text with
+       '[0-9]'.  Annotate portability and escaping challenges.  Tested
+       on GNU/Linux, macOS 12, and (with simulated pdfinfo(1) output),
+       on Solaris 11.  (Solaris still requires GNU sed; keep reading.)
+
+       Even with all of that, there is _still_ a problem; the C++
+       function that GNU troff uses to assemble the command string
+       {character by character} _does not recognize C/C++ string
+       literal escape sequences_.  This means that you _cannot_ embed
+       "\n" in `sy`'s arguments and have it survive, as a newline
+       character, into the command string passed to the standard C
+       library's system(3) function.  ("A\nB" gets encoded as 'A', '\',
+       'n', 'B', not 'A', '\n', 'B'.)  Unfortunately, this appears to
+       be AT&T troff-compatible behavior.  But it means that you
+       _cannot_ construct a portable multi-line replacement text for
+       sed's 's' command.  (Other sed commands like 'a', 'c', and 'i'
+       will be similarly affected.)  We therefore (continue to) rely
+       upon a non-standard feature of GNU and macOS sed, such that the
+       sequence "\n" in replacement text becomes a newline in sed's
+       pattern space.
+
+       Fixes <https://savannah.gnu.org/bugs/?64061>.  Thanks to Bruno
+       Haible for the report, and to him and Ralph Corderoy for the
+       discussion of portable and efficient sed constructs.
+
 2023-04-05  G. Branden Robinson <g.branden.robinson@gmail.com>
 
        * tmac/tty.tmac: Add angle bracket fallbacks.
diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index f81b66e7f..5367da75f 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -52,6 +52,40 @@
 .  rr pdfpic*desired-height
 ..
 .
+.\" Get image dimensions.  The `tr` command to strip null bytes is
+.\" distasteful, but its necessity is imposed on us.  See
+.\" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
+.\"
+.\" The following is a circus of portability and escaping constraints.
+.\" See <https://savannah.gnu.org/bugs/index.php?64061>.  Modify it
+.\" only with great caution and after testing with a variety of seds.
+.\" (See the "HACKING" file in groff's source distribution.)  Keep in
+.\" mind that the argument(s) to `sy` are assembled into a C string
+.\" and then passed to system(3), which invokes "sh -c" on the string.
+.\"
+.\" We therefore shut off roff's escape mechanism _twice_: once while
+.\" defining the macro, and once while interpreting part of it, to
+.\" preserve backslashes that we need in the constructed C string.
+.\"
+.\" We _still_ must escape the backslashes in the string appendments to
+.\" keep them from being interpreted as commencing roff escape sequences
+.\" when the string they populate is later interpolated.
+.eo
+.de pdfpic*system
+.  ds pdfpic*command pdfinfo \$1
+.  eo
+.  as pdfpic*command " | tr -d '\\000'
+.  as pdfpic*command " | sed -n -e '/Page *size:/
+.  as pdfpic*command s/Page  *size:  *\\([0-9.]*\\)  *x  *\([0-9.]*\\).*$/
+.  as pdfpic*command .nr pdfpic*width  (p;\\1)\\n
+.  as pdfpic*command .nr pdfpic*height (p;\\2)/p'
+.  ec
+.  as pdfpic*command " > \*[pdfpic*temporary-file]
+.  sy \*[pdfpic*command]
+.  rm pdfpic*command
+..
+.ec
+.
 .de PDFPIC
 .  if !\\n[.U] \{\
 .    pdfpic@error use of \\$0 requires GNU troff's unsafe mode \
@@ -161,19 +195,7 @@ skipping '\\$1'
 .    return
 .  \}
 .  ds pdfpic*temporary-file \\*[pdfpic*temporary-directory]/pdfpic\n[$$]
-.
-.  \" Get image dimensions.  The `tr` command to strip null bytes is
-.  \" distasteful, but its necessity is imposed on us.  See
-.  \" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
-.  ec @
-.  sy pdfinfo @$1 | \
-tr -d '\000' | \
-grep "Page *size" | \
-sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
-.nr pdfpic*width (p;\\1)\\n\
-.nr pdfpic*height  (p;\\2)/' \
-> @*[pdfpic*temporary-file]
-.  ec
+.  pdfpic*system \\$1
 .  if \\n[systat] \{\
 .    pdfpic@error retrieval of '\\$1' image dimensions failed with \
 exit status \\n[systat]; skipping



reply via email to

[Prev in Thread] Current Thread [Next in Thread]