pdfmark: need a method to sanitize text in document outlines

From: Keith Marshall
Subject: pdfmark: need a method to sanitize text in document outlines
Date: Mon, 2 Aug 2021 16:30:47 +0100

I don't recall noticing this before, (probably an oversight on my part),
but if I regenerate pdfmark.pdf today, I see several
document outline entries similar to:

  The F[C]pdfmarkF[] Operator

In this, the (unwanted) F[C] and F[] appear to be artefacts from the
likes of:

  .NH 2
  .XN The \F[C]pdfmark\F[] Operator

where XN is a locally defined macro which emits its entire argument list
as the text for the numbered heading, while also constructing a table of
contents entry, and a document outline entry, from the same arguments.

Clearly, the formatting escape sequences need to be filtered out of (a
copy of) the argument list, before passing it to the pdfbookmark macro.
My first (naïve) idea was to capture into a diversion, asciify, chop,
and convert to string:

  .de sanitize
  .   ds \\$1
  .   als sanitize:result \\$1
  .   shift
  .   di sanitize:result.div
  .      nop \\$*
  .      br
  .   di
  .   asciify sanitize:result.div
  .   chop sanitize:result.div
  .   as sanitize:result "\\*[sanitize:result.div]\"
  .   rm sanitize:result.div sanitize:result
  ..

  .de XN \" partial implementation
  .sanitize xn*bookmark.text "\\$*"
  .pdfbookmark \\n[nh*hl] "\\*[xn*bookmark.text]"
  .nop \\$*
  ..

That does remove the effect of the \F escapes when the result is
inserted into running text; however, it isn't aggressive enough,
(and indeed, is harmful), for text passed to pdfbookmark:

  $ GROFF_TMAC_PATH=. pdfroff -mspdf \
  > -dpaper=a4 -P-pa4 > pdfmark.pdf
  troff: Failed assertion at line 524, file 'src/roff/troff/input.cpp'.
  grops:<standard input>:28926: warning: no final 'x stop' command
  /usr/bin/groff: troff: Signal 6 (core dumped)

That is an assertion failure at line 524 of input.cpp, as it was when
tagged for release 1.22.4:

  $ hg cat -r 1.22.4 ../../src/roff/troff/input.cpp | sed -n 524p
    assert(level == 0);

Clearly, the non-text nodes, which asciify doesn't remove, have a
destructive effect when passed to pdfbookmark.  I can circumvent the
issue by using a macro such as in the attached sanitize.tmac, but is
there a more elegant alternative?
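
For comparison, one blunt way to sidestep the filtering problem is to
pass the outline text separately from the heading text.  The following
is only a minimal sketch, and is not the content of the attached
sanitize.tmac; the macro name XNB is hypothetical, and, like the
partial XN above, it omits the table of contents entry:

  .de XNB \" hypothetical: $1 is plain outline text, the rest is the heading
  .   ds xn*bookmark.text \\$1
  .   shift
  .   pdfbookmark \\n[nh*hl] "\\*[xn*bookmark.text]"
  .   nop \\$*
  ..

  .NH 2
  .XNB "The pdfmark Operator" The \F[C]pdfmark\F[] Operator

This keeps the \F escapes in the heading, and hands only plain text to
pdfbookmark, but at the cost of writing every such heading twice, which
is exactly the duplication a sanitizing macro is meant to avoid.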

Regards, Keith.

Attachment: sanitize.tmac
Description: Text document
