bug-groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug #65585] Problems introduced by commit cd9fde325f


From: Deri James
Subject: [bug #65585] Problems introduced by commit cd9fde325f
Date: Fri, 12 Apr 2024 11:51:09 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?65585>

                 Summary: Problems introduced by commit cd9fde325f
                   Group: GNU roff
               Submitter: deri
               Submitted: Fri 12 Apr 2024 03:51:08 PM UTC
                Category: Driver gropdf
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: In Progress
                 Privacy: Public
             Assigned to: deri
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Fri 12 Apr 2024 03:51:08 PM UTC By: Deri James <deri>
I have been looking at the current state of pdf.tmac and have found a few
issues. The changes which have been committed are:-

Introduction of a new flag, -S, which replaces my change which introduced the
concept of passing a single pipe character as a hotspot meant that the actual
text to form the actual hotspot would be "piped" to the document stream,
terminated by doing a markstop. This is what allowed Branden to implement the
man macros which place contents into a diversion before emitting a hotspot. I
have no objection to this change, although there were several ways of setting
up a hotspot containing a single pipe character before this change, and most
users are probably familiar with piping data. (Perhaps I should have used "<"
as the single character). I would be very annoyed if someone sent me a pdf
with a single "|" as a hotspot - would take me about 5 minutes to click it
with the mouse!

Another change was an attempt to only allow a restart after a preceding
suspend. This is a good idea, but it is in the wrong place, it should be
implemented in gropdf rather than pdf.tmac, since a user may use \X'pdf:
markrestart' as documented in the gropdf man page. Which bypasses the check in
pdf.tmac. 

Branden has implemented a looping construct to hold tags replacing an
associative array. It is not used if the user is using mom macros (it does not
work for mom documents). In fact it only works reliably for the man macros
producing a "book" of man pages, where it is used to differenciate between an
internal link to another page in the book or a "man:" URL to an external man
page. The problem is that the new code does not implement:-

.pdfhref L -D name

Which should emit the descriptive text associated with the named tag at the
time it was defined, but instead it just emits the "name". This is documented
in pdfmark.pdf, and used several times in pdfmark.ms, so it is reasonable to
assume users may be using this in their own documents if they use "pdfmom
--roff  -mspdf". So it is wrong to assume the new code is suitable for
everything except mom.

There is also an issue with the speed of the new code. One test file I used
went from an elapsed time of 0m3.08s to 11m31.62s. (About 670 times slower!).
This was producing a large document (LinuxManBook) from a single file, using
the command:-

time ~/groff-git/groff/test-pdfmom  --roff -Tpdf -mandoc -petk
LinuxManBook.trf -z

(test-pdfmom is the same as test-groff but calling the pdfmom in the build
directory). I don't know whether this slow down in groff will be acceptable to
most users, I'm meant to be on a sabatical so I am not going to argue either
way, but be aware that is Branden's attempt to solve one issue which was
solved months ago in my branch waiting for merging. My use of stringhex was
described as "obfuscation", which is rather insulting if the intention was to
imply a deliberate attempt to obscure the purpose of my code. The correct
interpretation of its use is to protect the data from interpretation by groff
so that any byte sequence can be used as a string name in groff. This is
analagous to using base64 to protect binary data in emails.

Although Branden's loopy code solves some issues with 1.23.0 it fails in a
number of ways which are dealt with successfully by the pdf.tmac in my branch
(i.e. t.trf in #64576).

There is a bug in Branden's code. The attached file, pdf-L.trf, illustrates
the issue. According to the documentation in pdfmark.pdf, if you use .pdfhref
L with a destination name but no descriptive text, the descriptive text given
when the destination is named is used. With Branden's code, instead of using
the descriptive text it uses the name of the destination instead. The two pdfs
called pdf-L illustrate the problem.

Another bug results in entries in the array Branden loops over get
over-vritten in certain circumstances. This code illustrates the bug.

.ig
  groff -Tpdf -dPDF.EXPORT=1 -z pdf-M.trf

Results:-

.ds pdf:bm1.tag one
.ds pdf:bm1.tag two

(The bm1.tag has been overwritten!!!)

But,

  groff -Tpdf -dPDF.EXPORT=1 -dPRINTSTYLE=1 -z pdf-M.trf

Results:-

.ds pdf:look(one) Once upon...
.ds pdf:look(two) This is two

..
.pdfbookmark -T one 1 Once upon...
.pdfhref M -D two -E This is two

Setting "PRINTSTYLE=1" bypasses Branden's changes, because he uses the
original code if mom macros are used. I don't understand why it is desirable
to have two separate methods. I have fixed all the problems listed above, plus
a speed up of Branden's loopy code. I have also included Keith's clever
solution to polluting the pdf flags with data which can cause the errors about
illegal characters in identifier names if the optional "--" marker not used.

There is a minor regression in this code, not caught by any of the check
tests, but when producing our pdf documents you will notice these new
warnings:-

gropdf:man/groff.7: warning: Can't convert 'rs' to unicode
gropdf:../contrib/mom/examples/sample_docs.mom: warning: Can't convert 'em' to
unicode

They are caused because gropdf currently has no way of knowing what the UTF-16
for these groff characters. Previously the "em" character was handled by a:-

.tr \[em]-

In pdf.tmac. I was never happy with this "fudge" and the correct handling was
part of my branch. I have changed the method slightly in view of what Branden
wrote on the list. Afmtodit does not add an extra column to the font files it
creates but adds the UTF-16 code as a comment. The groff_font man page has
been amended appropriately. The grops standard fonts have also had the comment
added, which corrects the regression.

I am enjoying my sabbatical away from groff, but I thought I should correct
these recent changes, since they would cause issues if released, but I would
be grateful if other major changes to pdf production could wait until I am
back, fighting fit to do any QA. Of course, if you find any problems with the
code, please let me know through this bug report.







    _______________________________________________________
File Attachments:


-------------------------------------------------------
Name: pdf-L.trf  Size: 2KiB
<https://file.savannah.gnu.org/file/pdf-L.trf?file_id=55936>
-------------------------------------------------------
Name: pdf-L-DJ.pdf  Size: 3KiB
<https://file.savannah.gnu.org/file/pdf-L-DJ.pdf?file_id=55937>
-------------------------------------------------------
Name: pdf-L-GBR.pdf  Size: 3KiB
<https://file.savannah.gnu.org/file/pdf-L-GBR.pdf?file_id=55938>

    AGPL NOTICE

These attachments are served by Savane. You can download the corresponding
source code of Savane at
https://git.savannah.nongnu.org/cgit/administration/savane.git/snapshot/savane-4448d8da181e3133acac16b178e98513aa6405cd.tar.gz

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65585>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]