lilypond-devel

Re: Ghostscript/GhostPDL 9.22 Release Candidate 1


From: William Bader
Subject: Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
Date: Tue, 19 Sep 2017 15:11:07 +0000

>It would be possible to write a tool which could reliably detect identical 
>fonts in a PDF file,


There are already libraries that can read a PDF into a data structure and
then write a new PDF, for example pdfsizeopt in Python, poppler
https://poppler.freedesktop.org/ and PoDoFo
http://podofo.sourceforge.net/about.html in C++, pdfclown
https://sourceforge.net/projects/clown/ in .NET, PDFBox
http://pdfbox.apache.org/ in Java, iText https://itextpdf.com/ in Java and C#,
and pdfsam http://www.pdfsam.org/ in Java. Maybe one of them would be suitable
as a starting point for writing a font merging tool.







________________________________
From: Ken Sharp <address@hidden>
Sent: Tuesday, September 19, 2017 10:03 AM
To: David Kastrup
Cc: William Bader; address@hidden; address@hidden
Subject: Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

At 15:44 19/09/2017 +0200, David Kastrup wrote:


>Are there any example documents with thousands of pages and ten
>thousands of PDF inclusions one could look at?

I would suggest that the fact you want to 'include' tens of thousands of
PDF files is the problem, really.

I appreciate you are trying to deal with an existing problem, but using
Ghostscript to do something it wasn't intended for isn't really the best
idea for solving the problem.

As I've said elsewhere, there is a genuine bug which can be exposed by doing
what you want with Ghostscript, and it would not surprise me if in the long
run it causes you another problem.

It would be possible to write a tool which could reliably detect identical
fonts in a PDF file, remove the duplicates and alter the references so that
the PDF continued to work. In all honesty, if the problem is as important
as you say, this is probably a better solution. A tailored program,
specifically designed to solve a specific problem, is much more likely to
work reliably than a general-purpose program designed for a different
problem.
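
As a rough illustration of the three steps described above (detect identical
fonts, remove the duplicates, alter the references), here is a minimal sketch
in Python. It does not use any real PDF library: the dictionaries standing in
for font objects and page resources, and all function names, are hypothetical.
It also assumes that "identical" means byte-identical font data, which is a
simplification; a real tool would have to parse the PDF and cope with fonts
that are the same but embedded or subsetted differently.

```python
import hashlib
from typing import Dict


def find_duplicate_fonts(font_streams: Dict[int, bytes]) -> Dict[int, int]:
    """Map each font object id to the lowest id holding identical bytes.

    font_streams is a hypothetical stand-in for the embedded font data of
    a parsed PDF: object id -> raw font program bytes.
    """
    canonical_by_digest: Dict[str, int] = {}
    remap: Dict[int, int] = {}
    for obj_id in sorted(font_streams):
        digest = hashlib.sha256(font_streams[obj_id]).hexdigest()
        # The first object seen with a given digest becomes the canonical one.
        remap[obj_id] = canonical_by_digest.setdefault(digest, obj_id)
    return remap


def rewrite_references(
    page_fonts: Dict[int, Dict[str, int]], remap: Dict[int, int]
) -> Dict[int, Dict[str, int]]:
    """Point every page's font resource entries at the canonical objects."""
    return {
        page: {name: remap[obj_id] for name, obj_id in fonts.items()}
        for page, fonts in page_fonts.items()
    }


def drop_duplicates(
    font_streams: Dict[int, bytes], remap: Dict[int, int]
) -> Dict[int, bytes]:
    """Keep only the font objects that are their own canonical copy."""
    return {
        obj_id: data
        for obj_id, data in font_streams.items()
        if remap[obj_id] == obj_id
    }


if __name__ == "__main__":
    fonts = {10: b"font-A", 11: b"font-B", 12: b"font-A"}
    pages = {1: {"F1": 10}, 2: {"F1": 12, "F2": 11}}
    remap = find_duplicate_fonts(fonts)
    print(rewrite_references(pages, remap))
    print(sorted(drop_duplicates(fonts, remap)))
```

The references are rewritten before the duplicates are dropped, so that no
page is ever left pointing at a font object that no longer exists.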

That said, it would be quite a big job, and I'm not actually offering to
take it on.

My suggestion, which may not be feasible, is to keep everything in an
editable format until the last second.

This is extracted from an email I decided earlier not to send:
-----------------------------------------------------------------------------

While I can tell you a lot about PostScript and PDF I can't help you at all
with TeX. In general, however, my experience of working with large
documents is that the content should be maintained in the layout
application native format until the last moment. Broadly speaking this is
similar to keeping bitmap data in something like TIFF and only converting
to JPEG at the last moment, and for similar reasons.

When you create a PDF you are discarding all the 'metadata' that describes
the layout to the typesetting or layout application. It's all but impossible
to recover that information once it's been lost.

Your problem with multiple fonts pretty much exhibits that; once you've got
the PDF file, a layout engine can't tell that all the fonts are the same.
Ghostscript can't either, which is why it now doesn't strip the duplicates
out. While I appreciate this is a problem for your particular use case, it
is actually a considerable improvement for users in general.

Assuming that you are using TeX throughout for your documentation, then it
seems to me that you should be creating your final document by appending
the various TeX documents together and then producing a final PDF, instead
of appending multiple PDF files.

Presumably you want to show some parts of Lilypond as well, so I would
create EPS figures for those. It will of course increase the number of font
inclusions again, but in the case of Lilypond I don't think that you can be
merging the fonts anyway, because Lilypond always uses glyphshow, and
pdfwrite will create a uniquely named font for each usage. So you aren't
gaining any benefit from exploiting the Ghostscript bug with the Lilypond
output.

So by maintaining the text and layout in TeX, inserting EPS figures as
required, and only producing PDF as the last step in the process you would
create a file which (as I understand it) would only contain a single
instance of each font.

In short, I'm not really suggesting that you change anything except your
working practices, and maintain your files as TeX files rather than as PDF.
Because I don't have any knowledge of your workflow (or TeX) I cannot say
whether this is reasonable; it may well not be.


                 Ken


