Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

lilypond-devel

From:	Ken Sharp
Subject:	Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
Date:	Tue, 19 Sep 2017 13:30:04 +0100

At 13:42 19/09/2017 +0200, David Kastrup wrote:

So the mechanisms mostly out of our own control are Ghostscript in its
ps2pdf facility, various TeX engines when including lots of
ps2pdf-generated PDF files into a main document.

To me this is where the problem lies: PDF is good as a terminal document format, and that was its original aim. It's not good as an intermediate format, or for inclusion in more complex documents.

I feel the correct answer to this is not to use PDF as an intermediate format. It seems to me you should stick with a typesetting format, because that allows you to determine that fonts which are named the same are in fact the same, and you don't need to include them multiple times. In fact for a layout format, you wouldn't normally include the actual fonts at all, of course.

  For this use case, we
want a process that avoids excessive font duplication.  The process so
far involved an additional Ghostscript run removing most of the
duplicates from the TeX-generated PDF (someone please correct me if I
got this wrong).

This only works because all the PDF files you are using (so far) embed the whole font, don't use subsets, and use the same Encoding (or use different names so that they are clearly different fonts). Were you to start using PDF files (from whatever source) where that is not the case, and I quoted OpenOffice as an example, then you might run into the problem with the bug you are exploiting.

By not using the PDF object number as a unique identifier, Ghostscript only uses the font name. If you get two different fonts with the same name (subset or otherwise), Ghostscript will assume they are the same font. If they are differently encoded (say that 'A' is encoded at position 0x42 in the first font, but 0x42 in the second font has a 'B') then Ghostscript can't tell and will simply drop the second font.
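
To make the failure above concrete, here is a toy model in Python. It is not Ghostscript code: the `Font` record and the `dedupe_by_name` and `render` helpers are invented for illustration, and a font is reduced to a mapping from character code to glyph name.

```python
from dataclasses import dataclass


@dataclass
class Font:
    name: str       # font name as written in the PDF, e.g. "Times"
    encoding: dict  # character code -> glyph name (toy model)


def dedupe_by_name(fonts):
    """Keep only the first font seen for each name (the risky shortcut)."""
    kept = {}
    for font in fonts:
        kept.setdefault(font.name, font)
    return kept


def render(codes, font):
    """Turn character codes into glyph names using one font's encoding."""
    return [font.encoding.get(code, ".notdef") for code in codes]


first = Font("Times", {0x42: "A"})
second = Font("Times", {0x42: "B"})  # same name, different encoding

kept = dedupe_by_name([first, second])

# Text that was written against the second font...
intended = render([0x42], second)
# ...is drawn with the surviving first font after deduplication.
actual = render([0x42], kept["Times"])
```

Here `intended` holds the glyph 'B' while `actual` holds 'A': nothing fails, the text is just silently wrong.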

The result of this is that you will get the wrong text in the output PDF file. Again, this isn't a theoretical problem; we have had numerous bug reports on this count which we have done our best to work around. In the end there was no alternative but to use the object number as the unique identifier (NB we actually use the object number and the filename, in case we get two files with the same font using the same object number...)
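
The identifier described above can be sketched as a (filename, object number) pair. This is only an illustration of the idea, not the actual Ghostscript data structure; `font_key`, `collect_fonts` and the file names are hypothetical.

```python
def font_key(filename, object_number):
    """Identity of a font: the file it came from plus its object number."""
    return (filename, object_number)


def collect_fonts(references):
    """references: iterable of (filename, object_number, font_name).

    Returns one entry per distinct (filename, object_number) pair,
    regardless of what the font is called.
    """
    unique = {}
    for filename, object_number, font_name in references:
        unique.setdefault(font_key(filename, object_number), font_name)
    return unique


refs = [
    ("a.pdf", 12, "Times"),
    ("a.pdf", 12, "Times"),  # the same object referenced again
    ("b.pdf", 12, "Times"),  # same number and name, but a different file
]
fonts = collect_fonts(refs)
```

With this key the repeated reference to object 12 in `a.pdf` collapses into one entry, while the identically named and numbered font in `b.pdf` is kept separate. The cost is the one discussed in this thread: identical fonts arriving from different files are no longer merged.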

The only way you find out this has happened is when you carefully read the text, of course.

We don't really have a way to forego Texinfo for our printed manuals.
Given the comparative importance of TeX for document preparation,
however, I think it would be good to figure out how to keep at least one
viable way open of making this work and figure out a migration path of
the involved tools to how you would optimally want to have things
working.

I don't think that TeX can (or should) preserve object ids when
including external PDF files, so figuring out some other reasonably
robust identity associated with fonts would seem important.

Well, I know nothing about TeX. It seems to me, however, that it *must* preserve the object IDs in some sense, because otherwise you wouldn't be ending up with multiple copies of fonts. If it didn't preserve the object numbers, then it would assume that the first 'Times' is the same as the second 'Times' and would collapse them into a single reference, exactly what you are using Ghostscript for at present.

If your PDF files contain ToUnicode CMaps then it's possible to identify properly which glyph is actually intended by each character code in each font. Doing that would allow you to optimise the use of fonts, because you could alter the character coding of each usage so that it was consistent across the documents and only required a single instance of the font in question.
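
A rough sketch of that re-encoding idea, in Python. It assumes each font's ToUnicode CMap has already been parsed into a plain dictionary from character code to a Unicode string, and that a Unicode value is enough to identify a glyph, which will not hold for every font. `build_shared_encoding` and `recode` are hypothetical helpers, not part of any existing tool.

```python
def build_shared_encoding(to_unicode_maps):
    """to_unicode_maps: list of dicts, character code -> Unicode string.

    Returns (shared, remaps). shared maps each Unicode string to one new
    code; remaps[i] maps the i-th font's old codes to those new codes.
    """
    shared = {}
    remaps = []
    for cmap in to_unicode_maps:
        remap = {}
        for code in sorted(cmap):
            text = cmap[code]
            if text not in shared:
                shared[text] = len(shared)  # next free code
            remap[code] = shared[text]
        remaps.append(remap)
    return shared, remaps


def recode(codes, remap):
    """Rewrite a run of character codes into the shared encoding."""
    return [remap[code] for code in codes]


# Two copies of a font that disagree about what code 0x42 means.
font_one = {0x42: "A"}
font_two = {0x42: "B", 0x43: "A"}

shared, remaps = build_shared_encoding([font_one, font_two])
text_one = recode([0x42], remaps[0])        # 'A' in the first file
text_two = recode([0x43, 0x42], remaps[1])  # 'A', 'B' in the second file
```

After recoding, 'A' carries the same code in both documents, so a single font instance with the shared encoding could serve both. A real implementation would also have to rewrite the content streams and cope with encodings that need more codes than a simple font allows.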

I'd have to experiment to find out, but it would not surprise me to discover that when you include a PDF file in TeX what it actually does is convert it into an EPS or PostScript program and then concatenate all the documents together.

That would mean TeX could use PDF files as a kind of 'black box', and would mean that the fonts would be included multiple times, just as you say is happening.

> PDF was never intended as a means of transferring, or 'containerising'
> content; it's not trivial (or even possible in general) to extract
> content from, or simplify, PDF files.

And yet I seem to remember Adobe has a specification for how to write
PDF intended for embedding, haven't they?

Err, no, I don't think so. You can embed files untouched (including PDF files) inside a PDF, just as other file types. But that's not really what I meant when I said 'containerising'.

You can also have PDF Collections (I can't recall if that's the correct name) but again that isn't what I meant when I talk about transferring content, because you aren't transferring the content, you are including the whole thing, not just its content.

I was thinking more like writing a .docx file as an RTF or a spreadsheet as a comma separated file. Transferring the content without the associated container.

Ken
