Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
From: Ken Sharp
Subject: Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
Date: Tue, 19 Sep 2017 13:30:04 +0100
At 13:42 19/09/2017 +0200, David Kastrup wrote:
> So the mechanisms mostly out of our own control are Ghostscript in its
> ps2pdf facility, various TeX engines when including lots of
> ps2pdf-generated PDF files into a main document.
To me this is where the problem lies. PDF is good as a terminal document
format, and that was its original aim. It's not good as an intermediate
format, or for inclusion in more complex documents.
I feel the correct answer to this is not to use PDF as an intermediate
format. It seems to me you should stick with a typesetting format, because
that allows you to determine that fonts which are named the same are in
fact the same, so you don't need to include them multiple times. In fact,
for a layout format, you wouldn't normally include the actual fonts at all,
of course.
> For this use case, we
> want a process that avoids excessive font duplication. The process so
> far involved an additional Ghostscript run removing most of the
> duplicates from the TeX-generated PDF (someone please correct me if I
> got this wrong).
This only works because all the PDF files you are using (so far) embed the
whole font, don't use subsets, and use the same Encoding (or use different
names so that they are clearly different fonts). Were you to start using
PDF files (from whatever source) where that is not the case, and I quoted
OpenOffice as an example, then you might run into the problem with the bug
you are exploiting.
By not using the PDF object number as a unique identifier, Ghostscript only
uses the font name. If you get two different fonts (subset or otherwise)
Ghostscript will assume they are the same font. If they are differently
encoded (say that 'A' is encoded at position 0x42 in the first font, but
0x42 in the second font has a 'B') then Ghostscript can't tell and will
simply drop the second font.
The result of this is that you will get the wrong text in the output PDF
file. Again, this isn't a theoretical problem, we have had numerous bug
reports on this count which we have done our best to work around. In the
end there was no alternative but to use the object number as the unique
identifier (NB we actually use the object number and the filename, in case
we get two files with the same font using the same object number....)
The only way you find out this has happened is when you carefully read the
text, of course.
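
To make the failure mode concrete, here is a minimal sketch in Python. It is not Ghostscript's code; the dictionaries, field names and the two helper functions are all hypothetical, and the 'encoding' tables simply reuse the 0x42 example above. It only shows why keying fonts on the name alone silently changes the text, while keying on (filename, object number) does not.

```python
# Hypothetical model: two fonts named 'Times' from different input files,
# which happen to share an object number but have different encodings.
fonts = [
    {"file": "a.pdf", "obj": 7, "name": "Times", "encoding": {0x42: "A"}},
    {"file": "b.pdf", "obj": 7, "name": "Times", "encoding": {0x42: "B"}},
]


def dedupe_by_name(fonts):
    """Keep only the first font seen for each name (the old behaviour)."""
    kept = {}
    for font in fonts:
        kept.setdefault(font["name"], font)
    return kept


def dedupe_by_identity(fonts):
    """Keep one font per (filename, object number) pair."""
    kept = {}
    for font in fonts:
        kept.setdefault((font["file"], font["obj"]), font)
    return kept


def decode(codes, font):
    """Turn character codes into text using the font's encoding."""
    return "".join(font["encoding"][code] for code in codes)


# Text from b.pdf: character code 0x42 is meant to be 'B'.
codes_from_b = [0x42]

by_name = dedupe_by_name(fonts)
by_identity = dedupe_by_identity(fonts)

wrong_text = decode(codes_from_b, by_name["Times"])
right_text = decode(codes_from_b, by_identity[("b.pdf", 7)])

print(wrong_text)  # 'A': the second font was dropped, so the text is wrong
print(right_text)  # 'B': both fonts were kept, so the text is right
```

The price of the second approach is exactly the duplication being discussed: two entries survive even when the two fonts really are identical.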
> We don't really have a way to forego Texinfo for our printed manuals.
> Given the comparative importance of TeX for document preparation,
> however, I think it would be good to figure out how to keep at least one
> viable way open of making this work and figure out a migration path of
> the involved tools to how you would optimally want to have things
> working.
> I don't think that TeX can (or should) preserve object ids when
> including external PDF files, so figuring out some other reasonably
> robust identity associated with fonts would seem important.
Well I know nothing about TeX. It seems to me however, that it *must*
preserve the object IDs in some sense, because otherwise you wouldn't be
ending up with multiple copies of fonts. If it didn't preserve the object
numbers, then it would assume that the first 'Times' is the same as the
second 'Times' and would collapse them into a single reference. Exactly as
you are using Ghostscript for at present.
If your PDF files contain ToUnicode CMaps then it's possible to identify
properly which glyph is actually intended by each character code in each
font. Doing that would allow you to optimise the use of fonts, because you
could alter the character coding of each usage so that it was consistent
across the documents and only required a single instance of the font in
question.
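
As a rough illustration of that idea, the sketch below (Python, with hypothetical names throughout) treats each font's ToUnicode CMap as a plain mapping from character code to Unicode string, builds one shared encoding, and records how each document's codes would need to be rewritten. It assumes the fonts really are the same design, that every code has a ToUnicode entry, and that a Unicode value is enough to identify the glyph; none of those is guaranteed for real PDF files, and nothing here parses an actual CMap.

```python
def merge_encodings(to_unicode_maps):
    """Build one shared encoding from several code -> Unicode mappings.

    Returns (merged, remaps): 'merged' maps each new code to its Unicode
    string, and remaps[i] maps the old codes of font i to the new codes.
    """
    code_for_char = {}  # Unicode string -> code in the merged font
    merged = {}         # code in the merged font -> Unicode string
    remaps = []
    next_code = 0
    for to_unicode in to_unicode_maps:
        remap = {}
        for old_code, char in sorted(to_unicode.items()):
            if char not in code_for_char:
                code_for_char[char] = next_code
                merged[next_code] = char
                next_code += 1
            remap[old_code] = code_for_char[char]
        remaps.append(remap)
    return merged, remaps


def recode(codes, remap):
    """Rewrite one document's character codes for the merged font."""
    return [remap[code] for code in codes]


# Two usages of 'the same' font that disagree about code 0x42.
first = {0x42: "A"}
second = {0x42: "B", 0x43: "A"}

merged, remaps = merge_encodings([first, second])

# Text from the second document, originally meaning "BA".
new_codes = recode([0x42, 0x43], remaps[1])
text = "".join(merged[code] for code in new_codes)
print(text)  # 'BA': same text, now using the single shared encoding
```

Only one instance of the font is then needed, at the cost of rewriting every text string in every included document.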
I'd have to experiment to find out, but it would not surprise me to
discover that when you include a PDF file in TeX, what it actually does is
convert it into an EPS or PostScript program and then concatenate all the
documents together.
That would mean TeX could use PDF files as a kind of 'black box', and would
mean that the fonts would be included multiple times, just as you say is
happening.
> > PDF was never intended as a means of transferring, or 'containerising'
> > content, it's not trivial (or even possible in general) to extract
> > content from, or simplify, PDF files.
> And yet I seem to remember Adobe has a specification for how to write
> PDF intended for embedding, haven't they?
Err, no, I don't think so. You can embed files untouched (including PDF
files) inside a PDF, just as other file types. But that's not really what I
meant when I said 'containerising'.
You can also have PDF Collections (I can't recall if that's the correct
name) but again that isn't what I meant when I talk about transferring
content, because you aren't transferring the content, you are including the
whole thing, not just its content.
I was thinking more like writing a .docx file as an RTF or a spreadsheet as
a comma separated file. Transferring the content without the associated
container.
Ken