Re: [Groff] U+0027, U+002D, and U+0060 in code examples?


From: Deri James
Subject: Re: [Groff] U+0027, U+002D, and U+0060 in code examples?
Date: Sun, 6 May 2012 17:45:07 +0100
User-agent: KMail/1.13.7 (Linux/2.6.38.8-desktop-10.mga; KDE/4.6.5; x86_64; ; )

On Sunday 06 May 2012 12:01:02 Werner LEMBERG wrote:
> Ideally, there should be a proper ToUnicode cmap in the PDF so that
> copy and paste gives good results.  On the PostScript side, it should
> be theoretically possible to use the `GlyphNames2Unicode' dictionary
> (an undocumented Adobe Distiller extension) so that PS->PDF software
> can provide non-standard mappings.  Right now, I haven't found a full
> example code for that.
> 
> However, the gropdf driver could directly add support for that...
> Deri?
> 
> 
>     Werner

Werner,

I've just had a cursory look at the pdf reference (1.4 - the one with a proper
index!) and it looks like the example given on page 371 is very close to what
we would need. I think this cmap could be used with groff encoding as given in
text.enc:-

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfrange
<0020> <007f> <0020>
% ligatures at 139-143
<008b> <008f> [<00660066> <00660069> <0066006c> <006600660069> <00660066006C>]
% change minus to hyphen
<00ad> <00ad> <002d>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end

This cmap is perhaps not ideal, since it is tied to 'text.enc'; in fact, the
ToUnicode CMap is only embedded if the groff font specifies its encoding as
'text.enc' in its font file. Is there a better way of doing this? Are there
other codes which should also be mapped here? (Quotes?)

(NB I still intend to use code from 'dvipdfmx' to do the font subsetting at
some point, which I believe includes a ToUnicode CMap, so this is a temporary
solution.)

Attached is a small pdf showing this cmap in use. Generated from:-

.sp 1i
Finally we finished ffirst playing the flute.
.br
Now we test \- minus.

Copy and paste from it should expand the ligatures (NB the TR font does not
have ffi or ffl defined), and searching should also work properly when viewing
the pdf.
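
For anyone who wants to see what the three bfrange entries amount to, here is a
rough Python sketch of the lookup a viewer might perform when extracting text.
It is only an illustration, not how gropdf or any particular viewer is
implemented: the names BFRANGES, to_unicode and extract are made up for this
example, and the sample assumes "finished" is set with the fi ligature at code
0x8C as in the cmap above.

```python
# Illustrative sketch only: mimic a ToUnicode bfrange lookup.
# Each entry is (first_code, last_code, destination). The destination is
# either a starting code point or an explicit list of strings.
BFRANGES = [
    (0x20, 0x7F, 0x0020),                            # ASCII range maps to itself
    (0x8B, 0x8F, ["ff", "fi", "fl", "ffi", "ffl"]),  # ligatures at 139-143
    (0xAD, 0xAD, 0x002D),                            # minus -> hyphen
]

def to_unicode(code):
    """Return the Unicode string for one character code, or None if unmapped."""
    for first, last, dest in BFRANGES:
        if first <= code <= last:
            if isinstance(dest, list):
                # Array form: one destination string per code in the range.
                return dest[code - first]
            # Scalar form: destination increments along with the code.
            return chr(dest + (code - first))
    return None

def extract(codes):
    """Join the mapped strings, using U+FFFD for any unmapped code."""
    return "".join(to_unicode(c) or "\ufffd" for c in codes)

# "finished" with the fi ligature at 0x8C (assumed for this example)
sample = [0x8C] + [ord(ch) for ch in "nished"]
print(extract(sample))  # prints "finished"
```

Codes outside the three ranges come back as U+FFFD here, which is one way to
spot other characters (quotes, for instance) that may still need an entry.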

Cheers

Deri

Attachment: fi.pdf
Description: Adobe PDF document

