From: Jonathan Monsarrat
Subject: PostScript-to-ASCII
Date: Wed, 10 Nov 93 22:50:46 -0500


There aren't any good PostScript-to-ASCII converters. I should know: I
maintain the PostScript FAQ, which sez:

    3.9 How can I convert PostScript to ASCII? 
    In general, when you say ``I want to convert PostScript to ASCII'' 
    what you really mean is ``I want to convert MacWrite (which makes 
    PostScript output) to ASCII'' or ``I want to convert somebody's TeX 
    document (which I have in PostScript) to ASCII''. 
    Unfortunately, programs like these (if they're smart) do a lot of 
    fancy stuff like kerning, which means that where they would 
    normally execute the PostScript command for 
      ``print water fountain''
    instead they execute the PostScript commands for 
      ``print wat''      (move a little to get the spacing *just* right)
      ``print er''       (move a little to get the spacing *just* right)
      ``print foun''     (move a little to get the spacing *just* right)
      ``print tain''     (move a little to get the spacing *just* right)
    So if I write a program to look through a PostScript file for 
    strings, like ps2ascii.pl, it can't tell where the words really 
    end. Here my program would see four strings: 
      ``wat'' ``er'' ``foun'' ``tain''
    And it doesn't see any difference between the spacing between 
    ``foun'' and ``tain'' (not a word break) and the spacing between 
    ``er'' and ``foun'' (a real word break). 
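
    A minimal sketch of this failure mode, in Python rather than Perl. 
    It is not the code of ps2ascii.pl; the SAMPLE fragment, the 
    extract_strings name and the offsets in it are made up for 
    illustration, and the pattern ignores nested parentheses. 

```python
import re

# Matches a PostScript string literal "( ... )", allowing backslash escapes.
# Simplification: nested, unescaped parentheses are not handled.
PS_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)")


def extract_strings(ps_source):
    """Return the contents of every string literal, in file order."""
    return [m.group(1) for m in PS_STRING.finditer(ps_source)]


# Hypothetical output of a kerning text formatter for "water fountain".
# The small rmoveto offsets are kerning; the large one is the word break.
SAMPLE = """\
72 700 moveto
(wat) show -0.5 0 rmoveto
(er) show 3.3 0 rmoveto
(foun) show -0.4 0 rmoveto
(tain) show
"""

fragments = extract_strings(SAMPLE)
print(fragments)            # ['wat', 'er', 'foun', 'tain']
print("".join(fragments))   # waterfountain
print(" ".join(fragments))  # wat er foun tain
```

    Neither way of joining the fragments gives back ``water fountain'', 
    because the offsets that separate kerning from a word break are 
    thrown away along with everything else outside the parentheses. 
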
    The problem is that PostScript for text formatting is usually 
    machine generated by a text formatter. A PostScript generator 
    like dvips might have a special command like ``boop'' that 
    differentiates between a real word break and a fake one. But 
    every text formatter that generates PostScript has its own name 
    for the ``boop'' command. 
    So you really want a ``PostScript to ASCII converter for dvips''. 
    The only general solution I can see would be to redefine the show 
    operator to print out the currentpoint for every letter being 
    printed, like gs2asc, and then make up an ASCII page based on this 
    by sticking ASCII characters where they go in a two-dimensional 
    array. That would convert PostScript to ASCII ``formatted''. 
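
    The second half of that idea, turning recorded positions into a 
    page of text, might look roughly like the Python sketch below. It 
    assumes the redefined show operator has already produced one 
    (x, y, character) triple per letter, and that a fixed cell size 
    (here 6 by 12 points, an arbitrary choice) is good enough. The 
    render_page and place names are hypothetical, and this is not how 
    gs2asc itself is implemented. 

```python
def render_page(glyphs, cell_w=6.0, cell_h=12.0):
    """Lay out (x, y, char) triples on a character grid.

    Coordinates are in PostScript points with the origin at the bottom
    left, so a larger y means a line nearer the top of the page.
    """
    glyphs = list(glyphs)
    if not glyphs:
        return ""
    top = max(y for _, y, _ in glyphs)
    left = min(x for x, _, _ in glyphs)

    grid = {}  # row number -> {column number -> character}
    for x, y, ch in glyphs:
        row = round((top - y) / cell_h)
        col = round((x - left) / cell_w)
        grid.setdefault(row, {})[col] = ch

    lines = []
    for row in range(max(grid) + 1):
        cells = grid.get(row, {})
        width = max(cells) + 1 if cells else 0
        lines.append("".join(cells.get(c, " ") for c in range(width)))
    return "\n".join(lines)


def place(text, x, y, advance=6.0):
    """Fake the recorded positions for a run of evenly spaced letters."""
    return [(x + i * advance, y, ch) for i, ch in enumerate(text)]


sample = (place("water", 72, 700)
          + place("fountain", 108, 700)
          + place("ok", 72, 688))
print(render_page(sample))
```

    With these made-up positions the word break survives as an empty 
    cell between ``water'' and ``fountain''. Proportional fonts and 
    tight kerning can still land two letters in the same cell, in which 
    case the later one overwrites the earlier one. 
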
    But even that wouldn't solve the problem, because special bitmap 
    fonts and standard fonts like Symbol don't always print a ``P'' 
    when you say the letter ``P''. Sometimes they print the Greek Pi 
    symbol or a chess piece or a Zapf Dingbat. 
    With those caveats in mind, try ps2a, ps2ascii, ps2txt, 
    ps2ascii.ps or ps2ascii.pl. 

If anybody wants these programs, ask me (or see FAQ for more info).

