Re: [Swftools-common] Re: pdf2swf textSnapshots in OCR'ed PDF files


From: tachy0n tachy0n
Subject: Re: [Swftools-common] Re: pdf2swf textSnapshots in OCR'ed PDF files
Date: Fri, 7 Aug 2009 08:04:59 -0700

Matthias, thanks for your reply and thank you for creating such a useful tool.

It looks like this problem is not limited to OCR'ed PDF files. When I
tried it on a PowerPoint and an Excel file converted to PDF, I got
the same problem, where textsnapshot.findText returns the correct
index but the highlighting never happens. With the PDF produced from
a PowerPoint file (which contained a background JPEG image for all
slides), the pdf2swf program printed "NOTICE  File contains jpeg
pictures", and in the Excel case it printed "NOTICE File contain
pbm pictures". The former notice, about JPEG pictures, is the one I
also get with OCR'ed PDF files. The PDF versions of all these
files are searchable and highlightable using xpdf and Adobe Acrobat
Reader.
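
For anyone trying to reproduce this across several files, here is a
minimal sketch of how the NOTICE lines could be pulled out of captured
pdf2swf console output, so they can be compared against the files where
highlighting fails. The helper names (extract_notices,
mentions_pictures) are made up for illustration and are not part of
swftools; the only thing assumed about pdf2swf is the "NOTICE <message>"
line shape quoted above.

```python
import re

# Matches lines shaped like the ones quoted above, e.g.
# "NOTICE  File contains jpeg pictures". The pattern is an assumption
# based only on those two quoted lines.
NOTICE_RE = re.compile(r"^NOTICE\s+(.*\S)\s*$")


def extract_notices(output):
    """Return the message part of every NOTICE line in the given text."""
    notices = []
    for line in output.splitlines():
        match = NOTICE_RE.match(line.strip())
        if match:
            notices.append(match.group(1))
    return notices


def mentions_pictures(notices):
    """True if any notice message refers to embedded pictures."""
    return any("pictures" in message for message in notices)
```

Feed extract_notices the text captured from a conversion run, then
record mentions_pictures next to whether highlighting worked for that
file. That only establishes a correlation; it does not show that the
pictures are the cause.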

Any idea why the text highlighting fails in the viewer when these
notices occur during conversion?

Has anyone else faced a similar problem?

When searching the mailing list I came across this thread -
http://lists.gnu.org/archive/html/swftools-common/2008-07/msg00061.html
- which seems to address the problem of OCR'ed text being laid
visibly on top of the JPEG pictures. Could a fix for this somehow
have changed how all background images are handled?

On Fri, Aug 7, 2009 at 4:22 AM, Matthias Kramm <address@hidden> wrote:
> On Wed, Aug 05, 2009 at 10:13:11AM -0700, tachy0n tachy0n <address@hidden> wrote:
>> I'm not sure if my previous mail went to the list as I had not
>> subscribed to the mailing list when I sent it. So I'm resending it
>> again. Any pointers in the right direction would be helpful.
>>
>> I've tried changing the z-order of the Static Text to be above the
>> Shapes (Bitmaps) and set the alpha for the static text to 0 so that
>> just the highlights will be visible, but that shows a solid yellow
>> highlight on top of the bitmap for the selected text.
>
> I think that's actually the best you'll get. The problem you have
> with OCR'd files is that they contain two characters for every
> character they recognized:
> One as bitmap, for displaying, one (invisible) box character for
> selection, on top of the bitmap.
>
> When flash selects a character it'll invert it. As the
> invisible boxes are empty, you'll get solid yellow rectangles when
> selecting text.
>
> It's weird that your OCR solution puts the transparent box characters
> below the bitmap, though.
>
> Matthias
>
>
>
>



