bug-ocrad

Re: [Bug-ocrad] internal error: insert_space, track not set yet.


From: Tilman Hausherr
Subject: Re: [Bug-ocrad] internal error: insert_space, track not set yet.
Date: Tue, 17 Aug 2010 18:20:53 +0200

On Tue, 17 Aug 2010 15:57:36 +0200, Antonio Diaz Diaz wrote:

>Tilman Hausherr wrote:
>> Why not accept that some images might really have some very high and
>> very small characters? It's not that unlikely, e.g. with advertisements:
>> "free beer coupon *" in huge characters, and "* not valid in
>> Lampukistan" in very small characters. If you make a real change,
>> there's always the risk that you'd get worse results for the majority of
>> images while solving a problem that almost never happens. Maybe a
>> solution would be that if there are no medium characters, to just add
>> one element that produces a space...
>
>You mean if the high characters are grouped put them in a line and the 
>short characters in another line? I guess this can be implemented.

Yes, although my text above is purely theoretical. So far I have
concentrated on processing the results that I get; I've done almost no
evaluation of the quality of the OCR.
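
To make the grouping idea quoted above concrete, here is a minimal sketch
in Python. It is not ocrad code: the function name, the plain list of
character heights and the 0.5 ratio are all assumptions chosen for
illustration. It only splits the characters when there are no "medium"
ones, i.e. when every short character is well below every tall one.

```python
def split_by_height(heights, ratio=0.5):
    """Split character heights into (tall, short), or return None.

    Hypothetical helper, not part of ocrad. Heights are assumed to be
    positive numbers (e.g. pixels); `ratio` is an arbitrary threshold.
    """
    if not heights:
        return None
    cutoff = ratio * max(heights)
    short = [h for h in heights if h < cutoff]
    tall = [h for h in heights if h >= cutoff]
    if not short:
        return None  # all characters are of similar height
    # "No medium characters": require a clear gap between the groups.
    if max(short) >= ratio * min(tall):
        return None
    return tall, short


# Huge headline characters plus a tiny footnote: two separate groups.
groups = split_by_height([40, 42, 41, 8, 9])
# A gradual range of heights: no clear gap, so nothing is split.
no_split = split_by_height([40, 30, 20, 10])
```

Each returned group could then be placed on its own text line; how that
would fit into ocrad's line detection is left open here.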

>> On the other hand, I just thought of another "symptom" fix, and it
>> works:
>
>Yes, this works, but as a definitive solution I prefer to remove lines 
>which only contain noise.

Yeah, that would be nice.
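
As a rough illustration of what "remove lines which only contain noise"
could mean, here is a small Python sketch. It is not how ocrad does or
will do it: representing a line as a list of blob heights and treating
anything under 4 pixels as noise are both assumptions made up for this
example.

```python
def drop_noise_lines(lines, min_height=4):
    """Keep only lines with at least one blob of height >= min_height.

    Hypothetical helper, not part of ocrad. Each line is a list of blob
    heights in pixels; `min_height` is an arbitrary noise threshold.
    """
    return [line for line in lines
            if any(h >= min_height for h in line)]


# The middle line consists of specks only and is dropped; the last line
# keeps its single speck because it also contains real characters.
kept = drop_noise_lines([[12, 11, 13], [1, 2, 1], [10, 2, 12]])
```

A real criterion would probably also look at blob width and density, but
the principle of filtering whole lines before recognition stays the same.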

I have observed, although not yet fully researched, that noise lines
between "good" text lines sometimes result in that text not being OCRed
at all. This happens with images that have grey areas, which sometimes
look like a chessboard when scanned. But I need to do more research
there.

Tilman

>
>
>Regards,
>Antonio.
>
>_______________________________________________
>Bug-ocrad mailing list
>address@hidden
>http://lists.gnu.org/mailman/listinfo/bug-ocrad


