Re: [Bug-ocrad] internal error: insert_space, track not set yet.
From: Tilman Hausherr
Subject: Re: [Bug-ocrad] internal error: insert_space, track not set yet.
Date: Tue, 17 Aug 2010 18:20:53 +0200
On Tue, 17 Aug 2010 15:57:36 +0200, Antonio Diaz Diaz wrote:
>Tilman Hausherr wrote:
>> Why not accept that some images might really have some very high and
>> very small characters? It's not that unlikely, e.g. with advertisements:
>> "free beer coupon *" in huge characters, and "* not valid in
>> Lampukistan" in very small characters. If you make a real change,
>> there's always the risk that you'd get worse results for the majority of
>> images while solving a problem that almost never happens. Maybe a
>> solution would be that if there are no medium characters, to just add
>> one element that produces a space...
>
>You mean if the high characters are grouped put them in a line and the
>short characters in another line? I guess this can be implemented.
Yes, although my text above is purely theoretical. So far I have
concentrated on processing the results that I get; I have done almost no
evaluation of the quality of the OCR.
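To make the grouping idea concrete, here is a minimal sketch in Python. It is not ocrad code: the Box type, the split_by_height function and the min_ratio threshold of 2.0 are all assumptions made for illustration. It sorts the character heights, looks for the largest relative jump between neighbours, and splits the boxes into a tall group and a short group only if that jump is big enough.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical character bounding box, inclusive pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def height(self) -> int:
        return self.bottom - self.top + 1


def split_by_height(boxes: List[Box],
                    min_ratio: float = 2.0) -> Tuple[List[Box], List[Box]]:
    """Split boxes into (tall, short) at the largest relative jump in height.

    Returns (boxes, []) when no jump reaches min_ratio, i.e. the group
    looks homogeneous and is left alone. min_ratio is an assumed value.
    """
    if len(boxes) < 2:
        return list(boxes), []
    heights = sorted(b.height for b in boxes)
    best_ratio, cut = 1.0, None
    # Compare each height with the next larger one.
    for lo, hi in zip(heights, heights[1:]):
        ratio = hi / lo
        if ratio > best_ratio:
            best_ratio, cut = ratio, hi
    if cut is None or best_ratio < min_ratio:
        return list(boxes), []
    tall = [b for b in boxes if b.height >= cut]
    short = [b for b in boxes if b.height < cut]
    return tall, short
```

With the "free beer coupon" example, the headline characters would end up in the first list and the footnote characters in the second, and each list could then be treated as its own line. An image with only ordinary text would come back unchanged, which limits the risk of worse results for the majority of images.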
>> On the other hand, I just thought of another "symptom" fix, and it
>> works:
>
>Yes, this works, but as a definitive solution I prefer to remove lines
>which only contain noise.
Yeah, that would be nice.
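As a rough illustration of what "remove lines which only contain noise" could mean, here is a minimal sketch in Python. It is not how ocrad does it: the box representation, the function names and the min_height / min_width thresholds are assumptions. A line counts as noise when none of its blobs is large enough to plausibly be a character.

```python
from typing import List, Tuple

# Hypothetical blob box: (left, top, right, bottom), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def is_noise_line(line: List[Box], min_height: int = 5, min_width: int = 2) -> bool:
    """True when no blob in the line reaches the assumed minimum size."""
    for left, top, right, bottom in line:
        if bottom - top + 1 >= min_height and right - left + 1 >= min_width:
            return False
    return True


def drop_noise_lines(lines: List[List[Box]]) -> List[List[Box]]:
    """Keep only the lines that contain at least one character-sized blob."""
    return [line for line in lines if not is_noise_line(line)]
```

A fixed size threshold like this would also throw away genuinely tiny print, so in practice the limits would probably have to depend on the character sizes found elsewhere in the image.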
I have observed, although not yet fully researched, that noise lines
between "good" text lines sometimes result in that text not being OCRed
at all. This happens with images that have grey areas, which sometimes
look like a chessboard when scanned. But I need to do more research
there.
Tilman
>
>
>Regards,
>Antonio.
>