From: Tilman Hausherr
Subject: [Bug-ocrad] The function ignore_wide_blobs() doth ignore too much, methinks
Date: Tue, 24 Aug 2010 17:29:13 +0200 (CEST)

Hello Antonio,

I looked into why, for some images with tables and grey (noisy) areas,
OCRAD returns no text at all, even though some of the text sits in clean
white areas. I was able to narrow it down to a part of
ignore_wide_blobs(), which apparently decides whether a wide blob is an
"image" (I assume you mean a photograph) or a frame. In my case, the
function makes a "wrong" decision and then deletes the whole
blobp_vector. The "wrong" turn seems to happen at

        if( blobs <= b.size() / 400U)

Some debug output (the third line was added by me):

file type is P4
file size is 1653w x 2338h
blobs = 26566, b[0,0,1652,2337].size() = 3864714, b.size() / 400U = 9661

That blob is so large because the test image I created (a meaningless
Excel table, printed and then scanned) has a black border on the right
and at the bottom.

Changing 400U to a lower value (e.g. 100) didn't help: at a later point,
the variable "blobs" still has a value similar to the one above, but
b.size() is smaller, so the outcome is the same. Thus, the more complex
the image, the greater the risk that it gets ignored.

Commenting out the "if" line obviously solves the problem with the test
image, but what are the risks? Getting a lot of useless output? Or
losing speed?

The test image I created is available on request.

Tilman
