From: Tilman Hausherr
Subject: [Bug-ocrad] The function ignore_wide_blobs() doth ignore too much, methinks
Date: Tue, 24 Aug 2010 17:29:13 +0200 (CEST)
Hello Antonio,
I investigated why OCRAD returns no text at all for some images with
tables and grey (noisy) areas, even though some of the text lies in
clean white areas. I narrowed it down to a part of ignore_wide_blobs()
that apparently decides whether a wide blob is an "image" (I assume you
mean a photograph) or a frame. In my case, the function makes a "wrong"
decision and then completely deletes blobp_vector. The "wrong" turn
seems to be at
    if( blobs <= b.size() / 400U)
Some output (I added the third line):
    file type is P4
    file size is 1653w x 2338h
    blobs = 26566, b[0,0,1652,2337].size() = 3864714, b.size() / 400U = 9661
The blob is that large because my test image (a meaningless Excel
table, printed and then scanned) has a black border at the right and at
the bottom.
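The comparison can be reproduced in isolation with the numbers from the output above. This is only a sketch: blob_count_within_limit() is a made-up name, not an OCRAD function, and it assumes both operands behave like plain unsigned integers.

```cpp
// Hypothetical helper, not OCRAD code: evaluates the same comparison as
// `blobs <= b.size() / 400U`, with the divisor turned into a parameter.
constexpr bool blob_count_within_limit( unsigned blobs, unsigned box_size,
                                        unsigned divisor )
  {
  return blobs <= box_size / divisor;	// integer division, as in the original
  }

// Values taken from the debug output above.
constexpr unsigned reported_size = 1653u * 2338u;	// whole page, in pixels

static_assert( reported_size == 3864714u, "matches the reported b.size()" );
static_assert( reported_size / 400u == 9661u, "matches the reported limit" );
static_assert( !blob_count_within_limit( 26566u, reported_size, 400u ),
               "26566 blobs exceed the limit of 9661" );
```

With these values the comparison evaluates to false, since 26566 is well above 9661.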
Changing 400U to a lower value (such as 100) didn't help: later on, the
variable "blobs" still has a value similar to the one above, but
b.size() is smaller. Thus, the more complex the image, the greater the
risk that it gets ignored.
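To illustrate why a smaller divisor alone may not be enough: for positive integers, blobs <= size / d holds exactly when blobs * d <= size, so each divisor implies a minimum box size for a given blob count. The sketch below works this out for the 26566 blobs reported above; min_box_size() is a hypothetical name, and the later, smaller values of b.size() are not known here.

```cpp
// Hypothetical helper, not OCRAD code: smallest box size (in pixels) for
// which `blobs <= size / divisor` still holds with integer division.
constexpr unsigned long long min_box_size( unsigned long long blobs,
                                           unsigned long long divisor )
  {
  return blobs * divisor;
  }

constexpr unsigned long long full_page = 1653ULL * 2338ULL;	// 3864714 pixels

// Divisor 400: the box would need more pixels than the whole page has.
static_assert( min_box_size( 26566, 400 ) == 10626400ULL, "26566 * 400" );
static_assert( min_box_size( 26566, 400 ) > full_page, "cannot hold at 400" );

// Divisor 100: the full page is large enough, but any box smaller than
// 2656600 pixels (roughly 69% of the page) is not.
static_assert( min_box_size( 26566, 100 ) == 2656600ULL, "26566 * 100" );
static_assert( min_box_size( 26566, 100 ) <= full_page, "holds for full page" );
```

So with a divisor of 100 the comparison holds for the full-page blob, but stops holding as soon as the box shrinks below about two thirds of the page while the blob count stays the same.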
Commenting out the "if" line does solve the problem with the test image,
obviously, but what are the risks? Getting a lot of useless output? Or
a loss of speed?
The test image I created is available on request.
Tilman