[Bug-ocrad] Technical documentation summary readme.txt, page skew, Hough
From: Chris K. Skinner
Subject: [Bug-ocrad] Technical documentation summary readme.txt, page skew, Hough transform.
Date: Thu, 23 Feb 2006 12:35:05 -0500
This will be long, so please be patient and see if you can read all of this
...
I'm interested in various aspects of computer science, including image
processing, neural networks, expert systems, and semantic analysis. I've got
books on such topics.
As you are probably aware, some countries allow software patents. If you
had an outline of the algorithms applied in each version of the software,
it would greatly help a newcomer fresh off the street gain a quicker
understanding of the project. It would probably also demonstrate to the
world at large that you have invented something new, so that it could not
later be patented, stolen, or claimed by some greedy corporate dudes.
I have just downloaded your source and tried to compile it, but the build
failed. So to understand your work, I have to read through each source file
to work out what is being done now. To see what was done in previous
versions that either did not work as intended or was tried and abandoned, I
would have to repeat this analysis on the sources of your older versions and
compare them with the current version.
Do you have any design notes, bibliographic citations, web links to
information that you have made use of, or release notes on which algorithms
are being used or have been abandoned?
I've been using the following OCR software since about 1993:
WinFax, Calera WordScan Plus, Caere OmniPage.
From my experience with these, the number of OCR errors goes way up if the
page skew (orientation, angle, rotation) is not exactly zero degrees. When
the angle is off, the bounding boxes around each page element, each column,
each line of text, and each character are wrongly positioned, which creates
a huge number of recognition errors. One would expect recognition to improve
when a high resolution scan is done, because the image is rich in redundant
clues as to what is present on the page.
But at high resolution the long horizontal lines of text become very long
"stripes of pixels." With such long stripes, the vertical error caused by
page skew is no longer one or two pixels from one end of a line to the
other; it can be much larger. If the recognition algorithms do not account
for this, and instead determine bounding box regions for recognition too
early while presuming a page skew angle of zero, the results are very bad.
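
To put rough numbers on the stripe problem described above, here is a
minimal Python sketch. It assumes the baseline is a straight line tilted by
the skew angle, so the vertical drift over a line of text is its length
times the tangent of the angle. The function name and the 6.5 inch line
width are illustrative assumptions, not anything taken from ocrad.

```python
import math

def baseline_drift_px(line_length_px: float, skew_degrees: float) -> float:
    """Vertical drift of a straight baseline between the two ends of a text line."""
    return line_length_px * math.tan(math.radians(skew_degrees))

# Hypothetical page: one 6.5 inch wide line of text, skewed by 1 degree,
# scanned at three different resolutions.
for dpi in (100, 300, 600):
    length_px = 6.5 * dpi
    drift = baseline_drift_px(length_px, 1.0)
    print(f"{dpi} dpi: line is {length_px:.0f} px long, drift is {drift:.1f} px")
```

At 300 dpi the line is 1950 pixels long, so a 1 degree skew moves the
baseline about 34 pixels between the left and right ends. The drift grows
in proportion to the resolution, which is why a bounding box fixed under a
zero-skew assumption gets worse as the scan gets sharper.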
In the J. R. Parker book (with CD-ROM) "Algorithms For Image Processing And
Computer Vision" that I have read, the author suggests a couple of
algorithms for combating the page skew problem. One is a Hough transform
applied to the points at the bottoms of the bounding boxes of the glyphs;
with his source code, it yields the page skew angle in degrees. Rotating
the image to eliminate that skew should then give better recognition.
(Unfortunately, he does not present the source code for determining the
bounding boxes of the glyphs, so it is not easy to demonstrate that this
algorithm will work, especially on larger regions of text.)
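
The Hough idea above can be sketched roughly as follows. This is not
Parker's code and not what ocrad does; it is a minimal Python illustration
that assumes the bottom points of the glyph bounding boxes are already
available as (x, y) pairs. The function name, the search range of plus or
minus 10 degrees, and the bin sizes are all assumptions made for the
example. The sign of the result depends on whether y grows upward or
downward in the image.

```python
import math
from collections import Counter

def estimate_skew_degrees(points, max_angle=10.0, angle_step=0.1, rho_step=2.0):
    """Estimate skew from (x, y) points lying along text baselines.

    Hough-style voting restricted to near-horizontal lines: for every
    candidate angle, each point votes for the offset (rho) of the line with
    that angle passing through it. The angle whose fullest rho bin collects
    the most votes is returned.
    """
    best_angle, best_votes = 0.0, -1
    steps = int(round(2 * max_angle / angle_step))
    for i in range(steps + 1):
        angle = -max_angle + i * angle_step
        t = math.radians(angle)
        sin_t, cos_t = math.sin(t), math.cos(t)
        votes = Counter()
        for x, y in points:
            # Constant for all points on one line y = x * tan(t) + c.
            rho = y * cos_t - x * sin_t
            votes[round(rho / rho_step)] += 1
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle

# Synthetic data (hypothetical): five text lines, 60 px apart, tilted by
# 2 degrees, with y rounded to whole pixels as a scanner would deliver it.
tilt = math.tan(math.radians(2.0))
points = [(x, round(c + x * tilt))
          for c in range(100, 400, 60)
          for x in range(0, 1000, 15)]
print("estimated skew:", round(estimate_skew_degrees(points), 1), "degrees")
```

Because only the fullest bin counts, a few stray points (for example the
bottoms of letters with descenders such as g, p, and y) should pull the
estimate far less than they would in a least-squares line fit.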
Another approach is to use angle-independent Complex-Number-Coefficient
Neural Networks as feature recognizers. The Japanese promoter of these
neural networks says that they are insensitive to affine transforms, and
can therefore recognize a pattern that has been so transformed.
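
For reference, an affine transformation maps a point (x, y) to a 2x2 matrix
times (x, y) plus an offset, so rotation (page skew), scaling, shear, and
translation are all special cases. The Python sketch below only illustrates
that definition on a made-up glyph box; it says nothing about how the neural
networks themselves work, and the function names are assumptions made for
the example.

```python
import math

def rotation(degrees):
    """2x2 matrix for a counter-clockwise rotation (x to the right, y up)."""
    t = math.radians(degrees)
    return ((math.cos(t), -math.sin(t)),
            (math.sin(t), math.cos(t)))

def apply_affine(point, matrix, offset=(0.0, 0.0)):
    """Map (x, y) to matrix * (x, y) + offset."""
    x, y = point
    (a, b), (c, d) = matrix
    return (a * x + b * y + offset[0], c * x + d * y + offset[1])

# Hypothetical glyph outline: the corners of a 10 x 20 pixel box.
box = [(0, 0), (10, 0), (10, 20), (0, 20)]

# A 3 degree page skew plus a small shift is one affine transformation.
skewed = [apply_affine(p, rotation(3.0), offset=(5.0, 2.0)) for p in box]

# A shear is another: it slants the box sideways without rotating it.
shear = ((1.0, 0.3), (0.0, 1.0))
slanted = [apply_affine(p, shear) for p in box]

print(skewed)
print(slanted)
```

A recognizer that is insensitive to such transformations would have to give
the same answer for `box`, `skewed`, and `slanted`.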
http://mathworld.wolfram.com/AffineTransformation.html
http://www.google.ca/search?num=20&hl=en&newwindow=1&safe=off&q=Affine
http://www.google.ca/search?num=20&hl=en&newwindow=1&safe=off&q=Affine+Complex-Number+Coefficient+Neural+Networks
This too is just a theory. I don't have a copy of any books on
Complex-Number-Coefficient Neural Networks, or any source code from a
competent mathematician who has converted the advanced mathematics into
working C++ code examples. Often these theoreticians are not interested in
the practical applications of their work and are more interested in the
expressions of their ideas as continuous functions expressed as
N-dimensional differential equations (or something much less understandable
to me anyway).
Thanks for any help that you could give me in understanding your project,
so that I might possibly provide you with suggestions for improvements.
Kindest regards, C.