[Bug-ocrad] Support for hOCR
From: Raffaele Sena
Subject: [Bug-ocrad] Support for hOCR
Date: Wed, 22 Jan 2014 16:59:42 -0800
Hi,
I just tried ocrad as an alternative to tesseract and I have to say that I
am impressed! (It is also much easier to understand.) But for a recent
project I needed at least partial hOCR support
(https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview),
since I had built some external tools that rely on getting hOCR output from
tesseract.
Based on the existing ORF code, which offers somewhat similar functionality,
I have added an --hocr={filename} option and implemented basic support
(page, lines and words).
If you are interested, I can submit a patch against ocrad-0.23-pre2. It could
use a little refactoring (one option is to add an hocr class that handles
at least the generation of the hOCR tags), but it works.
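As a rough illustration of the hocr class idea (a sketch only, not taken from
the patch, and written in Python for brevity rather than in ocrad's C++), a
helper that emits page, line and word tags could look something like this.
The name HocrWriter and its methods are hypothetical; the class names
ocr_page, ocr_line and ocrx_word and the "bbox x0 y0 x1 y1" title property
come from the hOCR spec.

```python
from html import escape


class HocrWriter:
    """Hypothetical helper that collects hOCR markup for pages, lines and words."""

    def __init__(self):
        self.parts = []

    @staticmethod
    def _open_tag(tag, css_class, bbox):
        # hOCR keeps the bounding box in the title attribute: "bbox x0 y0 x1 y1".
        title = "bbox %d %d %d %d" % tuple(bbox)
        return '<%s class="%s" title="%s">' % (tag, css_class, title)

    def start_page(self, bbox):
        self.parts.append(self._open_tag("div", "ocr_page", bbox))

    def end_page(self):
        self.parts.append("</div>")

    def start_line(self, bbox):
        self.parts.append(self._open_tag("span", "ocr_line", bbox))

    def end_line(self):
        self.parts.append("</span>")

    def word(self, text, bbox):
        # Escape the recognized text so that '&' or '<' cannot break the markup.
        opening = self._open_tag("span", "ocrx_word", bbox)
        self.parts.append(opening + escape(text) + "</span>")

    def render(self):
        # One element per output line; no surrounding <html> wrapper is produced.
        return "\n".join(self.parts)
```

A caller would walk the recognized page and call start_page, then start_line
and word for each text line, closing each level in turn, and finally write
render() to the file named by --hocr.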
Thanks, and keep up the great work!
-- Raffaele