Open Source OCR Engine
By admin on Dec 28, 2009 in open source
Tesseract OCR
The Tesseract OCR engine was one of the top three engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but it is still probably one of the most accurate open source OCR engines available. The engine reads a binary, grey or color image and outputs text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed ones.
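As a rough illustration of the "image in, text out" workflow described above, here is a minimal Python sketch that shells out to the `tesseract` command-line tool. It assumes the executable is installed and on the PATH, and that it follows the usual `tesseract imagename outputbase [-l lang]` form, writing its result to `outputbase.txt`. The helper names `build_tesseract_command` and `ocr_image` are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list for the `tesseract` command-line tool.

    Assumed CLI form: tesseract imagename outputbase [-l lang]
    Tesseract appends ".txt" to the output base name itself.
    """
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path, output_base="ocr_output", lang=None):
    """Run Tesseract on one image and return the recognized text.

    Raises RuntimeError if the `tesseract` executable is not on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With Tesseract installed, `ocr_image("page.tif")` would return the recognized text as a string. Whether a compressed TIFF is accepted depends on how your copy of Tesseract was built, as noted above.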
Supported platforms
- Ubuntu 6.06 (x86/32, x86/64)
- Ubuntu 6.10 (x86/32, x86/64)
- Windows (x86/32) with Visual C++ Express 2008
OCRopus
OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
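To make the idea of "pluggable" stages concrete, here is a toy Python sketch of a pipeline in which layout analysis, character recognition and language modeling are interchangeable callables. This is not the OCRopus API: every name below (`Pipeline`, `split_lines`, `identity_recognizer`, `fix_common_confusions`) is hypothetical, and the stages work on plain strings rather than images so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures, for illustration only (not OCRopus types).
LayoutAnalyzer = Callable[[str], List[str]]  # page -> line regions
Recognizer = Callable[[str], str]            # line region -> raw text
LanguageModel = Callable[[str], str]         # raw text -> corrected text


@dataclass
class Pipeline:
    """Chains three swappable stages; any stage can be replaced independently."""
    layout: LayoutAnalyzer
    recognizer: Recognizer
    language_model: LanguageModel

    def run(self, page: str) -> List[str]:
        return [
            self.language_model(self.recognizer(region))
            for region in self.layout(page)
        ]


# Toy stand-ins for real components.
def split_lines(page: str) -> List[str]:
    """'Layout analysis': treat each non-blank line as one region."""
    return [line for line in page.splitlines() if line.strip()]


def identity_recognizer(region: str) -> str:
    """'Recognition': the region is already text, so just trim whitespace."""
    return region.strip()


def fix_common_confusions(text: str) -> str:
    """'Language model': correct one typical zero-for-O confusion."""
    return text.replace("0CR", "OCR")


pipeline = Pipeline(split_lines, identity_recognizer, fix_common_confusions)
result = pipeline.run("  0CR test \n\n second line ")
```

Swapping in a different recognizer or layout method only means passing a different callable to `Pipeline`, which is the property the word "pluggable" refers to.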
The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the U.S. Census Bureau, and novel high-performance layout analysis methods.
OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.