Open Source OCR Engine

By admin on Dec 28, 2009 in open source

Tesseract OCR

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Little work was done on it between 1995 and 2006, but it is probably still one of the most accurate open source OCR engines available. It reads a binary, grey or color image and outputs text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed ones.
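As a rough illustration of the image-in, text-out workflow, here is a minimal Python sketch that calls the tesseract command-line tool. It assumes the tesseract executable is installed and on the PATH. The helper names build_tesseract_command and ocr_image and the default output base name are hypothetical and chosen only for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list for the tesseract command-line tool.

    The CLI takes an input image and an output base name; tesseract
    appends ".txt" to the base name itself.
    """
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        # Optional language code, e.g. "eng"
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path, output_base="ocr_output", lang=None):
    """Run tesseract on one image and return the recognized text.

    Assumption: the tesseract executable is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    # Tesseract writes its result to "<output_base>.txt"
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling ocr_image("page.tif"), where page.tif stands in for any uncompressed TIFF, would return the recognized text as a string. Keeping the command construction separate from the subprocess call makes it possible to check the arguments without tesseract installed.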

Supported platforms

Ubuntu 6.06 (x86/32, x86/64)
Ubuntu 6.10 (x86/32, x86/64)
Windows (x86/32) with Visual C++ Express 2008

OCRopus

OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.


twit88.com
