
Open Source OCR Engine

Tesseract OCR

The Tesseract OCR engine was one of the top three engines in the 1995 UNLV Accuracy test. It saw little development between 1995 and 2006, but it is probably still one of the most accurate open source OCR engines available. The engine reads a binary, greyscale, or color image and outputs text. A built-in TIFF reader handles uncompressed TIFF images, and libtiff can be added to read compressed ones.
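As a rough illustration of the image-in, text-out workflow described above, here is a minimal Python sketch that wraps the `tesseract` command-line program. It assumes the executable is installed and on the PATH and uses the basic `tesseract <image> <outputbase> [-l <lang>]` invocation. The helper names `build_tesseract_command` and `ocr_image` are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str,
                            lang: Optional[str] = None) -> List[str]:
    """Build the argument list for: tesseract <image> <outputbase> [-l <lang>]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        # Optional language code, e.g. "eng"; the language data must be installed.
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path: str, output_base: str,
              lang: Optional[str] = None) -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Tesseract writes its plain-text result to <outputbase>.txt.
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("page.tif", "page")` would then write `page.txt` next to the script and return its contents, provided the installed build can read the given image format.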

Supported platforms

  • Ubuntu 6.06 (x86/32, x86/64)
  • Ubuntu 6.10 (x86/32, x86/64)
  • Windows (x86/32) with Visual C++ Express 2008


OCRopus

OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.
