
Open Source Content Analysis Toolkit

Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika.

Tika is a project of the Apache Software Foundation and was formerly a subproject of Apache Lucene.

