Open Source Content Analysis Toolkit
By admin on Mar 3, 2011 in Java, open source
Apache Tika™ is a toolkit for detecting document types and for extracting metadata and structured text content from a wide range of document formats, using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika.
Tika is a project of the Apache Software Foundation, and was formerly a subproject of Apache Lucene.
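To give a rough feel for what "detecting" a document means, here is a minimal sketch in plain Java that guesses a media type from the first few bytes of a file. This is not Tika's API or its implementation: the class name `MagicDetector`, the `detect` method, and the three-entry signature table are all hypothetical and exist only for illustration. A real toolkit covers far more formats and also hands the content to a parser for extraction.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: a hypothetical detector, not part of Apache Tika.
public class MagicDetector {

    // Assumed, deliberately tiny table: leading bytes -> media type.
    private static final Map<byte[], String> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        SIGNATURES.put(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png");
        SIGNATURES.put(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip");
    }

    // Returns the media type of the first matching signature,
    // or a generic binary type when nothing matches.
    public static String detect(byte[] data) {
        for (Map.Entry<byte[], String> entry : SIGNATURES.entrySet()) {
            if (startsWith(data, entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample)); // prints "application/pdf"
    }
}
```

Looking only at leading bytes keeps the sketch short, but it cannot tell apart formats that share a container (a ZIP signature also matches many office documents), which is one reason to reach for a full toolkit rather than a hand-rolled table.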