Open Source Text Analytics

By admin on Mar 14, 2009 in Java, open source

GATE – General Architecture for Text Engineering

the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
used worldwide by thousands of scientists, companies, teachers and students
comprising an architecture, a free open source framework (or SDK) and a graphical development environment
used for all sorts of language processing tasks, including Information Extraction in many languages
funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
100% Java reference implementation of ISO TC37/SC4 and used with XCES in the ANC
10 years old in 2005, used in many research projects and compatible with IBM’s UIMA
based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge

Apache UIMA

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.

UIMA enables applications to be decomposed into components, for example “language identification” => “language specific segmentation” => “sentence boundary detection” => “entity detection (person/place names etc.)”. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.
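The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the idea of chained components sharing one analysis structure, not the UIMA API: the `Document` and `Annotation` classes, the component functions and their toy heuristics are all hypothetical names invented for this example, and XML descriptors are left out entirely.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token", "sentence", "entity"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    """Hypothetical shared container handed from one component to the next."""
    text: str
    language: str = "unknown"
    annotations: list = field(default_factory=list)

    def covered(self, kind):
        """Return the text covered by every annotation of the given kind."""
        return [self.text[a.begin:a.end] for a in self.annotations if a.kind == kind]


def identify_language(doc):
    # Toy heuristic (assumption): a few English stop words mean English.
    words = set(re.findall(r"[a-z]+", doc.text.lower()))
    doc.language = "en" if words & {"the", "and", "in", "for", "of"} else "unknown"


def segment_tokens(doc):
    # A real segmenter would branch on doc.language; this one uses a single word pattern.
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end()))


def detect_sentences(doc):
    # Naive rule: a sentence is a run of text ending in '.', '!' or '?'.
    for m in re.finditer(r"[^.!?]+[.!?]", doc.text):
        leading = len(m.group()) - len(m.group().lstrip())
        doc.annotations.append(Annotation("sentence", m.start() + leading, m.end()))


def detect_entities(doc):
    # Naive rule: a capitalized token that does not open a sentence is an entity.
    sentence_starts = {a.begin for a in doc.annotations if a.kind == "sentence"}
    tokens = [a for a in doc.annotations if a.kind == "token"]
    for a in tokens:
        if doc.text[a.begin].isupper() and a.begin not in sentence_starts:
            doc.annotations.append(Annotation("entity", a.begin, a.end))


# The "framework" part: an ordered list of components and a loop that manages the data flow.
PIPELINE = [identify_language, segment_tokens, detect_sentences, detect_entities]


def run_pipeline(text, components=PIPELINE):
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline("Alice works for Acme in Paris. She moved there in 2009.")
print(doc.language, doc.covered("sentence"), doc.covered("entity"))
```

Each component only reads and writes the shared document, so components can be reordered or replaced without touching the others. Note that the entity step depends on the token and sentence annotations produced earlier, which is why the order of the list matters.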

RapidMiner

RapidMiner (formerly YALE) and its plugins provide more than 400 operators for all aspects of data mining. Meta operators automatically optimize experiment designs, so users no longer need to tune individual steps or parameters by hand. A large number of visualization techniques, together with the option to place breakpoints after each operator, give insight into how well a design works, even while an experiment is running.
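To give a feel for what automatic parameter optimization means, here is a minimal sketch of an exhaustive grid search in plain Python. It is not RapidMiner code and makes no claim about how RapidMiner's meta operators are implemented; `grid_search`, `accuracy`, the `SAMPLES` data and the two parameters are hypothetical stand-ins for an experiment and its settings.

```python
from itertools import product

# Hypothetical labelled data: (token count of a text, expected label).
SAMPLES = [(2, "short"), (3, "short"), (5, "long"), (8, "long"), (9, "long")]


def accuracy(threshold, inclusive):
    """Stand-in for one experiment run: score a simple length-based classifier."""
    correct = 0
    for count, label in SAMPLES:
        is_long = count >= threshold if inclusive else count > threshold
        predicted = "long" if is_long else "short"
        correct += predicted == label
    return correct / len(SAMPLES)


def grid_search(evaluate, param_grid):
    """Try every parameter combination and return (best_params, best_score).

    On ties the first combination encountered wins.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(
    accuracy, {"threshold": [3, 5, 8], "inclusive": [True, False]}
)
print(best_params, best_score)
```

The outer loop plays the role of a wrapper around the experiment: the inner experiment is run once per parameter combination, and only the best-scoring settings are kept.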

NLTK

NLTK is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing. It supports dozens of NLP tasks and has distributions for Windows, Mac OS X and Linux.

OpenNLP

OpenNLP provides the organizational structure for coordinating several different projects which approach some aspect of Natural Language Processing. OpenNLP also defines a set of Java interfaces and implements some basic infrastructure for NLP components.

R Text Mining

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. The R text mining package can be used for text analysis.
