This is one of the Java libraries that I have tried out and found quite useful.
LingPipe is a suite of Java libraries for the linguistic analysis of human language. Its features, as quoted from the website:
- track mentions of entities (e.g. people or proteins);
- link entity mentions to database entries;
- uncover relations between entities and actions;
- classify text passages by language, character encoding, genre, topic, or sentiment;
- correct spelling with respect to a text collection;
- cluster documents by implicit topic and discover significant trends over time; and
- provide part-of-speech tagging and phrase chunking.
One feature that I particularly like is Chinese word segmentation. Chinese is written without spaces between words, so breaking a sentence into words is not a simple task: the same run of characters can often be split in more than one way.
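To give a feel for why segmentation is hard, here is a minimal sketch of a naive dictionary-based approach (greedy forward maximum matching). This is not LingPipe's API or its method; the class name `MaxMatchSegmenter`, the tiny dictionary, and the `maxWordLength` parameter are all made up for illustration. A real segmenter would need a large lexicon or a statistical model to resolve ambiguous splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy class for illustration only; it is not part of LingPipe.
final class MaxMatchSegmenter {

    // Greedy forward maximum matching: at each position take the longest
    // dictionary word that starts there, falling back to a single character.
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxWordLength);
            // Shrink the candidate until it is a known word or one character long.
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary; a real one would hold many thousands of entries.
        Set<String> dictionary = Set.of("我", "喜欢", "自然", "语言", "自然语言", "处理");
        // Prints [我, 喜欢, 自然语言, 处理]
        System.out.println(segment("我喜欢自然语言处理", dictionary, 4));
    }
}
```

Because the greedy rule always prefers the longest match, it picks 自然语言 over 自然 + 语言 here, and it will happily pick the wrong word whenever a longer dictionary entry overlaps the correct split. That weakness is one reason a trained library is worth using instead.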
The libraries are used by quite a number of commercial, academic and government institutions. You should have a look at them if your research area is linguistics or semantic analysis.