Java: Generate RDF from Web Pages
By admin on Aug 30, 2009 in Java, open source
WebCAT is an extensible tool that extracts metadata and generates RDF descriptions from existing Web documents. Implemented in Java, it provides a set of APIs (Application Programming Interfaces) that let you analyse text documents from the Web without having to write complicated parsers.
Among other things, WebCAT provides:
– Language and encoding detection.
– Hyperlink extraction.
– Text tokenization (words, n-grams, sentences).
– Document fingerprinting.
– Format conversion.
– Metadata extraction and normalization.
– Named entity extraction.
– Document classification.
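To give a feel for what two of these features involve, here is a minimal sketch of hyperlink extraction and word n-gram tokenization using only the Java standard library. It does not use WebCAT's API: the class `TextFeaturesSketch` and its methods are hypothetical names made up for illustration, and the regular expression is a deliberately crude stand-in for real HTML parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of WebCAT.
public class TextFeaturesSketch {

    // Crude pattern for quoted href attributes; an assumption, not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the quoted href values found in the given HTML, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Splits text on whitespace and returns every run of n consecutive words. */
    public static List<String> wordNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return grams;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"http://example.org/a\">A</a> and <a href='b.html'>B</a>.</p>";
        System.out.println(extractLinks(html));
        System.out.println(wordNGrams("generate RDF from web pages", 2));
    }
}
```

A dedicated tool has to cope with unquoted attributes, relative URLs, malformed markup and language-specific word boundaries, none of which this sketch attempts.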
The metadata elements it extracts are particularly suited to automated search, which makes WebCAT a good fit for other information retrieval and extraction projects as well.
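For readers new to RDF, the following sketch shows roughly what an RDF description of a page's metadata can look like when written as N-Triples. It is not WebCAT's output format or API: the class `NTriplesSketch`, the choice of the Dublin Core element vocabulary, and the example URL are all assumptions made for illustration. The sketch also assumes the page URL is already a valid absolute IRI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of WebCAT.
public class NTriplesSketch {

    // Dublin Core element namespace; using this vocabulary is an assumption.
    public static final String DC = "http://purl.org/dc/elements/1.1/";

    /** Escapes backslash, double quote, line feed and carriage return for an N-Triples literal. */
    public static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Formats one triple whose object is a plain string literal. */
    public static String literalTriple(String subject, String predicate, String value) {
        return "<" + subject + "> <" + predicate + "> \"" + escape(value) + "\" .";
    }

    /** Emits one triple per metadata entry; each key is treated as a Dublin Core element name. */
    public static String describe(String pageUrl, Map<String, String> metadata) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            out.append(literalTriple(pageUrl, DC + e.getKey(), e.getValue())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the triples in insertion order.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Java: Generate RDF from Web Pages");
        metadata.put("language", "en");
        System.out.print(describe("http://example.org/post", metadata));
    }
}
```

Each line of output is one subject, predicate, object statement about the page, which is the shape of data that search and linked-data tools can consume.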
Elliot | Sep 15, 2009
AlchemyAPI is another option: a cloud-based service that can generate RDF from text and/or web pages.
http://www.alchemyapi.com/api/entity/ldata.html
It also provides Linked Data support, with links to DBpedia, OpenCyc and other online datasets.