
Load DMOZ RDF Structure and Content RDF

The DMOZ Open Directory Project is the largest human-edited directory of the web. For my research, I need to load its structure and content RDF files into a MySQL database.

At first I tried to use Jena to parse the RDF. However, the DMOZ RDF files do not conform to the standard, and parsing them generated a lot of errors.

I then tried to convert the structure and content RDF files, but in the end I decided to look for other tools that could do what I wanted.

The following are the tools that I used.

DMOZ Data Importer
This tool imports the RDF data from the Open Directory Project (ODP) (http://rdf.dmoz.org) straight into a Microsoft SQL Server 2005 database.

Right now it supports only the structure file. For my own use, I modified its data access layer (DAL) to load into MySQL instead.
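
To give a rough idea of what loading the structure file into a database involves, here is a minimal Python sketch. It is not the importer's code. It assumes the structure file contains Topic blocks with an r:id attribute and a catid child, and the table name, column names, and sample IDs are made up for illustration. It uses Python's built-in sqlite3 only as a stand-in so the example runs without a server. With a MySQL driver you would pass a MySQL connection and "%s" as the placeholder, since that is the parameter style those drivers commonly use.

```python
import re
import sqlite3

# Assumed layout of a Topic block in the structure file (illustrative IDs).
STRUCTURE_SAMPLE = '''<Topic r:id="Top/Arts">
  <catid>2</catid>
  <d:Title>Arts</d:Title>
</Topic>
<Topic r:id="Top/Arts/Animation">
  <catid>1001</catid>
</Topic>
'''

TOPIC_START = re.compile(r'<Topic\s+r:id="([^"]*)"')
CATID = re.compile(r'<catid>(\d+)</catid>')


def iter_topics(lines):
    """Yield (catid, path) for each Topic block, reading line by line.

    Regular expressions are used instead of an RDF parser so that
    non-conforming markup elsewhere in the file does not stop the load.
    """
    path = None
    for line in lines:
        start = TOPIC_START.search(line)
        if start:
            path = start.group(1)
        catid = CATID.search(line)
        if catid and path is not None:
            yield int(catid.group(1)), path
            path = None


def load_topics(conn, lines, placeholder="?"):
    """Insert all topics through a DB-API connection; return the row count.

    The 'category' table and its columns are hypothetical names.
    """
    rows = list(iter_topics(lines))
    sql = "INSERT INTO category (catid, path) VALUES ({0}, {0})".format(placeholder)
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category (catid INTEGER PRIMARY KEY, path TEXT)")
    print(load_topics(conn, STRUCTURE_SAMPLE.splitlines()))
```

Keeping the parsing (iter_topics) separate from the database code (load_topics) is the same idea as swapping the DAL: only the connection and the placeholder change when the target database changes.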

MySQL Dmoz RDF Parser - Convert RDF to MySQL

This is a Perl script that can be used to load the content RDF into a MySQL database, which is exactly what I wanted. It first converts the structure RDF into a pipe-delimited file, which is then imported into MySQL.
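
The convert-then-import flow can be sketched in a few lines of Python. This is not the Perl script itself, only an illustration of the same idea applied to content entries. It assumes each entry is an ExternalPage element with an about attribute and d:Title, d:Description, and topic children. The function names and the sample data are hypothetical.

```python
import html
import re

# Assumed layout of one entry in the content file (illustrative data).
SAMPLE = '''<ExternalPage about="http://www.example.com/">
  <d:Title>Example &amp; Co</d:Title>
  <d:Description>A site | with a pipe</d:Description>
  <topic>Top/Arts</topic>
</ExternalPage>
'''

PAGE_START = re.compile(r'<ExternalPage\s+about="([^"]*)"')
FIELD = re.compile(r'<(d:Title|d:Description|topic)>(.*?)</\1>')


def clean(value):
    """Unescape XML entities, then escape characters that would break a row."""
    value = html.unescape(value)
    value = value.replace("\\", "\\\\").replace("|", "\\|")
    return value.replace("\n", " ").strip()


def rdf_to_rows(lines):
    """Yield (url, title, description, topic) for each ExternalPage block.

    Works line by line, so a field whose value spans several lines is
    skipped; a real converter would need to handle that case.
    """
    url, fields = None, {}
    for line in lines:
        start = PAGE_START.search(line)
        if start:
            url, fields = start.group(1), {}
        if url is None:
            continue
        for name, value in FIELD.findall(line):
            fields[name] = value
        if "</ExternalPage>" in line:
            yield (url, fields.get("d:Title", ""),
                   fields.get("d:Description", ""), fields.get("topic", ""))
            url = None


def write_pipe_delimited(lines, out):
    """Write one pipe-delimited row per entry; return the number of rows."""
    count = 0
    for row in rdf_to_rows(lines):
        out.write("|".join(clean(v) for v in row) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    import sys
    write_pipe_delimited(SAMPLE.splitlines(), sys.stdout)
```

The resulting file can then be bulk-loaded with MySQL's LOAD DATA INFILE statement using FIELDS TERMINATED BY '|'. Pipes inside values are backslash-escaped here on the assumption that the import keeps MySQL's default backslash escape character.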

I am going to use some of the ODP categories as training data. If any of you have similar experience using ODP, I would really appreciate it if you could share it.




1 Comment(s)

  1. xxx | May 16, 2008

    There is a ready-made MySQL dump of the DMOZ RDF.
    http://www.we-globe.net/WebLab/Download/DmozRdf2MySQL.html

1 Trackback(s)

  1. From Build Domain Knowledge by Extracting Keywords from DMOZ | twit88.com | Nov 15, 2007
