Load DMOZ Structure and Content RDF

The DMOZ Open Directory Project is the largest human-edited directory of the Web. As part of my research, I need to load its structure and content RDF files into a MySQL database.

At first I tried to use Jena to parse the RDF. However, the DMOZ RDF files do not conform to the RDF standard, so Jena generates a lot of parsing errors.

I tried to convert the structure and content RDF files. In the end, I decided to look for other tools to achieve what I wanted.

The following are the tools that I used.

DMOZ Data Importer
This tool imports the RDF data from the Open Directory Project (ODP) straight into a Microsoft SQL Server 2005 database.

Right now it supports only the structure file. For my own use, I modified its data access layer (DAL) to load into MySQL instead.

MySQL Dmoz RDF Parser – Convert RDF to MySQL

This is a Perl script that can be used to load the content RDF into a MySQL database, which is exactly what I wanted. It first converts the structure RDF into a pipe-delimited file, which is then imported into MySQL.
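To illustrate the convert-then-import idea, here is a minimal Python sketch. It is not the Perl script itself, only a rough illustration of the same approach. Because the files are not valid RDF, it scans lines with regular expressions rather than using an RDF parser. The element shapes (`<Topic r:id="...">`, `<catid>`, `<d:Title>`) are my assumption about the structure dump layout, and the names `topics_to_rows` and `convert` and the three-column output are hypothetical.

```python
import re
from xml.sax.saxutils import unescape

# Assumed element shapes in the structure dump (not verified against every file):
#   <Topic r:id="Top/Arts">
#     <catid>2</catid>
#     <d:Title>Arts</d:Title>
#   </Topic>
TOPIC_OPEN = re.compile(r'<Topic r:id="([^"]*)">')
CATID = re.compile(r'<catid>(\d+)</catid>')
TITLE = re.compile(r'<d:Title>(.*?)</d:Title>')


def clean(value):
    """Unescape XML entities and drop characters that would break a row."""
    return unescape(value).replace("|", " ").replace("\n", " ").strip()


def topics_to_rows(lines):
    """Yield 'catid|topic_path|title' rows from an iterable of text lines."""
    path, catid, title = None, "", ""
    for line in lines:
        match = TOPIC_OPEN.search(line)
        if match:
            # A new topic starts: reset the fields collected so far.
            path, catid, title = match.group(1), "", ""
            continue
        if path is None:
            continue  # Not inside a <Topic> element.
        match = CATID.search(line)
        if match:
            catid = match.group(1)
        match = TITLE.search(line)
        if match:
            title = match.group(1)
        if "</Topic>" in line:
            yield "|".join([catid, clean(path), clean(title)])
            path = None


def convert(src_path, dst_path):
    """Stream the dump line by line so the whole file never sits in memory."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for row in topics_to_rows(src):
            dst.write(row + "\n")
```

The resulting file can then be imported with MySQL's LOAD DATA INFILE statement, using FIELDS TERMINATED BY '|', into a table whose columns match the three fields.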

I am going to use some of the ODP categories as training data. If any of you has similar experience using ODP, I would really appreciate it if you could share it.

  1. xxx | May 16, 2008 | Reply

    There is a ready MySQL dump of DMOZ rdf.

