Archive for November, 2007

Java – Using Multiple Configuration Files in Your Application »

Download Source I need to use multiple configuration files in my Java application: one is a properties file, another is an XML file. This can be done easily using Apache Commons Configuration. I created a singleton class, AppConfigFactory, which extends CompositeConfiguration: public class AppConfigFactory extends CompositeConfiguration { private static Log log = LogFactory.getLog(AppConfigFactory.class); [...]

iBatis and Apache DBCP Connection Pooling »

Download Source I was using iBatis together with Apache DBCP. Here is the SqlConfig class for iBatis: public class SqlConfig { private static SqlMapClient sqlMap = null; static { try { String resource = "sqlmap.xml"; Reader reader = Resources.getResourceAsReader(resource); sqlMap = SqlMapClientBuilder.buildSqlMapClient(reader); } catch (Exception e) { // If you get an error at this [...]

Serialize .NET Object to JSON String »

Download Source This is another variation of my previous post, Serialize Java Object to JSON String. Here I rewrite the code using Json.NET.

Serialize Java Object to JSON String »

In my previous articles, I talked about storing Java objects using XStream and Simple. Here I am going to do it again, but this time storing the Java object as JSON. There are many tools that can be used for this purpose, e.g. flexjson and JSON Tools, to name a few. You can find a complete [...]

Solvent – Firefox Extension for Screen Scraping and XQuery Generator »

One thing I like about Firefox is the abundance of extensions out there which can do almost anything that you can think of. Solvent is a Firefox extension that helps you write screen scrapers for Piggy Bank. Piggy Bank is a Firefox extension that turns your browser into a mashup platform, by allowing you to [...]

Open Source Database Benchmark »

This is the database benchmark that I stumbled upon when I was searching for similar information. PolePosition is a benchmark test suite to compare database engines and object-relational mapping technology. It has benchmarked the following: db4o – the open source object database for Java and .NET; Hibernate – relational persistence for idiomatic [...]

Java – Open Source Social Networking Applications »

Here is the link to a list of open source social networking applications in Java. This is an interesting area that I am currently looking at.

Build Domain Knowledge by Extracting Keywords from DMOZ »

Download Source This is an experiment I am currently doing: extracting keywords from categories in DMOZ to see how accurately they can be used for web page categorization. In my previous post, I loaded the DMOZ categories into a database. The Perl script also generates a pipe-delimited file for me, as shown below [...]

Open Source Digg Clone »

I was searching around for open source Digg clones. There are many around, but most are not as good as I expected. Here are the two that I am quite satisfied with after playing with a number of the open source Digg clones. Pligg: Most of you should know Pligg. It is a Web [...]

Integrate Maven with Version Control System »

Download In one of my projects I am using CVS. No Subversion and Continuum yet; maybe I will use them in other new projects. One problem I found is that there is no clear example of how to use the SCM plugin, Maven profiles, etc. for a multi-module project. Here I briefly describe how [...]

Apache Commons Math – Mathematics and Statistics Components »

Commons Math is a library of lightweight, self-contained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang. As quoted from the website, Apache Commons Math is made up of a small set of math/stat utilities addressing some of the programming problems like the ones in [...]

Java – International Components for Unicode »

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software. As quoted from the website, some features of ICU: Code Page Conversion: Convert text data to [...]

Java – Language Identification in Web Page »

Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents. One fundamental problem of text categorization is the identification of the language, which can be resolved by using n-gram based text categorization. An n-gram is a (short) sequence of atoms such as bytes, characters, or words. NGramJ [...]
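To make the n-gram idea concrete, here is a minimal sketch in plain Java. It is not NGramJ's API or algorithm: the class name NGramLanguageGuesser and its methods are hypothetical, and it compares character n-gram counts with cosine similarity instead of the rank-order distance used in classic n-gram text categorization. It assumes you supply a short sample text per language.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of NGramJ.
class NGramLanguageGuesser {

    // Count overlapping character n-grams of length n in the lower-cased text.
    static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two n-gram count profiles (0.0 to 1.0).
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int v = e.getValue();
            normA += (double) v * v;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) v * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Return the language label whose sample text has the most similar profile,
    // or null if no samples are given.
    static String guess(String text, Map<String, String> samplesByLanguage, int n) {
        Map<String, Integer> target = profile(text, n);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, String> e : samplesByLanguage.entrySet()) {
            double score = similarity(target, profile(e.getValue(), n));
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With realistic training text per language (a few kilobytes each) and n = 3, this kind of comparison tends to separate languages well; very short inputs are much less reliable.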

Java – Automatic Charset Detection of Web Page »

jchardet is a Java port of the source from Mozilla's automatic charset detection algorithm. It is the library that I used to detect the charset of web pages. How does a browser guess the charset of web pages? The way browsers handle this problem is to look into the data byte-by-byte and try to guess [...]
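As a rough illustration of what "looking at the bytes" can mean, here is a minimal sketch using only the Java standard library. It is not jchardet's API and not Mozilla's algorithm, which is far more thorough; the class name CharsetGuesser and the windows-1252 fallback are assumptions made for this example. It checks for a byte order mark, then for pure ASCII, then tries a strict UTF-8 decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of jchardet.
class CharsetGuesser {

    static String guess(byte[] data) {
        // 1. A byte order mark, if present, is the strongest hint.
        //    (FF FE can also begin a UTF-32LE BOM; that case is ignored here.)
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }

        // 2. If no byte has the high bit set, the data is plain ASCII.
        boolean ascii = true;
        for (byte b : data) {
            if (b < 0) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return "US-ASCII";
        }

        // 3. Try a strict UTF-8 decode; malformed sequences raise an exception.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            // Assumed fallback for this sketch; a real detector would score
            // several candidate encodings instead.
            return "windows-1252";
        }
    }
}
```

A real detector goes well beyond this, which is why a dedicated library is worth using for arbitrary web pages.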

Generate RDF Meta Data from Web Page »

Download Source Code This is the code I used to extract meta data from web pages (keywords, descriptions, sentences, hyperlinks, images, etc.) into RDF files. I used the WebCAT libraries. However, I fixed some bugs in the code, as it sometimes cannot detect the language correctly or throws exceptions. E.g. for this blog, the RDF [...]

LingPipe – Java Libraries for Linguistic Analysis of Human Language »

This is one of the Java libraries that I tried out and found quite useful. LingPipe is a suite of Java libraries for the linguistic analysis of human language. The features, as quoted from the website: track mentions of entities (e.g. people or proteins); link entity mentions to database [...]

Load DMOZ RDF Structure and Content RDF »

The DMOZ Open Directory Project is the largest human-edited directory of the web. As part of my research, I need to load the structure and content RDF into a MySQL database. At first I tried to use Jena to parse the RDF. However, the DMOZ RDF files do not conform to the standard, and [...]

Weka – A Java Machine Learning Tool That I Used »

Anyone working on AI knows how difficult it is to make computers do the common things that human beings are good at. Machine learning is one of the fields in computer science that I have always been interested in. While experimenting with various machine learning algorithms, Weka is my tool of choice. As [...]