Java: Smart and Simple Web Crawler

By admin on Dec 23, 2008 in Java, open source

Smart and Simple Web Crawler

A smart and easy framework that crawls a web site
Integrated Lucene support
The framework is simple to integrate into your own applications
The crawler can start from a single link or from a list of links
Two crawling models are available:
- Max Iterations: crawls a web site through a limited number of links. A fast model with a small memory footprint and low CPU usage.
- Max Depth: a simple graph model parser that does not record incoming and outgoing links. As fast as the Max Iterations model.
An accept filter interface limits the links to be crawled
Core accept filters available: ServerFilter, BeginningPathFilter and RegularExpressionFilter
Accept filters can be combined with AND, OR and NOT
Pluggable HTTP connection libraries: HttpClient (default) and HTMLParser (optional)
Custom listeners can be added to the parsing process
The framework is not a GUI-based tool for mirroring a website and browsing it offline!
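
To make the accept filter and Max Iterations ideas from the list above more concrete, here is a minimal, self-contained sketch. It does not use the framework's own classes: the `LinkFilter` interface, the factory methods `serverFilter`, `beginningPathFilter` and `regularExpressionFilter`, and the `crawl` method are hypothetical names chosen only to mirror the features listed, and an in-memory map of links stands in for real HTTP fetches. The actual API of the framework may look quite different.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class CrawlSketch {

    /** Hypothetical accept filter (not the framework's interface): may this link be crawled? */
    public interface LinkFilter {
        boolean accept(String link);

        // Combinators for AND, OR and NOT.
        default LinkFilter and(LinkFilter other) { return l -> accept(l) && other.accept(l); }
        default LinkFilter or(LinkFilter other)  { return l -> accept(l) || other.accept(l); }
        default LinkFilter not()                 { return l -> !accept(l); }
    }

    /** Accepts links whose host equals the given server name. */
    public static LinkFilter serverFilter(String host) {
        return link -> host.equalsIgnoreCase(URI.create(link).getHost());
    }

    /** Accepts links whose path starts with the given prefix. */
    public static LinkFilter beginningPathFilter(String prefix) {
        return link -> {
            String path = URI.create(link).getPath();
            return path != null && path.startsWith(prefix);
        };
    }

    /** Accepts links that match the regular expression as a whole. */
    public static LinkFilter regularExpressionFilter(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return link -> pattern.matcher(link).matches();
    }

    /**
     * Breadth-first crawl that stops once maxIterations pages have been visited.
     * The graph maps a page to its outgoing links and replaces real HTTP requests.
     */
    public static List<String> crawl(String start, Map<String, List<String>> graph,
                                     LinkFilter filter, int maxIterations) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && visited.size() < maxIterations) {
            String page = queue.poll();
            visited.add(page);
            for (String link : graph.getOrDefault(page, List.of())) {
                // Only queue links that pass the filter and were not seen before.
                if (filter.accept(link) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "http://example.com/docs/", List.of(
                "http://example.com/docs/a.html",
                "http://example.com/docs/logo.png",
                "http://other.org/docs/b.html",
                "http://example.com/blog/c.html"),
            "http://example.com/docs/a.html", List.of("http://example.com/docs/d.html"));

        // Same server AND path under /docs/ AND NOT an image.
        LinkFilter filter = serverFilter("example.com")
            .and(beginningPathFilter("/docs/"))
            .and(regularExpressionFilter(".*\\.(png|jpg|gif)").not());

        System.out.println(crawl("http://example.com/docs/", graph, filter, 10));
        // prints [http://example.com/docs/, http://example.com/docs/a.html, http://example.com/docs/d.html]
    }
}
```

In this sketch the iteration limit is what keeps memory use small: the crawl stops after a fixed number of pages no matter how large the site is. A Max Depth variant would instead track how far each link is from the start page and stop following links beyond a given depth.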

36 Comment(s)

Ragini | Aug 6, 2010

This is very helpful stuff !!!!

Ragini | Aug 6, 2010

good one.

Ragini | Aug 6, 2010

good.

Ragini | Aug 9, 2010

given Information is very useful…

someone | Aug 9, 2010

ya..true..

someone | Aug 9, 2010

very nice..

Peter | Aug 10, 2010

Thanks for your nice article!

Michel | Aug 11, 2010

Very good article..really !!!

Henry | Aug 11, 2010

I think more examples should have been given here..Otherwise it is very good…

Rakesh | Aug 12, 2010

Really niceeeee…

Raima | Aug 12, 2010

Good article

Sangeeta | Aug 12, 2010

Sangeeta like this……:)

Nirogi | Aug 12, 2010

Nirogi loves this article…

vatsal | Aug 12, 2010

this is really informative…

vatsal | Aug 13, 2010

this is really informative

Shurvir | Aug 13, 2010

good.

shashi | Aug 13, 2010

It really helped me a lot !!

Petra | Aug 13, 2010

Thanks again!

Johnny | Aug 13, 2010

Thanks man, it really helped!

Rani | Aug 13, 2010

It helped..

Ajay | Aug 16, 2010

really really nice..

vijay | Aug 16, 2010

really really good.

Ramesh | Aug 16, 2010

good……

Hiren | Aug 16, 2010

hey this is really nice…

champa | Aug 16, 2010

really good

Kishan | Aug 16, 2010

Very nice article .!!!!!!!!!…..!!!!.

Kishan | Aug 16, 2010

Very nice article .!!!!!!!!!…..!!!!.@

Sanket | Aug 16, 2010

Osome..

Simon | Aug 16, 2010

Good..

Krishna | Aug 17, 2010

I found it really nice….

Krishna | Aug 17, 2010

I found it really nice……..

Krishna | Aug 17, 2010

I found it really nice………

Krishna | Aug 17, 2010

I found it really nice……….!

Krishna | Sep 6, 2010

I found it really nice………..!

Krishna | Sep 6, 2010

I found it really nice…………!

Krishna | Sep 6, 2010

I found it really niceeeeee………….!

twit88.com
