Java: Smart and Simple Web Crawler

  • Smart and easy framework that crawls a web site
  • Integrated Lucene support
  • The framework is simple to integrate into your own applications
  • The crawler can start from one or from a list of links
  • Two crawling models available:
    • Max Iterations: Crawls a web site through a limited number of links. A fast model with a small memory footprint and low CPU usage.
    • Max Depth: A simple graph model parser that does not record incoming and outgoing links. As fast as the Max Iterations model.
  • Accept filter interface to limit the links to be crawled
  • Core accept filters available: ServerFilter, BeginningPathFilter and RegularExpressionFilter
  • Accept filters can be combined with AND, OR and NOT
  • Pluggable HTTP connection libraries: HttpClient (default) and HTMLParser (optional)
  • Custom listeners can be added to the parsing process
  • The framework is not a GUI-based tool for mirroring a website and browsing it offline!
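
To make the accept filters and the two crawling limits more concrete, here is a minimal, self-contained sketch in plain Java. It does not use the framework's own classes: CrawlSketch, LinkFilter, onServer, matching, sampleWeb and crawl are hypothetical names invented for this illustration, and the "web" is just an in-memory map instead of real HTTP requests. For brevity a single method takes both a depth bound and an iteration bound, whereas the framework describes them as two separate models.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CrawlSketch {

    /** Hypothetical accept filter (not the framework's interface). */
    public interface LinkFilter {
        boolean accept(String link);

        // Combinators for AND, OR and NOT.
        default LinkFilter and(LinkFilter other) { return link -> accept(link) && other.accept(link); }
        default LinkFilter or(LinkFilter other)  { return link -> accept(link) || other.accept(link); }
        default LinkFilter not()                 { return link -> !accept(link); }
    }

    /** Accepts links that start with the given server prefix. */
    public static LinkFilter onServer(String prefix) {
        return link -> link.startsWith(prefix);
    }

    /** Accepts links in which the regular expression is found. */
    public static LinkFilter matching(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return link -> pattern.matcher(link).find();
    }

    /** A tiny in-memory "web": page URL mapped to its outgoing links. */
    public static Map<String, List<String>> sampleWeb() {
        Map<String, List<String>> web = new HashMap<>();
        web.put("http://example.org/", Arrays.asList(
                "http://example.org/docs/", "http://example.org/blog/", "http://other.example/"));
        web.put("http://example.org/docs/", Arrays.asList(
                "http://example.org/docs/api", "http://example.org/docs/private/keys"));
        web.put("http://example.org/blog/", Arrays.asList("http://example.org/blog/post1"));
        return web;
    }

    /** Breadth-first crawl that stops following links at maxDepth and visits at most maxIterations pages. */
    public static List<String> crawl(Map<String, List<String>> web, String start,
                                     LinkFilter filter, int maxDepth, int maxIterations) {
        List<String> visited = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // also marks links already queued
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxIterations) {
            String page = queue.poll();
            visited.add(page);
            int depth = depthOf.get(page);
            if (depth >= maxDepth) {
                continue; // depth bound reached: do not follow this page's links
            }
            for (String link : web.getOrDefault(page, Collections.emptyList())) {
                if (filter.accept(link) && !depthOf.containsKey(link)) {
                    depthOf.put(link, depth + 1);
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Stay on one server AND skip anything under /private/.
        LinkFilter filter = onServer("http://example.org/").and(matching("/private/").not());
        Map<String, List<String>> web = sampleWeb();
        System.out.println(crawl(web, "http://example.org/", filter, 1, Integer.MAX_VALUE));
        System.out.println(crawl(web, "http://example.org/", filter, Integer.MAX_VALUE, 4));
    }
}
```

The first call in main limits the crawl by depth only, the second by number of visited pages only. In both cases the link to the other server and the /private/ link are rejected by the combined filter before they are ever queued.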


36 Comment(s)

  1. Ragini | Aug 6, 2010 | Reply

    This is very helpful stuff !!!!

  2. Ragini | Aug 6, 2010 | Reply

    good one.

  3. Ragini | Aug 6, 2010 | Reply

    good.

  4. Ragini | Aug 9, 2010 | Reply

    given Information is very useful…

  5. someone | Aug 9, 2010 | Reply

    ya..true..

  6. someone | Aug 9, 2010 | Reply

    very nice..

  7. Peter | Aug 10, 2010 | Reply

    Thanks for your nice article! :)

  8. Michel | Aug 11, 2010 | Reply

    Very good article..really !!!

  9. Henry | Aug 11, 2010 | Reply

    I think more examples should have been given here..Otherwise it is very good…

  10. Rakesh | Aug 12, 2010 | Reply

    Really niceeeee…

  11. Raima | Aug 12, 2010 | Reply

    Good article :)

  12. Sangeeta | Aug 12, 2010 | Reply

    Sangeeta like this……:)

  13. Nirogi | Aug 12, 2010 | Reply

    Nirogi loves this article…

  14. vatsal | Aug 12, 2010 | Reply

    this is really informative…

  15. vatsal | Aug 13, 2010 | Reply

    this is really informative

  16. Shurvir | Aug 13, 2010 | Reply

    good.

  17. shashi | Aug 13, 2010 | Reply

    It really helped me a lot !!

  18. Petra | Aug 13, 2010 | Reply

    Thanks again!

  19. Johnny | Aug 13, 2010 | Reply

    Thanks man, it really helped!

  20. Rani | Aug 13, 2010 | Reply

    It helped..

  21. Ajay | Aug 16, 2010 | Reply

    really really nice..

  22. vijay | Aug 16, 2010 | Reply

    really really good.

  23. Ramesh | Aug 16, 2010 | Reply

    good……

  24. Hiren | Aug 16, 2010 | Reply

    hey this is really nice…

  25. champa | Aug 16, 2010 | Reply

    really good

  26. Kishan | Aug 16, 2010 | Reply

    Very nice article .!!!!!!!!!…..!!!!.

  27. Kishan | Aug 16, 2010 | Reply

    Very nice article .!!!!!!!!!…..!!!!.@

  28. Sanket | Aug 16, 2010 | Reply

    Awesome..

  29. Simon | Aug 16, 2010 | Reply

    Good..

  30. Krishna | Aug 17, 2010 | Reply

    I found it really nice….

  31. Krishna | Aug 17, 2010 | Reply

    I found it really nice……..

  32. Krishna | Aug 17, 2010 | Reply

    I found it really nice………

  33. Krishna | Aug 17, 2010 | Reply

    I found it really nice……….!

  34. Krishna | Sep 6, 2010 | Reply

    I found it really nice………..!

  35. Krishna | Sep 6, 2010 | Reply

    I found it really nice…………!

  36. Krishna | Sep 6, 2010 | Reply

    I found it really niceeeeee………….!
