<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Java - Writing a Web Page Scraper or Web Data Extraction Tool</title>
	<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/</link>
	<description>New SMS Library at http://twit88.com/platform/projects/show/messagingtoolkit !</description>
	<pubDate>Fri, 19 Mar 2010 05:17:30 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: janapati siva prasad rao</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-152042</link>
		<dc:creator>janapati siva prasad rao</dc:creator>
		<pubDate>Tue, 16 Feb 2010 11:53:16 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-152042</guid>
		<description>Hi,
I am trying to use the Web Harvest tool to extract data from web sites. I have gone through the examples that come with the tool, and I have tried extracting data from a site by giving its URL. That works fine. Now I am trying to use the same approach to extract an entire site (just like the Canon example). Can you please give me any pointers on writing the configuration file for this?

One more thing: I do not understand how to write the configuration file. Please explain how to write it.

Regards,
Siva</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I am trying to use the Web Harvest tool to extract data from web sites. I have gone through the examples that come with the tool, and I have tried extracting data from a site by giving its URL. That works fine. Now I am trying to use the same approach to extract an entire site (just like the Canon example). Can you please give me any pointers on writing the configuration file for this?</p>
<p>One more thing: I do not understand how to write the configuration file. Please explain how to write it.</p>
<p>Regards,<br />
Siva</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fuller</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-122622</link>
		<dc:creator>Fuller</dc:creator>
		<pubDate>Wed, 02 Dec 2009 01:10:29 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-122622</guid>
		<description>I have authored a tool, MetaSeeker, that automatically generates web data extraction instructions after the operator semantically annotates a sample page. Free download: http://www.gooseeker.com/en/node/download/front</description>
		<content:encoded><![CDATA[<p>I have authored a tool, MetaSeeker, that automatically generates web data extraction instructions after the operator semantically annotates a sample page. Free download: <a href="http://www.gooseeker.com/en/node/download/front" rel="nofollow">http://www.gooseeker.com/en/node/download/front</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Heriberto Janosch González</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-87424</link>
		<dc:creator>Heriberto Janosch González</dc:creator>
		<pubDate>Tue, 16 Jun 2009 15:35:46 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-87424</guid>
		<description>Hello,

Can you help me with Web Harvest?

I need to load this page:

http://contrataciondelestado.es/wps/wcm/connect/PLACE_es/Site/area/docAccCmpnt?srv=cmpnt&#38;cmpntname=GetDocumentsById&#38;source=library&#38;DocumentIdParam=1b082c004e7f2b08b5f4ffbe46495314

When you open it in a browser, you will see that it is an XML document.

But when you put it in a Web Harvest &#60;http url=" ... instruction, it loads something like:


&#60;meta http-equiv="refresh" content="0;url=&apos;/wps/wcm/connect/? ...

That is because (I believe) the meta refresh loads another page 0 seconds after loading the first one ...

How can I solve this problem with Web Harvest?

Thanks in advance for your kind attention!

If you have an answer, please write to: heribertojanosch@yahoo.com</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>Can you help me with Web Harvest?</p>
<p>I need to load this page:</p>
<p><a href="http://contrataciondelestado.es/wps/wcm/connect/PLACE_es/Site/area/docAccCmpnt?srv=cmpnt&amp;cmpntname=GetDocumentsById&amp;source=library&amp;DocumentIdParam=1b082c004e7f2b08b5f4ffbe46495314" rel="nofollow">http://contrataciondelestado.es/wps/wcm/connect/PLACE_es/Site/area/docAccCmpnt?srv=cmpnt&amp;cmpntname=GetDocumentsById&amp;source=library&amp;DocumentIdParam=1b082c004e7f2b08b5f4ffbe46495314</a></p>
<p>When you open it in a browser, you will see that it is an XML document.</p>
<p>But when you put it in a Web Harvest &lt;http url=&quot; &#8230; instruction, it loads something like:</p>
<p>&lt;meta http-equiv=&quot;refresh&quot; content=&quot;0;url=&apos;/wps/wcm/connect/? &#8230;</p>
<p>That is because (I believe) the meta refresh loads another page 0 seconds after loading the first one &#8230;</p>
<p>How can I solve this problem with Web Harvest?</p>
<p>Thanks in advance for your kind attention!</p>
<p>If you have an answer, please write to: <a href="mailto:heribertojanosch@yahoo.com">heribertojanosch@yahoo.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sam</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-86006</link>
		<dc:creator>sam</dc:creator>
		<pubDate>Wed, 10 Jun 2009 12:49:51 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-86006</guid>
		<description>Can you tell me if there is any function in PL/SQL for page scraping?</description>
		<content:encoded><![CDATA[<p>Can you tell me if there is any function in PL/SQL for page scraping?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nils-kaiser.de &#187; Time to crawl back! Download Google Groups using a crawler</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-47669</link>
		<dc:creator>nils-kaiser.de &#187; Time to crawl back! Download Google Groups using a crawler</dc:creator>
		<pubDate>Sat, 14 Feb 2009 00:48:49 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-47669</guid>
		<description>[...] Hope this helps! Feel free to change the script and to notify me of any useful additions. To start changing the script, I recommend having a look at the user manual and the examples. Also have a look at some other uses here and here. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Hope this helps! Feel free to change the script and to notify me of any useful additions. To start changing the script, I recommend having a look at the user manual and the examples. Also have a look at some other uses here and here. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Expertaya &#187; HtmlUnit as Java Screen Scraping Library</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-39231</link>
		<dc:creator>Expertaya &#187; HtmlUnit as Java Screen Scraping Library</dc:creator>
		<pubDate>Fri, 23 Jan 2009 10:38:18 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-39231</guid>
		<description>[...] neither of them is as good as this library. For example, writing a screen scraper with Web Harvest is an easy task, but badly formatted pages cause the XML parser to break, and this has happened to me many times. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] neither of them is as good as this library. For example, writing a screen scraper with Web Harvest is an easy task, but badly formatted pages cause the XML parser to break, and this has happened to me many times. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Radu</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-8460</link>
		<dc:creator>Radu</dc:creator>
		<pubDate>Sat, 17 May 2008 17:15:07 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-8460</guid>
		<description>Hi, if there were more than 1 article, how would you show them all?
Could you show me how to loop the query?
Thanks!</description>
		<content:encoded><![CDATA[<p>Hi, if there were more than 1 article, how would you show them all?<br />
Could you show me how to loop the query?<br />
Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: santosh</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-6868</link>
		<dc:creator>santosh</dc:creator>
		<pubDate>Sun, 27 Apr 2008 22:06:29 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-6868</guid>
		<description>I tried to use the Java code, but it says it cannot find the imported files. Can you please tell me what I am missing here?</description>
		<content:encoded><![CDATA[<p>I tried to use the Java code, but it says it cannot find the imported files. Can you please tell me what I am missing here?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Website Scraping for Dummies &#124; The BookmarkMoney Blog</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-6181</link>
		<dc:creator>Website Scraping for Dummies &#124; The BookmarkMoney Blog</dc:creator>
		<pubDate>Sat, 19 Apr 2008 21:51:44 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-6181</guid>
		<description>[...] The Twit88 blog has two excellent tutorials on using Java/Web Harvest to extract data from websites. Web Scraping using Web Harvest, and Java - Writing a Web Page Scraper or Web Data Extraction Tool. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] The Twit88 blog has two excellent tutorials on using Java/Web Harvest to extract data from websites. Web Scraping using Web Harvest, and Java - Writing a Web Page Scraper or Web Data Extraction Tool. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruben Zevallos Jr.</title>
		<link>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-3451</link>
		<dc:creator>Ruben Zevallos Jr.</dc:creator>
		<pubDate>Fri, 15 Feb 2008 13:38:18 +0000</pubDate>
		<guid>http://twit88.com/blog/2008/01/06/java-writing-a-web-page-scraper-or-web-data-extraction-tool/#comment-3451</guid>
		<description>Thank you for your article... it opened my mind to some other things that I am doing.

Best</description>
		<content:encoded><![CDATA[<p>Thank you for your article&#8230; it opened my mind to some other things that I am doing.</p>
<p>Best</p>
]]></content:encoded>
	</item>
</channel>
</rss>
