<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>twit88.com &#187; research</title>
	<link>http://twit88.com/blog</link>
	<description>Good judgement comes from experience, and experience comes from bad judgement.</description>
	<pubDate>Wed, 19 Nov 2008 06:30:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>Open Source Distributed Filesystem</title>
		<link>http://twit88.com/blog/2008/09/21/open-source-distributed-filesystem/</link>
		<comments>http://twit88.com/blog/2008/09/21/open-source-distributed-filesystem/#comments</comments>
		<pubDate>Sun, 21 Sep 2008 15:43:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[open source]]></category>

		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://twit88.com/blog/2008/09/21/open-source-distributed-filesystem/</guid>
		<description><![CDATA[MogileFS is an open source distributed filesystem.
As quoted from its website, its properties and features include:

Application level &#8212; no special kernel modules required.
No single point of failure &#8212; all three components of a MogileFS setup (storage nodes, trackers, and the tracker&#8217;s database(s)) can be run on multiple machines, so there&#8217;s no single point of failure. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.danga.com/mogilefs/">MogileFS</a> is an open source distributed filesystem.</p>
<p>As quoted from its website, its properties and features include:</p>
<ul>
<li><b>Application level</b> &#8212; no special kernel modules required.
<li><b>No single point of failure</b> &#8212; all three components of a MogileFS setup (storage nodes, trackers, and the tracker&#8217;s database(s)) can be run on multiple machines, so there&#8217;s no single point of failure. (you can run trackers on the same machines as storage nodes, too, so you don&#8217;t need 4 machines&#8230;) A minimum of 2 machines is recommended.
<li><b>Automatic file replication</b> &#8212; files, based on their &#8220;class&#8221;, are automatically replicated between enough different storage nodes as to satisfy the minimum replica count as requested by their class. For instance, for a photo hosting site you can make original JPEGs have a minimum replica count of 3, but thumbnails and scaled versions only have a replica count of 1 or 2. If you lose the only copy of a thumbnail, the application can just rebuild it. In this way, MogileFS (without RAID) can save money on disks that would otherwise be storing multiple copies of data unnecessarily.
<li><b>&#8220;Better than RAID&#8221;</b> &#8212; in a non-SAN RAID setup, the disks are redundant, but the host isn&#8217;t. If you lose the entire machine, the files are inaccessible. MogileFS replicates the files between devices which are on different hosts, so files are always available.
<li><b>Flat Namespace</b> &#8212; Files are identified by named keys in a flat, global namespace. You can create as many namespaces as you&#8217;d like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.
<li><b>Shared-Nothing</b> &#8212; MogileFS doesn&#8217;t depend on a pricey SAN with shared disks. Every machine maintains its own local disks.
<li><b>No RAID required</b> &#8212; Local disks on MogileFS storage nodes can be in a RAID, or not. It&#8217;s cheaper not to, as RAID doesn&#8217;t buy you any safety that MogileFS doesn&#8217;t already provide.
<li><b>Local filesystem agnostic</b> &#8212; Local disks on MogileFS storage nodes can be formatted with your filesystem of choice (ext3, XFS, etc.). MogileFS does its own internal directory hashing so it doesn&#8217;t hit filesystem limits such as &#8220;max files per directory&#8221; or &#8220;max directories per directory&#8221;. Use what you&#8217;re comfortable with.</li>
</ul>
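<p>The replication and &#8220;different hosts&#8221; points above can be illustrated with a small Python sketch. This is not MogileFS code or its real placement logic: the function <code>pick_devices</code>, the class names, the replica counts and the device list are all hypothetical, chosen only to mirror the photo hosting example.</p>
<pre>
```python
def pick_devices(devices, min_replicas):
    """Pick devices for one file so that every replica sits on a different host.

    devices: list of (device_id, host) pairs in preference order (hypothetical data).
    min_replicas: minimum replica count requested by the file's class.
    """
    chosen = []
    used_hosts = set()
    for device_id, host in devices:
        if len(chosen) == min_replicas:
            break
        if host in used_hosts:
            # A second copy on the same host would be lost with that host.
            continue
        chosen.append(device_id)
        used_hosts.add(host)
    if len(chosen) != min_replicas:
        raise ValueError("not enough distinct hosts for the requested replica count")
    return chosen


# Hypothetical classes: originals are precious, thumbnails can be rebuilt.
CLASS_MIN_REPLICAS = {"original": 3, "thumbnail": 1}

# Hypothetical storage devices; dev1 and dev2 share a host.
DEVICES = [("dev1", "hostA"), ("dev2", "hostA"), ("dev3", "hostB"), ("dev4", "hostC")]

original_copies = pick_devices(DEVICES, CLASS_MIN_REPLICAS["original"])
thumbnail_copies = pick_devices(DEVICES, CLASS_MIN_REPLICAS["thumbnail"])
```
</pre>
<p>In this sketch the original skips <code>dev2</code> because it shares a host with <code>dev1</code>, while the thumbnail is stored only once.</p>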
<p>Digg uses MogileFS. For an interesting article about it, see <a href="http://blog.digg.com/?p=168">How Digg Works</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://twit88.com/blog/2008/09/21/open-source-distributed-filesystem/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Apache UIMA: Unstructured Information Management Architecture</title>
		<link>http://twit88.com/blog/2008/08/06/apache-uima-unstructured-information-management-architecture/</link>
		<comments>http://twit88.com/blog/2008/08/06/apache-uima-unstructured-information-management-architecture/#comments</comments>
		<pubDate>Wed, 06 Aug 2008 07:01:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Java]]></category>

		<category><![CDATA[open source]]></category>

		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://twit88.com/blog/2008/08/06/apache-uima-unstructured-information-management-architecture/</guid>
		<description><![CDATA[UIMA is a framework and SDK for developing software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user.
As quoted from the website, an example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://incubator.apache.org/uima/">UIMA</a> is a framework and SDK for developing software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user.</p>
<p>As quoted from the website, an example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example &#8220;language identification&#8221; -&gt; &#8220;language specific segmentation&#8221; -&gt; &#8220;sentence boundary detection&#8221; -&gt; &#8220;entity detection (person/place names etc.)&#8221;.</p>
<p>UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.</p>
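<p>The decomposition into components described above can be sketched in plain Python. This does not use the UIMA SDK or its API: every function name here is hypothetical, the detection rules are toys, and the &#8220;language specific segmentation&#8221; step is left out for brevity. The point is only that each component adds its results to a shared analysis record that later components can read.</p>
<pre>
```python
import re

# Each function is a stand-in "annotator": it reads the shared analysis dict
# and adds its own results, so later steps can build on earlier ones.


def identify_language(doc):
    # Toy rule: call the text English if a few common English words appear.
    words = set(doc["text"].lower().split())
    doc["language"] = "en" if words.intersection({"the", "is", "at", "for"}) else "unknown"
    return doc


def detect_sentences(doc):
    # Naive boundary detection: a sentence is a run of text ending in . ! or ?
    # (text after the last terminator is ignored in this sketch).
    parts = re.findall(r"[^.!?]+[.!?]", doc["text"])
    doc["sentences"] = [part.strip() for part in parts]
    return doc


def detect_entities(doc):
    # Toy entity detection: treat every capitalised word as a candidate name.
    entities = []
    for sentence in doc["sentences"]:
        entities.extend(re.findall(r"\b[A-Z][a-z]+\b", sentence))
    doc["entities"] = entities
    return doc


def run_pipeline(text, steps):
    # Run the components in order over one shared analysis record.
    doc = {"text": text}
    for step in steps:
        doc = step(doc)
    return doc


PIPELINE = [identify_language, detect_sentences, detect_entities]
result = run_pipeline("Alice works for Acme. Bob is located at Paris.", PIPELINE)
```
</pre>
<p>Because each step only depends on the fields written by earlier steps, a component can be swapped for a better one without touching the rest of the pipeline.</p>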
]]></content:encoded>
			<wfw:commentRss>http://twit88.com/blog/2008/08/06/apache-uima-unstructured-information-management-architecture/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SEDA: An Architecture for Highly Concurrent Server Applications</title>
		<link>http://twit88.com/blog/2008/07/19/seda-an-architecture-for-highly-concurrent-server-applications/</link>
		<comments>http://twit88.com/blog/2008/07/19/seda-an-architecture-for-highly-concurrent-server-applications/#comments</comments>
		<pubDate>Sat, 19 Jul 2008 15:53:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[open source]]></category>

		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://twit88.com/blog/2008/07/19/seda-an-architecture-for-highly-concurrent-server-applications/</guid>
		<description><![CDATA[As quoted from the website, SEDA is an acronym for staged event-driven architecture, and decomposes a complex, event-driven application into a set of stages connected by queues. This design avoids the high overhead associated with thread-based concurrency models, and decouples event and thread scheduling from application logic. By performing admission control on each event queue, [...]]]></description>
			<content:encoded><![CDATA[<p>As quoted from the website, <a href="http://www.eecs.harvard.edu/~mdw/proj/seda/">SEDA</a> is an acronym for <em>staged event-driven architecture</em>, and decomposes a complex, event-driven application into a set of <em>stages</em> connected by <em>queues</em>. This design avoids the high overhead associated with thread-based concurrency models, and decouples event and thread scheduling from application logic. By performing admission control on each event queue, the service can be well-conditioned to load, preventing resources from being overcommitted when demand exceeds service capacity. </p>
<p>SEDA employs dynamic control to automatically tune runtime parameters (such as the scheduling parameters of each stage), as well as to manage load, for example, by performing adaptive load shedding. Decomposing services into a set of stages also enables modularity and code reuse, as well as the development of debugging tools for complex event-driven applications.</p>
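<p>To make the idea of stages connected by queues concrete, here is a minimal Python sketch. It is not the SEDA implementation from the papers: the <code>Stage</code> class, its method names and the queue capacities are hypothetical, and admission control is reduced to rejecting events when a bounded queue is full. Dynamic tuning and adaptive load shedding are not shown.</p>
<pre>
```python
import queue
import threading


class Stage:
    """One stage: a bounded event queue drained by its own worker thread."""

    def __init__(self, name, handler, capacity, next_stage=None):
        self.name = name
        self.handler = handler          # function applied to each event
        self.next_stage = next_stage    # downstream stage, or None for the last one
        self.events = queue.Queue(maxsize=capacity)
        self.rejected = 0               # events refused by admission control
        self.results = []               # outputs kept when there is no next stage
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def enqueue(self, event):
        """Admission control: refuse the event instead of blocking on a full queue."""
        try:
            self.events.put_nowait(event)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def _run(self):
        while True:
            event = self.events.get()
            if event is None:           # sentinel: stop the worker
                break
            out = self.handler(event)
            if self.next_stage is None:
                self.results.append(out)
            else:
                # If the next stage is full the event is dropped and counted there.
                self.next_stage.enqueue(out)

    def stop(self):
        """Let the worker finish everything already queued, then shut it down."""
        self.events.put(None)
        self._worker.join()


def run_demo():
    render = Stage("render", lambda text: text.upper(), capacity=8)
    parse = Stage("parse", lambda raw: raw.strip(), capacity=8, next_stage=render)
    render.start()
    parse.start()
    for raw in ["  get /a ", " get /b  "]:
        parse.enqueue(raw)
    parse.stop()     # parse drains its queue into render first
    render.stop()
    return render.results
```
</pre>
<p>Calling <code>run_demo()</code> pushes two events through a parse stage and a render stage. Because each stage owns its queue and worker thread, a slow stage only fills its own queue and starts rejecting new events, rather than stalling the stages before it.</p>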
<p>SEDA is used in a number of open source and commercial projects.</p>
<p>Read the <a href="http://www.eecs.harvard.edu/~mdw/proj/seda/#papers">research papers</a> if you are interested in developing highly concurrent systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://twit88.com/blog/2008/07/19/seda-an-architecture-for-highly-concurrent-server-applications/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Open Source Grid Computing</title>
		<link>http://twit88.com/blog/2008/05/19/open-source-grid-computing/</link>
		<comments>http://twit88.com/blog/2008/05/19/open-source-grid-computing/#comments</comments>
		<pubDate>Mon, 19 May 2008 06:32:58 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[interesting]]></category>

		<category><![CDATA[open source]]></category>

		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://twit88.com/blog/2008/05/19/open-source-grid-computing/</guid>
		<description><![CDATA[Here is some interesting open source grid computing software.
GridGain is open source grid computing software for Java. It is dual-licensed under the LGPL and Apache 2.0 licenses and is built on an open source software foundation.
BOINC is a software platform for volunteer computing and desktop grid computing. BOINC is designed to [...]]]></description>
			<content:encoded><![CDATA[<p>Here is some interesting open source grid computing software.</p>
<p><a href="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bgridgrain-2.png"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="121" alt="gridgrain" src="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bgridgrain-thumb.png" width="454" border="0"></a> </p>
<p><a href="http://www.gridgain.com/">GridGain</a> is open source grid computing software for Java. It is dual-licensed under the LGPL and Apache 2.0 licenses and is built on an open source software foundation.</p>
<p><a href="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bboinc-2.gif"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="77" alt="boinc" src="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bboinc-thumb.gif" width="168" border="0"></a> </p>
<p><a href="http://boinc.berkeley.edu/">BOINC</a> is a software platform for <a href="http://boinc.berkeley.edu/trac/wiki/VolunteerComputing">volunteer computing</a> and <a href="http://boinc.berkeley.edu/trac/wiki/DesktopGrid">desktop grid computing</a>. BOINC is designed to support applications that have large computation requirements, storage requirements, or both. The main requirement of the application is that it be divisible into a large number (thousands or millions) of jobs that can be done independently.</p>
<p>BOINC is used by SETI@home.</p>
<p><a href="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bglobustoolkit-2.gif"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="80" alt="globustoolkit" src="http://twit88.com/blog/wp-content/uploads/2008/05/windowslivewritergridcomputing-d58bglobustoolkit-thumb.gif" width="164" border="0"></a> </p>
<p><a href="http://www.globus.org/">Globus Toolkit</a> is an open source software toolkit used for building Grid systems and applications. It is being developed by the Globus Alliance and many others all over the world.</p>
<p><a href="http://ngrid.sourceforge.net/">NGrid</a> is an open source (<a href="http://www.gnu.org/licences/lgpl.html"><acronym>LGPL</acronym></a>) grid computing framework written in C#. NGrid aims to be platform independent via the <a href="http://www.go-mono.org">Mono</a> project.</p>
<p>Other useful references</p>
<ul>
<li><a href="http://www.ggf.org/">Open Grid Forum</a>
<li><a href="http://www.globus.org/ogsa/">Open Grid Service Architecture</a> represents an evolution towards a Grid system architecture based on Web services concepts and technologies.
<li><a href="http://www.globus.org/grid_software/">Software components for grid systems and applications.</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://twit88.com/blog/2008/05/19/open-source-grid-computing/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
