<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Rational 1000: A Version Control War Story</title>
	<atom:link href="http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/</link>
	<description>A blog for odd things and odd thoughts.</description>
	<lastBuildDate>Wed, 01 Feb 2012 22:21:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: OddThinking &#187; Comparing Strings: An Analysis of Diff Algorithms</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/comment-page-1/#comment-2599</link>
		<dc:creator>OddThinking &#187; Comparing Strings: An Analysis of Diff Algorithms</dc:creator>
		<pubDate>Sun, 15 Jan 2006 13:45:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/#comment-2599</guid>
		<description>I have posted an excerpt of my uni assignment that described several variants of the &lt;code&gt;diff&lt;/code&gt; algorithm</description>
		<content:encoded><![CDATA[<p>I have posted an excerpt of my uni assignment that described several variants of the <code>diff</code> algorithm</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aristotle Pagaltzis</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/comment-page-1/#comment-2593</link>
		<dc:creator>Aristotle Pagaltzis</dc:creator>
		<pubDate>Sun, 15 Jan 2006 05:28:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/#comment-2593</guid>
		<description>Ah. I didn’t have numbers on the size of your input; that indeed sounds minuscule.

Maybe it did something like what the user search code in the control panel of Ultimate Bulletin Board 5.x did: it called a function to get a list of usernames, then it went over the list in a loop in which it called another function to get the details about the user in question. Which would be sane enough if the function that returned account details didn’t do its job by opening the user accounts database flatfile, reading and parsing it all and storing the entire thing in memory, only to return a single record and throw away all the other data.

This is how you turn any problem of linear complexity into an algorithm of quadratic complexity.

&lt;i&gt;(And how you inflate the runtime of a search function on a system with 3,000 users from 7 seconds to 3 minutes.&lt;/i&gt;

&lt;i&gt;These are actual times measured after and before my refactoring. The additional half an order of magnitude not accounted for by the linear vs. quadratic complexities is due to the fact that, unsurprisingly, the old code also did a lot of useless work by not storing parsed data in an appropriate data structure, instead painstakingly gluing everything back into strings at the end of a step, only to unravel them from scratch during the next step.&lt;/i&gt;

&lt;i&gt;Needless to say, besides being orders of magnitude faster, my code was also much shorter (by about 3× if memory serves) and at the same time it was actually readable because the (trivial) intent was explicit.)&lt;/i&gt;

Maybe someone at Rational wrote the R1000’s diff algorithm in a similarly “ingenious” way.</description>
		<content:encoded><![CDATA[<p>Ah. I didn’t have numbers on the size of your input; that indeed sounds minuscule.</p>
<p>Maybe it did something like what the user search code in the control panel of Ultimate Bulletin Board 5.x did: it called a function to get a list of usernames, then it went over the list in a loop in which it called another function to get the details about the user in question. Which would be sane enough if the function that returned account details didn’t do its job by opening the user accounts database flatfile, reading and parsing it all and storing the entire thing in memory, only to return a single record and throw away all the other data.</p>
<p>This is how you turn any problem of linear complexity into an algorithm of quadratic complexity.</p>
<p><i>(And how you inflate the runtime of a search function on a system with 3,000 users from 7 seconds to 3 minutes.</i></p>
<p><i>These are actual times measured after and before my refactoring. The additional half an order of magnitude not accounted for by the linear vs. quadratic complexities is due to the fact that, unsurprisingly, the old code also did a lot of useless work by not storing parsed data in an appropriate data structure, instead painstakingly gluing everything back into strings at the end of a step, only to unravel them from scratch during the next step.</i></p>
<p><i>Needless to say, besides being orders of magnitude faster, my code was also much shorter (by about 3× if memory serves) and at the same time it was actually readable because the (trivial) intent was explicit.)</i></p>
<p>Maybe someone at Rational wrote the R1000’s diff algorithm in a similarly “ingenious” way.</p>
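<p>A minimal sketch of the lookup pattern described above, not the actual Ultimate Bulletin Board code. The <code>name|email</code> flatfile format, the function names and the 200-user size are all assumptions made for illustration. It counts how many records get parsed when the whole file is re-read for every user versus parsed once.</p>

```python
# Hypothetical flatfile: one "name|email" record per line (format assumed).
FLATFILE = "\n".join(f"user{i}|user{i}@example.com" for i in range(200))

parsed_lines = 0  # counts how many records get parsed in total


def parse_all(text):
    """Parse the whole flatfile into a dict keyed by username."""
    global parsed_lines
    records = {}
    for line in text.splitlines():
        name, email = line.split("|")
        records[name] = {"name": name, "email": email}
        parsed_lines += 1
    return records


def get_user_slow(name):
    """Re-parse everything, return one record, discard the rest."""
    return parse_all(FLATFILE)[name]


def search_slow(names):
    # One full parse per user: n users * n records = quadratic work.
    return [get_user_slow(n) for n in names]


def search_fast(names):
    # Parse once, then look each user up: linear work.
    records = parse_all(FLATFILE)
    return [records[n] for n in names]


names = [f"user{i}" for i in range(200)]

parsed_lines = 0
slow = search_slow(names)
slow_cost = parsed_lines

parsed_lines = 0
fast = search_fast(names)
fast_cost = parsed_lines

print(slow_cost, fast_cost)  # 40000 200
```

<p>Both searches return the same records; only the number of parsed records differs, growing with the square of the user count in the slow version.</p>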
]]></content:encoded>
	</item>
	<item>
		<title>By: Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/comment-page-1/#comment-2591</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Sun, 15 Jan 2006 04:31:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/#comment-2591</guid>
		<description>I agree it was a naïve implementation, but it was &lt;em&gt;worse&lt;/em&gt; than the canonical naïve implementation.

The canonical naïve solution is &lt;em&gt;O(sizeOfFile1 * sizeOfFile2)&lt;/em&gt; and, perhaps surprisingly, isn&#039;t related to the number of differences.

The small file would have been 500 lines * 3 characters = approx 1500 bytes.

The large file would have been 20,000 lines * approximately 15 characters = 300,000 bytes.

Sqrt(1500 bytes * 300000 bytes) = 21KB

So it should have taken about as long as comparing two identical 21 KB files with each other - hardly &gt;16 hours elapsed-time worth (which included more than 12 hours of CPU time).

Reference: My Uni analysis that I might post here, if I can convert the LaTeX to HTML.</description>
		<content:encoded><![CDATA[<p>I agree it was a naïve implementation, but it was <em>worse</em> than the canonical naïve implementation.</p>
<p>The canonical naïve solution is <em>O(sizeOfFile1 * sizeOfFile2)</em> and, perhaps surprisingly, isn&#8217;t related to the number of differences.</p>
<p>The small file would have been 500 lines * 3 characters = approx 1500 bytes.</p>
<p>The large file would have been 20,000 lines * approximately 15 characters = 300,000 bytes.</p>
<p>Sqrt(1500 bytes * 300000 bytes) = 21KB</p>
<p>So it should have taken about as long as comparing two identical 21 KB files with each other &#8211; hardly >16 hours elapsed-time worth (which included more than 12 hours of CPU time).</p>
<p>Reference: My Uni analysis that I might post here, if I can convert the LaTeX to HTML.</p>
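<p>The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the calculation in the comment, taking the stated line counts and per-line sizes as given; the variable names are made up for illustration.</p>

```python
import math

small = 500 * 3        # small file: 500 lines * 3 characters
large = 20_000 * 15    # large file: 20,000 lines * about 15 characters

# The canonical naive diff does work proportional to the product of the sizes.
work = small * large

# Size of two equal files whose product of sizes matches that work.
equivalent = math.sqrt(work)

print(small, large, work, round(equivalent))  # 1500 300000 450000000 21213
```

<p>The result, roughly 21,213 bytes, matches the 21 KB figure quoted above.</p>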
]]></content:encoded>
	</item>
	<item>
		<title>By: Aristotle Pagaltzis</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/comment-page-1/#comment-2581</link>
		<dc:creator>Aristotle Pagaltzis</dc:creator>
		<pubDate>Sat, 14 Jan 2006 22:01:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/14/rational-1000-version-control-war-story/#comment-2581</guid>
		<description>Maybe it was simply a naïve implementation. After all, the most straightforward approach that would occur to someone who doesn’t know any better has polynomial complexity – as is the case with most any algorithm that involves comparisons, now that I think about it. (String search and sorting are other examples, off the top of my head.)</description>
		<content:encoded><![CDATA[<p>Maybe it was simply a naïve implementation. After all, the most straightforward approach that would occur to someone who doesn’t know any better has polynomial complexity – as is the case with most any algorithm that involves comparisons, now that I think about it. (String search and sorting are other examples, off the top of my head.)</p>
]]></content:encoded>
	</item>
</channel>
</rss>

