<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Comparing Strings: An Analysis of Diff Algorithms</title>
	<atom:link href="http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/</link>
	<description>A blog for odd things and odd thoughts.</description>
	<lastBuildDate>Wed, 10 Mar 2010 17:00:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Maarten</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-223014</link>
		<dc:creator>Maarten</dc:creator>
		<pubDate>Tue, 23 Feb 2010 17:41:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-223014</guid>
		<description>The &quot;Myers86&quot; reference is most likely &quot;&lt;a href=&quot;http://en.scientificcommons.org/42915130&quot; rel=&quot;nofollow&quot;&gt;An O(ND) Difference Algorithm and Its Variations&lt;/a&gt;&quot;, published in 1986 by Eugene W. Myers.</description>
		<content:encoded><![CDATA[<p>The &#8220;Myers86&#8221; reference is most likely &#8220;<a href="http://en.scientificcommons.org/42915130" rel="nofollow" class="liexternal">An O(ND) Difference Algorithm and Its Variations</a>&#8221;, published in 1986 by Eugene W. Myers.</p>
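<p>For readers who want the core of that paper in code, here is a minimal Python sketch of its greedy forward search. It was written for illustration and is not code from the report or from any diff tool. It computes only D, the length of a shortest edit script using Add and Delete, by tracking the furthest-reaching x on each diagonal k = x - y. Recovering the script itself would need the history of each round, which is omitted here.</p>

```python
def myers_distance(a, b):
    """Length D of a shortest Add/Delete edit script turning a into b.

    Greedy forward search over diagonals k = x - y (x indexes a, y indexes b).
    v[offset + k] holds the furthest x reached so far on diagonal k.
    """
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d                      # shift k into a non-negative index
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]        # step down: Add an item of b
            else:
                x = v[offset + k - 1] + 1    # step right: Delete an item of a
            y = x - k
            # follow the "snake" of matching items at no cost
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d
```

<p>For example, <code>myers_distance("abcabba", "cbabac")</code> returns 5. Because only Add and Delete are counted, <code>myers_distance("cat", "hat")</code> returns 2, not 1.</p>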
]]></content:encoded>
	</item>
	<item>
		<title>By: aj</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-44432</link>
		<dc:creator>aj</dc:creator>
		<pubDate>Sat, 16 Jun 2007 15:00:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-44432</guid>
		<description>Hi,

How about a minor modification to the diff heuristic so that, when one file is a proper subset of the other (meaning only deletions are required), it gives the best diff without reducing efficiency?

Please let me know your thoughts on this.

Regards,
Arjun</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>How about a minor modification to the diff heuristic so that, when one file is a proper subset of the other (meaning only deletions are required), it gives the best diff without reducing efficiency?</p>
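<p>One way to get this without slowing down the general case is a linear-time pre-check, sketched below in Python. It assumes that &#8220;proper subset&#8221; means the shorter file is a subsequence of the longer one (its items appear in the same order), and the function name is hypothetical. If the check succeeds, deleting the unmatched items is optimal, since any edit script needs at least len(longer) - len(shorter) deletions; if it fails, the caller falls back to the normal diff.</p>

```python
def deletion_only_script(longer, shorter):
    """Indices to delete from `longer` so that it equals `shorter`.

    Returns None when `shorter` is not a subsequence of `longer`, in which
    case a general diff is needed. Runs in O(len(longer)) time and works on
    any sequences (strings, lists of lines, ...).
    """
    deletions = []
    j = 0                         # next item of `shorter` still to be matched
    for i, item in enumerate(longer):
        if j < len(shorter) and item == shorter[j]:
            j += 1                # keep: matches the next item we still need
        else:
            deletions.append(i)   # not needed: delete it
    return deletions if j == len(shorter) else None
```

<p>Matching each item of the shorter file as early as possible is enough to decide whether it is a subsequence, so this check never misses a deletion-only case.</p>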
<p>Please let me know your thoughts on this.</p>
<p>Regards,<br />
Arjun</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: entY8</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-40925</link>
		<dc:creator>entY8</dc:creator>
		<pubDate>Wed, 23 May 2007 20:07:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-40925</guid>
		<description>Thanks a lot for this; it was the first useful site I found on this topic :-D
(btw, I found it via Google somewhere among the first few hits)</description>
		<content:encoded><![CDATA[<p>Thanks a lot for this; it was the first useful site I found on this topic <img src='http://www.somethinkodd.com/oddthinking/wp-includes/images/smilies/icon_biggrin.gif' alt=':-D' class='wp-smiley' /><br />
(btw, I found it via Google somewhere among the first few hits)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: turbo</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-8885</link>
		<dc:creator>turbo</dc:creator>
		<pubDate>Sat, 01 Jul 2006 14:31:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-8885</guid>
		<description>I have recently been speaking to Dr. Gene Myers. His algorithms are very elegant and fast; however, as I found out, and as you clearly realized, they do not allow for substitutions, which are one of the operations permitted by Levenshtein distance.

I have coded up a few versions of code that compute Levenshtein distance. I wrote the really obvious one, as well as the one-dimensional storage version of that. I actually used that code for many years.

After some gentle encouragement, I searched the internet for better algorithms. I found references to Ukkonen&#039;s work. I optimized my code to compute only the diagonals needed, which sped it up by roughly a factor of 6. I knew of his other optimization, which computes the cells of the 2D array in order of distance. I also coded that up; however, I have not found an efficient way to do so. Several operations in my code are O(n^2), which is unfortunate. The algorithm does fill in fewer values of the 2D array, but it takes much too long. Perhaps I will figure out some very clever, efficient way to fill out the distances of the array in order...

I also coded up a classic implementation of the Damerau/Levenshtein distance, which also allows for transposition.

Thanks for fixing the typo in my array where I included an extra value...

If you have access to Ukkonen&#039;s papers, I would really like to see them. They seem to be too old to be online....

turbo</description>
		<content:encoded><![CDATA[<p>I have recently been speaking to Dr. Gene Myers. His algorithms are very elegant and fast; however, as I found out, and as you clearly realized, they do not allow for substitutions, which are one of the operations permitted by Levenshtein distance.</p>
<p>I have coded up a few versions of code that compute Levenshtein distance. I wrote the really obvious one, as well as the one-dimensional storage version of that. I actually used that code for many years.</p>
<p>After some gentle encouragement, I searched the internet for better algorithms. I found references to Ukkonen&#8217;s work. I optimized my code to compute only the diagonals needed, which sped it up by roughly a factor of 6. I knew of his other optimization, which computes the cells of the 2D array in order of distance. I also coded that up; however, I have not found an efficient way to do so. Several operations in my code are O(n^2), which is unfortunate. The algorithm does fill in fewer values of the 2D array, but it takes much too long. Perhaps I will figure out some very clever, efficient way to fill out the distances of the array in order&#8230;</p>
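<p>The diagonal restriction can be sketched as follows. This is a minimal Python illustration of the cut-off idea, not turbo&#8217;s code: if the distance is at most k, only cells with |i - j| at most k can lie on an optimal path, so each row needs only a band of 2k + 1 cells, and cells outside the band are treated as &#8220;more than k&#8221;. The doubling driver is an assumption added for illustration; it keeps the total work roughly proportional to n times d.</p>

```python
def banded_levenshtein(u, v, k):
    """Levenshtein distance of u and v if it is at most k, otherwise None.

    Only cells with |i - j| <= k are computed; every other cell holds
    INF = k + 1, which is a safe stand-in because its true value exceeds k.
    """
    n, m = len(u), len(v)
    if abs(n - m) > k:
        return None
    INF = k + 1
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if u[i - 1] == v[j - 1] else 1
            cur[j] = min(prev[j] + 1,             # delete u[i-1]
                         cur[j - 1] + 1,          # add v[j-1]
                         prev[j - 1] + cost)      # keep or replace
        prev = cur
    return prev[m] if prev[m] <= k else None


def levenshtein_by_doubling(u, v):
    """Try k = 1, 2, 4, ... until the banded computation succeeds."""
    k = 1
    while True:
        d = banded_levenshtein(u, v, k)
        if d is not None:
            return d
        k *= 2
```

<p>For example, <code>banded_levenshtein("kitten", "sitting", 2)</code> gives up (returns <code>None</code>), while the doubling driver returns 3.</p>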
<p>I also coded up a classic implementation of the Damerau/Levenshtein distance, which also allows for transposition.</p>
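<p>turbo&#8217;s implementation is not shown, so the following is an illustration only. It assumes &#8220;classic&#8221; refers to the optimal string alignment (restricted) variant, which adds one transposition case to the Levenshtein recurrence. A minimal Python sketch under that assumption:</p>

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Add, Delete, Replace, and
    transposition of two adjacent characters, each costing 1."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # add
                          d[i - 1][j - 1] + cost)   # keep or replace
            # swap of the last two characters, e.g. "ca" -> "ac"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]
```

<p>This variant never edits a substring again after transposing it, so <code>osa_distance("ca", "abc")</code> is 3, whereas the unrestricted Damerau/Levenshtein distance for that pair is 2.</p>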
<p>Thanks for fixing the typo in my array where I included an extra value&#8230;</p>
<p>If you have access to Ukkonen&#8217;s papers, I would really like to see them. They seem to be too old to be online&#8230;.</p>
<p>turbo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-8652</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Tue, 27 Jun 2006 22:08:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-8652</guid>
		<description>Thanks, Turbo,

You made me very happy, because your comment was evidence that someone actually read - and understood - the report!

In your grid, it appears that the edit distance between the strings &lt;code&gt;c&lt;/code&gt; and &lt;code&gt;h&lt;/code&gt; is 1.

This is only true if you permit the Replace operation (i.e. Replace &#039;c&#039; with &#039;h&#039;.) The analysis in that section assumed that the only basic operations were Add and Delete (i.e. Delete &#039;c&#039;, then Add &#039;h&#039;.)

I agree that, if you can use Replace as one of the basic operations, then the increment going down the diagonal can be either 0 or 1 (not 2), and the rest of the analysis needs to be re-jiggered accordingly.</description>
		<content:encoded><![CDATA[<p>Thanks, Turbo,</p>
<p>You made me very happy, because your comment was evidence that someone actually read &#8211; and understood &#8211; the report!</p>
<p>In your grid, it appears that the edit distance between the strings <code>c</code> and <code>h</code> is 1.</p>
<p>This is only true if you permit the Replace operation (i.e. Replace &#8216;c&#8217; with &#8216;h&#8217;.) The analysis in that section assumed that the only basic operations were Add and Delete (i.e. Delete &#8216;c&#8217;, then Add &#8216;h&#8217;.)</p>
<p>I agree that, if you can use Replace as one of the basic operations, then the increment going down the diagonal can be either 0 or 1 (not 2), and the rest of the analysis needs to be re-jiggered accordingly.</p>
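<p>To make the difference concrete, here is a small Python sketch (the function names are illustrative) that builds the table with and without Replace and collects every step taken down a diagonal, from cell (i, j) to cell (i + 1, j + 1). Without Replace, a mismatch on the diagonal is given cost 2, which is the same as a Delete followed by an Add, so it never beats the Add/Delete-only answer.</p>

```python
def edit_table(u, v, allow_replace):
    """DP table of edit distances: prefixes of v down the side, u across the top."""
    replace_cost = 1 if allow_replace else 2   # without Replace: Delete + Add
    t = [[0] * (len(u) + 1) for _ in range(len(v) + 1)]
    for i in range(len(v) + 1):
        for j in range(len(u) + 1):
            if i == 0 or j == 0:
                t[i][j] = i + j                # distance to/from the empty prefix
            else:
                diag = 0 if v[i - 1] == u[j - 1] else replace_cost
                t[i][j] = min(t[i - 1][j] + 1, t[i][j - 1] + 1, t[i - 1][j - 1] + diag)
    return t


def diagonal_increments(t):
    """Set of all differences t[i+1][j+1] - t[i][j] in the table."""
    return {t[i + 1][j + 1] - t[i][j]
            for i in range(len(t) - 1) for j in range(len(t[0]) - 1)}
```

<p>For &#8216;cat&#8217; and &#8216;hat&#8217;, the Add/Delete-only table gives the increments {0, 2} and a final distance of 2, while allowing Replace gives {0, 1} and a final distance of 1.</p>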
]]></content:encoded>
	</item>
	<item>
		<title>By: turbo</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-8580</link>
		<dc:creator>turbo</dc:creator>
		<pubDate>Tue, 27 Jun 2006 13:11:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-8580</guid>
		<description>Nice paper. However, in the O((&#124;u&#124;+&#124;v&#124;).d) Algorithm you state that the increment going down a diagonal is either 0 or 2, which is incorrect. A simple counterexample is the array for the strings &#039;cat&#039; and &#039;hat&#039;, which is:
&lt;code&gt;
xxcat
x0123
h1123
a2212
t3321
&lt;/code&gt;
Since the edit distance is 1, and the upper left corner is 0, this shows that the increment can be 1. In fact, any pair of strings with edit distance of 1 will exhibit this property.</description>
		<content:encoded><![CDATA[<p>Nice paper. However, in the O((|u|+|v|).d) Algorithm you state that the increment going down a diagonal is either 0 or 2, which is incorrect. A simple counterexample is the array for the strings &#8216;cat&#8217; and &#8216;hat&#8217;, which is:<br />
<code><br />
xxcat<br />
x0123<br />
h1123<br />
a2212<br />
t3321<br />
</code><br />
Since the edit distance is 1, and the upper left corner is 0, this shows that the increment can be 1. In fact, any pair of strings with edit distance of 1 will exhibit this property.</p>
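<p>The grid above can be reproduced mechanically. Below is a minimal Python sketch (a hypothetical helper) that fills the standard Levenshtein table with Add, Delete and Replace and prints it in the same layout: the first string across the top, the second down the side, and &#8216;x&#8217; as filler. It assumes single-digit distances, since each cell is printed as one character.</p>

```python
def levenshtein_grid(u, v):
    """Rows of the Levenshtein table in the layout used above."""
    t = [[0] * (len(u) + 1) for _ in range(len(v) + 1)]
    for i in range(len(v) + 1):
        for j in range(len(u) + 1):
            if i == 0 or j == 0:
                t[i][j] = i + j                        # distance to/from ""
            else:
                cost = 0 if v[i - 1] == u[j - 1] else 1
                t[i][j] = min(t[i - 1][j] + 1,         # delete
                              t[i][j - 1] + 1,         # add
                              t[i - 1][j - 1] + cost)  # keep or replace
    rows = ["xx" + u, "x" + "".join(map(str, t[0]))]
    rows += [v[i - 1] + "".join(map(str, t[i])) for i in range(1, len(v) + 1)]
    return rows
```

<p>Calling <code>levenshtein_grid("cat", "hat")</code> returns the five rows shown above, with the distance 1 in the bottom-right corner.</p>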
]]></content:encoded>
	</item>
	<item>
		<title>By: girtby.net &#187; Blog Archive &#187; Responding to Adrian</title>
		<link>http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/comment-page-1/#comment-7167</link>
		<dc:creator>girtby.net &#187; Blog Archive &#187; Responding to Adrian</dc:creator>
		<pubDate>Mon, 05 Jun 2006 12:14:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/2006/01/16/comparing-strings-an-analysis-of-diff-algorithms/#comment-7167</guid>
		<description>[...] For a good introduction to diffing and the algorithms commonly used, see OddThinking. [...]</description>
		<content:encoded><![CDATA[<p>[...] For a good introduction to diffing and the algorithms commonly used, see OddThinking. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
