<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments for OddThinking</title>
	<atom:link href="http://www.somethinkodd.com/oddthinking/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.somethinkodd.com/oddthinking</link>
	<description>A blog for odd things and odd thoughts.</description>
	<pubDate>Thu, 28 Aug 2008 12:35:07 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>Comment on Free Beer from Google = Brewgle? by Geoff</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/07/free-beer-from-google-brewgle/#comment-143097</link>
		<dc:creator>Geoff</dc:creator>
		<pubDate>Tue, 26 Aug 2008 09:52:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=572#comment-143097</guid>
		<description>Sounds good to me. Most days are the same. Was that "a central location" or "a Central location"? How is the retirement going, he asked jealously.</description>
		<content:encoded><![CDATA[<p>Sounds good to me. Most days are the same. Was that &#8220;a central location&#8221; or &#8220;a Central location&#8221;? How is the retirement going, he asked jealously.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143057</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Tue, 26 Aug 2008 01:39:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143057</guid>
		<description>Had I searched for free software earlier, here is some of what I may have found:

&lt;a href="http://www.freeware-guide.com/rareware/DupDetector.html" rel="nofollow"&gt;DupDetector&lt;/a&gt; - I ran it, but it crashed after 117 photos.

&lt;a href="http://sourceforge.net/projects/imagecomparitor/" rel="nofollow"&gt;Image Comparitor&lt;/a&gt; - I ran it. Found more duplicates than I did. Output UI is a little obscure; I wish it offered to just save the results to a CSV.

&lt;a href="http://sourceforge.net/projects/dupimages/" rel="nofollow"&gt;DupImages&lt;/a&gt; - A dead project, but it sounds like it may have used a similar approach to Alastair's suggestion.

&lt;a href="http://sourceforge.net/projects/dfk/" rel="nofollow"&gt;Duplicate File Killer&lt;/a&gt; - Functionality still on the drawing board.

&lt;a href="http://sourceforge.net/projects/imagesorter/" rel="nofollow"&gt;ImageSorter&lt;/a&gt; - Wrong OS. Didn't look at it.</description>
		<content:encoded><![CDATA[<p>Had I searched for free software earlier, here is some of what I may have found:</p>
<p><a href="http://www.freeware-guide.com/rareware/DupDetector.html" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/www.freeware-guide.com');" class="liexternal">DupDetector</a> - I ran it, but it crashed after 117 photos.</p>
<p><a href="http://sourceforge.net/projects/imagecomparitor/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/sourceforge.net');" class="liexternal">Image Comparitor</a> - I ran it. Found more duplicates than I did. Output UI is a little obscure; I wish it offered to just save the results to a CSV.</p>
<p><a href="http://sourceforge.net/projects/dupimages/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/sourceforge.net');" class="liexternal">DupImages</a> - A dead project, but it sounds like it may have used a similar approach to Alastair&#8217;s suggestion.</p>
<p><a href="http://sourceforge.net/projects/dfk/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/sourceforge.net');" class="liexternal">Duplicate File Killer</a> - Functionality still on the drawing board.</p>
<p><a href="http://sourceforge.net/projects/imagesorter/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/sourceforge.net');" class="liexternal">ImageSorter</a> - Wrong OS. Didn&#8217;t look at it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Alastair</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143042</link>
		<dc:creator>Alastair</dc:creator>
		<pubDate>Mon, 25 Aug 2008 23:09:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143042</guid>
		<description>"Math is hard! Let's go sorting!"

I'd be surprised if Photoshop (and possibly PIL, PHP) didn't preserve the EXIF data. Similarly I'd be surprised if Facebook did.</description>
		<content:encoded><![CDATA[<p>&#8220;Math is hard! Let&#8217;s go sorting!&#8221;</p>
<p>I&#8217;d be surprised if Photoshop (and possibly PIL, PHP) didn&#8217;t preserve the EXIF data. Similarly I&#8217;d be surprised if Facebook did.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143025</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Mon, 25 Aug 2008 15:33:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143025</guid>
		<description>Alastair,

In this case, I was definitely focussed on the thrill of the hunt. However, only while it meant a pleasant horse-ride across the moors. Once the fox hid amongst the thorny brambles of wavelet theory and JPEG compression (neither of which I pretend to understand in the slightest) I got bored.

The correct thing to do now is to head off to the open-source butcher and buy some prepared code.

No, actually, the correct thing to do is to stop torturing these poor, innocent analogies.

As for EXIF data, I don't know if Photoshop, Facebook, PIL and PHP's graphics library all preserve them correctly during transformations. I don't know if VueScan populates them correctly in the first place. Maybe they all do it perfectly, but I don't have much trust in them.</description>
		<content:encoded><![CDATA[<p>Alastair,</p>
<p>In this case, I was definitely focussed on the thrill of the hunt. However, only while it meant a pleasant horse-ride across the moors. Once the fox hid amongst the thorny brambles of wavelet theory and JPEG compression (neither of which I pretend to understand in the slightest) I got bored.</p>
<p>The correct thing to do now is to head off to the open-source butcher and buy some prepared code.</p>
<p>No, actually, the correct thing to do is to stop torturing these poor, innocent analogies.</p>
<p>As for EXIF data, I don&#8217;t know if Photoshop, Facebook, PIL and PHP&#8217;s graphics library all preserve them correctly during transformations. I don&#8217;t know if VueScan populates them correctly in the first place. Maybe they all do it perfectly, but I don&#8217;t have much trust in them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143024</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Mon, 25 Aug 2008 15:23:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143024</guid>
		<description>Sunny,

A simple histogram suffers from two problems: it is likely to be distorted by simple operations to adjust the brightness, and it doesn't allow meaningful bit-wise comparisons - you need to check for "approximately the same" rather than "exactly the same".</description>
		<content:encoded><![CDATA[<p>Sunny,</p>
<p>A simple histogram suffers from two problems: it is likely to be distorted by simple operations to adjust the brightness, and it doesn&#8217;t allow meaningful bit-wise comparisons - you need to check for &#8220;approximately the same&#8221; rather than &#8220;exactly the same&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143023</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Mon, 25 Aug 2008 15:22:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143023</guid>
		<description>Dan,

Yeah, implementing wavelets and other serious maths seems to be the &lt;a href="http://www.intel-research.net/Publications/Pittsburgh/101220041248_261.pdf" rel="nofollow"&gt;way&lt;/a&gt; to &lt;a href="http://dotnetdreaming.wordpress.com/2008/02/14/defining-metrics/" rel="nofollow"&gt;go&lt;/a&gt;.

Even better would be re-using the &lt;a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&#038;arnumber=1247428&#038;isnumber=27938" rel="nofollow"&gt;code&lt;/a&gt; of &lt;a href="http://www.pybytes.com/pywavelets/" rel="nofollow"&gt;others&lt;/a&gt;.

But that requires tedious theory. It'd be more fun to sort the photos by hand!</description>
		<content:encoded><![CDATA[<p>Dan,</p>
<p>Yeah, implementing wavelets and other serious maths seems to be the <a href="http://www.intel-research.net/Publications/Pittsburgh/101220041248_261.pdf" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/www.intel-research.net');" class="lipdf">way</a> to <a href="http://dotnetdreaming.wordpress.com/2008/02/14/defining-metrics/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/dotnetdreaming.wordpress.com');" class="liexternal">go</a>.</p>
<p>Even better would be re-using the <a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&#038;arnumber=1247428&#038;isnumber=27938" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/ieeexplore.ieee.org');" class="liexternal">code</a> of <a href="http://www.pybytes.com/pywavelets/" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/www.pybytes.com');" class="liexternal">others</a>.</p>
<p>But that requires tedious theory. It&#8217;d be more fun to sort the photos by hand!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143022</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Mon, 25 Aug 2008 15:16:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143022</guid>
		<description>John,

I am quite prepared to retroactively remove your comments if you would like, but frankly I think they both add to the conversation, so I am loath to do so.</description>
		<content:encoded><![CDATA[<p>John,</p>
<p>I am quite prepared to retroactively remove your comments if you would like, but frankly I think they both add to the conversation, so I am loath to do so.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Alastair Rankine</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143015</link>
		<dc:creator>Alastair Rankine</dc:creator>
		<pubDate>Mon, 25 Aug 2008 12:18:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143015</guid>
		<description>[As is so often the case with OddThinking posts, I can't help wondering whether Julian just wants to get the job done, or whether he's more interested in the thrill of the chase. Assuming the latter for now...]

OK so here's my idea. I am not in any way a graphics expert, so try not to laugh too much.

Basically you want to re-compress each image at a known resolution (meaning pixel dimensions) and quantization. So pick an appropriate width and height, such as that of your largest image. Also pick a quantization matrix - this time you'll probably need to use that of the most aggressively compressed image.

Now re-compress all your images. You *could* probably use the hash functions you mentioned above (particularly if you use the YCbCr domain instead of RGB) but I think there's a better way, provided you are willing to get your hands dirty with JPEG internals.

Basically at the same resolution and quantization you should be able to use the DC coefficient as a rough proxy for that macroblock. So: just add the DC coefficients for all the macroblocks to produce a hash value for the channel/image as a whole.

In fact you don't need to re-compress the entire image to get this value - just enough to determine the DC coefficient for each macroblock. This has the benefit of being a fair bit quicker and also requiring only a single quantization value and not an entire matrix. The maths up to this point seems fairly easy actually.

This gives you three hash values, one for each channel, and hence allows you to detect certain types of other transforms. As you say, you can just compare the Cb and Cr hashes to detect a change in brightness. Alternatively you could just look at the Y hashes to detect an image that has been converted to B&#38;W.

In the "just get the job done" category of solutions: have you thought about doing a hash of (key fields in) the EXIF data? Or are you concerned about removal of such data as well?</description>
		<content:encoded><![CDATA[<p>[As is so often the case with OddThinking posts, I can't help wondering whether Julian just wants to get the job done, or whether he's more interested in the thrill of the chase. Assuming the latter for now...]</p>
<p>OK so here&#8217;s my idea. I am not in any way a graphics expert, so try not to laugh too much.</p>
<p>Basically you want to re-compress each image at a known resolution (meaning pixel dimensions) and quantization. So pick an appropriate width and height, such as that of your largest image. Also pick a quantization matrix - this time you&#8217;ll probably need to use that of the most aggressively compressed image.</p>
<p>Now re-compress all your images. You *could* probably use the hash functions you mentioned above (particularly if you use the YCbCr domain instead of RGB) but I think there&#8217;s a better way, provided you are willing to get your hands dirty with JPEG internals.</p>
<p>Basically at the same resolution and quantization you should be able to use the DC coefficient as a rough proxy for that macroblock. So: just add the DC coefficients for all the macroblocks to produce a hash value for the channel/image as a whole.</p>
<p>In fact you don&#8217;t need to re-compress the entire image to get this value - just enough to determine the DC coefficient for each macroblock. This has the benefit of being a fair bit quicker and also requiring only a single quantization value and not an entire matrix. The maths up to this point seems fairly easy actually.</p>
<p>This gives you three hash values, one for each channel, and hence allows you to detect certain types of other transforms. As you say, you can just compare the Cb and Cr hashes to detect a change in brightness. Alternatively you could just look at the Y hashes to detect an image that has been converted to B&amp;W.</p>
<p>In the &#8220;just get the job done&#8221; category of solutions: have you thought about doing a hash of (key fields in) the EXIF data? Or are you concerned about removal of such data as well?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Sunny Kalsi</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-143008</link>
		<dc:creator>Sunny Kalsi</dc:creator>
		<pubDate>Mon, 25 Aug 2008 08:53:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-143008</guid>
		<description>Histograms. Why not use an RGB histogram as your hash? This should be fairly robust to file size changes, right?</description>
		<content:encoded><![CDATA[<p>Histograms. Why not use an RGB histogram as your hash? This should be fairly robust to file size changes, right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hashes To Detect Resized Images by Dan</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/08/25/hashes-to-detect-resized-images/#comment-142998</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Mon, 25 Aug 2008 02:20:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=599#comment-142998</guid>
		<description>I'm not an expert at this sort of thing, but I used to do some work in the image processing field. My sense is that the pros would use wavelets in some appropriate colorspace (HSL maybe). I recall a paper by Jacobs and Salesin from SIGGRAPH in the 90s.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not an expert at this sort of thing, but I used to do some work in the image processing field. My sense is that the pros would use wavelets in some appropriate colorspace (HSL maybe). I recall a paper by Jacobs and Salesin from SIGGRAPH in the 90s.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
