<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Robots.txt Boost: Don&#8217;t record these URLs</title>
	<atom:link href="http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/</link>
	<description>A blog for odd things and odd thoughts.</description>
	<lastBuildDate>Wed, 10 Mar 2010 17:00:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-124923</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Sun, 01 Jun 2008 10:40:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-124923</guid>
		<description>Configurator,

If they link directly to the image (e.g. &lt;code&gt;&lt;a href=&quot;mysite.com/wherever.jpg&quot;&gt;Kevin Rudd&#039;s Shoe-Size&lt;/a&gt;&lt;/code&gt;), it will be found when you search for Kevin Rudd. I have no option to include a NoIndex modifier. You claim that is indexing &lt;em&gt;their&lt;/em&gt; site. I claim that it is linking the name Kevin Rudd to my site, which I want to prevent.

I can see these are two different ways of looking at the same thing, but there doesn&#039;t seem to be a way for me to control the indexing of my images here, which seems wrong.

I &lt;em&gt;could&lt;/em&gt; modify my server to give 404s to Google and other known bots when they look at images, but that seems over the top.</description>
		<content:encoded><![CDATA[<p>Configurator,</p>
<p>If they link directly to the image (e.g. <code>&lt;a href="mysite.com/wherever.jpg">Kevin Rudd's Shoe-Size&lt;/a></code>), it will be found when you search for Kevin Rudd. I have no option to include a NoIndex modifier. You claim that is indexing <em>their</em> site. I claim that it is linking the name Kevin Rudd to my site, which I want to prevent.</p>
<p>I can see these are two different ways of looking at the same thing, but there doesn&#8217;t seem to be a way for me to control the indexing of my images here, which seems wrong.</p>
<p>I <em>could</em> modify my server to give 404s to Google and other known bots when they look at images, but that seems over the top.</p>
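<p>A minimal sketch of what such a server-side check might look like, assuming Python. The bot tokens, the extension list, and the names <code>is_known_bot</code> and <code>status_for</code> are made up for illustration and are not taken from any real server configuration.</p>
<pre>
```python
# Hypothetical lists: these tokens and extensions are illustrative only.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def is_known_bot(user_agent):
    """Crude check: does the User-Agent contain a known crawler token?"""
    agent = (user_agent or "").lower()
    return any(token in agent for token in KNOWN_BOT_TOKENS)


def status_for(path, user_agent):
    """Return 404 for image requests from known bots, 200 otherwise."""
    wants_image = path.lower().endswith(IMAGE_EXTENSIONS)
    if wants_image and is_known_bot(user_agent):
        return 404
    return 200
```
</pre>
<p>Matching on the User-Agent string only catches bots that identify themselves honestly, which is part of why this approach feels heavy-handed.</p>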
]]></content:encoded>
	</item>
	<item>
		<title>By: configurator</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-124862</link>
		<dc:creator>configurator</dc:creator>
		<pubDate>Sat, 31 May 2008 22:27:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-124862</guid>
		<description>The footprint image will only be shown when someone embeds that image in their site, not when they link to it. So if your site has noindex and nobody embeds the image, it will not be indexed. And if they do embed it, you can&#039;t really stop indexing from -their- site, can you?</description>
		<content:encoded><![CDATA[<p>The footprint image will only be shown when someone embeds that image in their site, not when they link to it. So if your site has noindex and nobody embeds the image, it will not be indexed. And if they do embed it, you can&#8217;t really stop indexing from -their- site, can you?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aristotle Pagaltzis</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-108450</link>
		<dc:creator>Aristotle Pagaltzis</dc:creator>
		<pubDate>Sat, 12 Apr 2008 07:52:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-108450</guid>
		<description>You may not have an account for it, but you already have an OpenID… or four. ;-)</description>
		<content:encoded><![CDATA[<p>You may not have an account for it, but you already have an OpenID… or four. <img src='http://www.somethinkodd.com/oddthinking/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-108370</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Sat, 12 Apr 2008 02:18:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-108370</guid>
		<description>Richard,

You have assuaged my fear that common search bots might not support the NoIndex meta tag.

It still leaves the problem that each bot will need to visit each of the 10,000-or-so generated pages just to find out it shouldn&#039;t index the page. 

Perhaps I am being too miserly with CPU and bandwidth? 10,000 page hits per bot spread over time is possibly not worth worrying over. The pages aren&#039;t large (assuming you don&#039;t download the images).

Oh dear! Images! Suppose you don&#039;t link to Kevin Rudd&#039;s generated database page, but instead link straight to the image of Kevin Rudd&#039;s footprint. While I can block the images with a judicious robots.txt file, I can&#039;t include a NoIndex clause in a JPEG file.</description>
		<content:encoded><![CDATA[<p>Richard,</p>
<p>You have assuaged my fear that common search bots might not support the NoIndex meta tag.</p>
<p>It still leaves the problem that each bot will need to visit each of the 10,000-or-so generated pages just to find out it shouldn&#8217;t index the page. </p>
<p>Perhaps I am being too miserly with CPU and bandwidth? 10,000 page hits per bot spread over time is possibly not worth worrying over. The pages aren&#8217;t large (assuming you don&#8217;t download the images).</p>
<p>Oh dear! Images! Suppose you don&#8217;t link to Kevin Rudd&#8217;s generated database page, but instead link straight to the image of Kevin Rudd&#8217;s footprint. While I can block the images with a judicious robots.txt file, I can&#8217;t include a NoIndex clause in a JPEG file.</p>
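<p>A sketch of the robots.txt half of that, assuming Python&#8217;s standard <code>urllib.robotparser</code> module. The <code>/images/</code> directory and the helper name <code>allowed</code> are hypothetical. It only shows that a Disallow rule keeps a compliant bot from fetching the image while leaving ordinary pages fetchable.</p>
<pre>
```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /images/ is an assumed directory, not the site's real layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def allowed(url, user_agent="*"):
    """Return True when the rules above let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```
</pre>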
]]></content:encoded>
	</item>
	<item>
		<title>By: Julian</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-108369</link>
		<dc:creator>Julian</dc:creator>
		<pubDate>Sat, 12 Apr 2008 02:17:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-108369</guid>
		<description>Aristotle,

I may be missing something, but supporting OpenID won&#039;t solve the problem, for two reasons.

It will mean that (depending how I configure it) users won&#039;t need yet another username and password for my site, but they would still need to authenticate somehow. 

Similarly, there is still an initial registration. As a regular web-surfer, I do not yet have an OpenID account (to my knowledge; perhaps some of the web accounts I have are OpenID-ready?). I don&#039;t expect the occasional visitor will have one. I hope both of those facts change in the next five years.

More importantly, while Googlebot won&#039;t be able to read the contents of the page, it will still serve links to it. Kevin Rudd seekers will find a link to his shoe-size.</description>
		<content:encoded><![CDATA[<p>Aristotle,</p>
<p>I may be missing something, but supporting OpenID won&#8217;t solve the problem, for two reasons.</p>
<p>It will mean that (depending how I configure it) users won&#8217;t need yet another username and password for my site, but they would still need to authenticate somehow. </p>
<p>Similarly, there is still an initial registration. As a regular web-surfer, I do not yet have an OpenID account (to my knowledge; perhaps some of the web accounts I have are OpenID-ready?). I don&#8217;t expect the occasional visitor will have one. I hope both of those facts change in the next five years.</p>
<p>More importantly, while Googlebot won&#8217;t be able to read the contents of the page, it will still serve links to it. Kevin Rudd seekers will find a link to his shoe-size.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sunny Kalsi</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-107328</link>
		<dc:creator>Sunny Kalsi</dc:creator>
		<pubDate>Wed, 09 Apr 2008 07:00:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-107328</guid>
		<description>There&#039;s this sort-of club near where I live where they talk about shoe sizes. They meet up in this abandoned building with broken windows and stuff. There&#039;s no &quot;club leader&quot; as such, and no &quot;club registry&quot;. People just politely assume that they belong to the club when they wander in.

Unfortunately, sometimes people tell other people about the club, or sometimes people see other people wander into the building and get curious. Worst case it sometimes gets into shoe mags and sometimes a bunch of randoms show up and it&#039;s really awkward... The people don&#039;t want to make it a proper club, because that makes it too much effort, and they want it to be casual.

They instituted a rule:

The first rule of shoe club is that nobody talks about shoe club.

The second rule: If this is your first night, you have to... shoe...

True story.</description>
		<content:encoded><![CDATA[<p>There&#8217;s this sort-of club near where I live where they talk about shoe sizes. They meet up in this abandoned building with broken windows and stuff. There&#8217;s no &#8220;club leader&#8221; as such, and no &#8220;club registry&#8221;. People just politely assume that they belong to the club when they wander in.</p>
<p>Unfortunately, sometimes people tell other people about the club, or sometimes people see other people wander into the building and get curious. Worst case it sometimes gets into shoe mags and sometimes a bunch of randoms show up and it&#8217;s really awkward&#8230; The people don&#8217;t want to make it a proper club, because that makes it too much effort, and they want it to be casual.</p>
<p>They instituted a rule:</p>
<p>The first rule of shoe club is that nobody talks about shoe club.</p>
<p>The second rule: If this is your first night, you have to&#8230; shoe&#8230;</p>
<p>True story.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aristotle Pagaltzis</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-107271</link>
		<dc:creator>Aristotle Pagaltzis</dc:creator>
		<pubDate>Wed, 09 Apr 2008 01:15:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-107271</guid>
		<description>Support OpenID?</description>
		<content:encoded><![CDATA[<p>Support OpenID?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard</title>
		<link>http://www.somethinkodd.com/oddthinking/2008/04/09/robotstxt-boost-dont-record-these-urls/comment-page-1/#comment-107250</link>
		<dc:creator>Richard</dc:creator>
		<pubDate>Tue, 08 Apr 2008 22:23:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.somethinkodd.com/oddthinking/?p=531#comment-107250</guid>
		<description>The robots meta tag, with its &quot;noindex, nofollow&quot; content, has been around since at least &#039;98: I was using it then fairly successfully with wget, the GNU web spider, which by now is probably the template of most link harvesters. It seems this tag was defined as part of the W3C recommendation for HTML 4.0, and that was in 1997. The text of the spec also has this interesting statement (even in the 4.01 update):

&lt;blockquote cite=&quot;http://www.w3.org/TR/REC-html40-971218/appendix/notes.html#h-B.4.1.2&quot;&gt;Note. In early 1997 only a few robots implement this, but this is expected to change as more public attention is given to controlling indexing robots.&lt;/blockquote&gt;

It&#039;s been more than ten years, so I think you can stop worrying about whether it&#039;s supported. In fact, I think any spider that doesn&#039;t respect this tag isn&#039;t going to respect your robots.txt either.</description>
		<content:encoded><![CDATA[<p>The robots meta tag, with its &#8220;noindex, nofollow&#8221; content, has been around since at least &#8217;98: I was using it then fairly successfully with wget, the GNU web spider, which by now is probably the template of most link harvesters. It seems this tag was defined as part of the W3C recommendation for HTML 4.0, and that was in 1997. The text of the spec also has this interesting statement (even in the 4.01 update):</p>
<blockquote cite="http://www.w3.org/TR/REC-html40-971218/appendix/notes.html#h-B.4.1.2"><p>Note. In early 1997 only a few robots implement this, but this is expected to change as more public attention is given to controlling indexing robots.</p></blockquote>
<p>It&#8217;s been more than ten years, so I think you can stop worrying about whether it&#8217;s supported. In fact, I think any spider that doesn&#8217;t respect this tag isn&#8217;t going to respect your robots.txt either.</p>
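<p>A minimal sketch of how a spider might honour the tag, assuming Python&#8217;s standard <code>html.parser</code> module. The names <code>RobotsMetaParser</code> and <code>may_index</code> are invented for illustration, and this is not how wget or any particular crawler implements it.</p>
<pre>
```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots meta tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_index(html_text):
    """Return False when the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" not in parser.directives
```
</pre>
<p>Note that the spider still has to download and parse the page before it learns the answer, whereas robots.txt is consulted before any fetch.</p>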
]]></content:encoded>
	</item>
</channel>
</rss>
