OddThinking

A blog for odd things and odd thoughts.

Deleted items in Feed Aggregators

Item 1

Feed aggregators, such as Google Reader, keep backlogs of (RSS/Atom) feed data. This lets a reader see a long history of a blog, while the feed producer and consumers save CPU and bandwidth by transferring only a relatively short feed history in what is, for many sites, the most popular page served.

Item 2

Despite the best efforts of Spam Karma 2, sometimes spam comments get posted to this blog. Normally, I am fairly quick to squash them, and they are deleted within 12 hours.

Item 3

Google Reader (amongst others) polls my blog more often than once every 12 hours.

Result

Item 1 + Item 2 + Item 3 = Spam comments get added to my comments feed, and never get removed.
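To make the sum concrete, here is a minimal Python sketch of an aggregator cache. It assumes, as a simplification rather than a description of Google Reader's actual internals, that the aggregator keys cached items by ID, adds or overwrites whatever it sees in each poll, and never removes an item just because it has vanished from the latest feed. `Item` and `merge_into_cache` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    text: str


def merge_into_cache(cache: dict[str, Item], feed: list[Item]) -> None:
    """Add or overwrite cached items by ID; items missing from the feed are left alone."""
    for item in feed:
        cache[item.item_id] = item


cache: dict[str, Item] = {}

# Poll 1: the spam comment (c2) is live when the aggregator fetches the feed.
merge_into_cache(cache, [Item("c1", "Nice post"), Item("c2", "Buy cheap pills")])

# The blog owner deletes c2, so the next feed simply omits it.
merge_into_cache(cache, [Item("c1", "Nice post"), Item("c3", "I disagree")])

print(sorted(cache))  # ['c1', 'c2', 'c3']: the spam comment c2 is still cached
```

Under this model, deleting the comment at the source cannot help: its absence from the feed carries no information the aggregator can act on.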

I have known about this problem for a while, and yesterday I set out to patch it.

The obvious solution is not to eradicate spam comments entirely, but to replace them in the feed with a stub that says “Item Id=XYZ is NULL, so skip it.” This stub could then override the previous value for Item Id=XYZ in the Google Reader cache.
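The stub only helps if the aggregator recognizes it and evicts its cached copy. A minimal Python sketch of that aggregator-side behavior, assuming a hypothetical stub representation (an item whose text is `None`); it is not modeled on any real feed format or aggregator, and `Item` and `merge_into_cache` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: str
    text: Optional[str]  # None marks a hypothetical "Item Id=XYZ is NULL" stub


def merge_into_cache(cache: dict[str, Item], feed: list[Item]) -> None:
    """Merge a polled feed into the cache, letting stubs evict cached items."""
    for item in feed:
        if item.text is None:
            # A stub overrides the cached copy: drop it rather than store it.
            cache.pop(item.item_id, None)
        else:
            cache[item.item_id] = item


cache: dict[str, Item] = {}
merge_into_cache(cache, [Item("c1", "Nice post"), Item("c2", "Buy cheap pills")])

# After deleting c2, the producer publishes a stub for it instead of omitting it.
merge_into_cache(cache, [Item("c1", "Nice post"), Item("c2", None)])

print(sorted(cache))  # ['c1']
```

The catch is that both ends must agree on what a stub looks like, which is exactly the missing piece.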

I believe that this might work in Atom, but not in RSS, because Atom has a GUID field, while RSS doesn’t.

Wait. Is that Alexander Pope I can hear turning in his grave? I think I should retract that last statement and leave it for the RSS experts to comment on.

So, off I went to check on the official WordPress solution for the problem. No luck.

So, off I went to check for a WordPress plugin that would solve it. I didn’t find one, but that doesn’t prove much, because there are so many.

So, off I went to check for Google Reader’s recommendation for RSS producers. No luck (See the comment by Mihai Parparita.)

I’m stuck for now…


Comments

  1. A proper solution for this was proposed in the (now expired) Atom Tombstones I-D. Google that phrase to find it.

  2. Thanks, Aristotle. That would have been exactly what I/we needed.

    For the Google-deficient: The expired Internet Draft for Atom Syndication Format Tombstones

    If the IETF can’t agree on Plan A, then perhaps Plan B would work.

    If a comment is deleted after it has appeared in a feed at least once, keep it in the feed, but change the text to “Spam deleted.” Poor usability, but at least the spammer doesn’t win, because the feed aggregator’s cached copy should be overwritten.

    Note: Don’t do this for every deleted comment, or you will effectively get a Denial of Service on your comment feed; only do it for the ones that lasted long enough to make it into the feed at least once.

    Note: An almost-equivalent solution would be for me not to delete spam comments manually, but instead to edit them to “Spam deleted”. However, that would pollute the HTML website with these spam corpses, not just the RSS feed.

    [This comment is just for the record, because I forgot the details of this discussion, and will probably forget them again.]

    WordPress 2.2.1, with better-than-ever Atom support, does not support tombstones. Deleted comments do not appear in any way in the Atom feed. Hence, they will continue to appear in Google Reader and the like. The solution isn’t here yet.

    I know this because I just installed another test blog and tried it out.

  4. I’m not surprised it doesn’t. To my knowledge, no software anywhere has ever actually implemented tombstones, and in particular, no aggregators do. They were thrown around on the Atom WG list for a while and some people who are involved with some of the bigger syndication-related products/projects called them a cool idea, but obviously nothing came of any of that. Ah well.
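Plan B from the second comment, including the note about only stubbing comments that actually made it into a feed, can be sketched producer-side with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: each comment keeps a stable Atom `<id>`; the blog records, via a hypothetical `served_in_feed` flag, whether a comment ever appeared in a served feed; and aggregators replace a cached entry when they see the same `<id>` with a newer `<updated>`, which is the behavior Plan B relies on rather than a guarantee. `Comment`, `build_entries`, and the `tag:` URIs are illustrative, not WordPress APIs; `<author>` is omitted on the assumption that it is set at feed level.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
PLACEHOLDER = "Spam deleted."


def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace for ElementTree."""
    return f"{{{ATOM_NS}}}{tag}"


@dataclass
class Comment:
    comment_id: str               # stable Atom <id> (hypothetical tag: URI)
    text: str
    updated: datetime             # last change; for deleted comments, the deletion time
    deleted: bool = False
    served_in_feed: bool = False  # hypothetical flag: appeared in a served feed at least once


def build_entries(comments: list[Comment]) -> list[ET.Element]:
    """Build Atom <entry> elements, turning served-then-deleted spam into placeholders."""
    entries = []
    for c in comments:
        if c.deleted and not c.served_in_feed:
            # No aggregator has cached it, so there is nothing to overwrite: omit it.
            continue
        entry = ET.Element(atom("entry"))
        ET.SubElement(entry, atom("id")).text = c.comment_id
        ET.SubElement(entry, atom("title")).text = "Comment"
        # Same <id> plus a newer <updated> is what should prompt aggregators
        # to replace their cached copy (an assumption about their behavior).
        ET.SubElement(entry, atom("updated")).text = c.updated.isoformat()
        content = ET.SubElement(entry, atom("content"), type="text")
        content.text = PLACEHOLDER if c.deleted else c.text
        entries.append(entry)
    return entries


posted = datetime(2007, 6, 1, 9, 0, tzinfo=timezone.utc)
deleted_at = datetime(2007, 6, 1, 18, 0, tzinfo=timezone.utc)
comments = [
    Comment("tag:example.org,2007:comment-1", "Nice post", posted),
    Comment("tag:example.org,2007:comment-2", "Buy cheap pills", deleted_at,
            deleted=True, served_in_feed=True),
    Comment("tag:example.org,2007:comment-3", "More spam", deleted_at,
            deleted=True, served_in_feed=False),
]
entries = build_entries(comments)
print([e.findtext(atom("content")) for e in entries])  # ['Nice post', 'Spam deleted.']
```

Comment 3 never reached a feed, so it is dropped outright; only comment 2 costs a placeholder entry, which keeps the feed from filling up with corpses.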
