OddThinking

A blog for odd things and odd thoughts.

A Snowball’s Chance of Beating Spam

Subscribers to my RSS comments feed may have noticed a recent increase in the number of “obvious” spams getting through. I clean them up quickly, but they sometimes get into the RSS feed.

WordPress improvement idea: A 2-hour grace period on the RSS feed for new posts, and a 36-hour grace period for comments. That would give me time to correct formatting errors in the posts and manually filter the false-negative spams before they get published to the 5-star readers.
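The grace period amounts to an age filter applied when the feed is built. Here is a minimal Python sketch. It assumes a hypothetical `FeedItem` record and a hypothetical `items_for_feed` function that a feed generator would call; neither is part of WordPress. Only the two durations come from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Grace periods from the proposal above.
GRACE = {
    "post": timedelta(hours=2),      # time to fix formatting errors
    "comment": timedelta(hours=36),  # time to weed out missed spam
}


@dataclass
class FeedItem:
    # Hypothetical feed record, not a WordPress data structure.
    kind: str            # "post" or "comment"
    title: str
    published: datetime


def items_for_feed(items, now):
    """Keep only items whose grace period has fully elapsed."""
    return [item for item in items if now - item.published >= GRACE[item.kind]]


if __name__ == "__main__":
    now = datetime(2000, 1, 1, 12, 0)  # arbitrary example time
    items = [
        FeedItem("post", "Fresh post", now - timedelta(hours=1)),
        FeedItem("post", "Older post", now - timedelta(hours=3)),
        FeedItem("comment", "New comment", now - timedelta(hours=20)),
        FeedItem("comment", "Old comment", now - timedelta(hours=40)),
    ]
    for item in items_for_feed(items, now):
        print(item.title)  # prints "Older post", then "Old comment"
```

Anything deleted during its grace period would simply never reach the feed.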

I chased down why these spams were getting through the highly regarded SpamKarma 2.2 plugin. SpamKarma has a filter called the Snowball filter. Judging from the code, it has two key pieces of functionality.

The first is based on the realisation that if you have provided a legitimate comment before, then this comment is probably legitimate, but if you have spammed before then this comment is probably spam.

The second is based on the realisation that if you have sent a lot of comments recently, and the first few get through, but a later one is detected as spam, then the earlier ones are more likely to be spam than was first thought. The bad spam karma “snowballs”, and is applied retroactively. To quote a section of the code, if it detects one of the comments is spam, it will “unleash all minions of Hell on that bad boy’s company…”
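The following toy Python model makes the two ideas concrete. It is not SpamKarma's code. The class name, the karma arithmetic, `SPAM_THRESHOLD` and the penalty value are all assumptions, and the "recently" time window is left out for brevity.

```python
from collections import defaultdict

SPAM_THRESHOLD = 0  # hypothetical: karma below this is treated as spam


class SnowballSketch:
    """Toy model of the two Snowball ideas; not SpamKarma's real code."""

    def __init__(self, penalty=-5):
        self.penalty = penalty                 # hypothetical retroactive penalty
        self.by_commenter = defaultdict(list)  # commenter key -> comment ids
        self.karma = {}                        # comment id -> karma score

    def add(self, commenter, comment_id, base_karma):
        earlier = self.by_commenter[commenter]
        karma = base_karma
        # Idea 1: the commenter's track record carries over to the new comment.
        if earlier:
            karma += sum(self.karma[c] for c in earlier) / len(earlier)
        self.karma[comment_id] = karma
        # Idea 2: if the new comment is spam, the bad karma "snowballs"
        # back onto the same commenter's earlier comments.
        if karma < SPAM_THRESHOLD:
            for c in earlier:
                self.karma[c] += self.penalty
        earlier.append(comment_id)
        return karma

    def is_spam(self, comment_id):
        return self.karma[comment_id] < SPAM_THRESHOLD


sb = SnowballSketch()
sb.add("spammer", "c1", 2)    # looks fine on its own
sb.add("spammer", "c2", 1)    # boosted by c1's good karma
sb.add("spammer", "c3", -10)  # caught; c1 and c2 are penalised too
print(sb.is_spam("c1"), sb.is_spam("c2"))  # True True
```

Everything hinges on the commenter key. If two different people map to the same key, they share karma.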

The trouble is that the detection of the comment author is a little naive. It uses a number of factors, including the email address, the IP address and the domain name of the URL provided by the commenter. The last of these causes the problem: the Snowball filter gives too much credence to matching domain names.

Remember when Sunny Kalsi defended not having his own domain name? Well, an unexpected downside of him using blogspot.com is that so do spammers. The spammers post to OddThinking including links to their spam blogs, also hosted on blogspot. SpamKarma looks at the “blogspot.com” domain, and notices that this is the same domain as a regular, highly-valued commenter, and figures that therefore this new comment should be let through.
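The collision is easy to reproduce. This post does not show exactly how SpamKarma reduces a URL to a domain, so the sketch assumes it keeps the last two labels of the host name. `naive_domain_key` and both example URLs are hypothetical.

```python
from urllib.parse import urlparse


def naive_domain_key(url):
    """Hypothetical stand-in for 'the domain name of the URL':
    keep only the last two labels of the host name."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


regular = "http://regular-commenter.blogspot.com/"   # hypothetical URL
spammer = "http://spam-casino.blogspot.com/post.html"  # hypothetical URL

# Both reduce to "blogspot.com", so the spammer borrows the regular's karma.
print(naive_domain_key(regular), naive_domain_key(spammer))  # blogspot.com blogspot.com
print(naive_domain_key(regular) == naive_domain_key(spammer))  # True
```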

I should make it clear, Sunny is totally innocent here. He is the good guy in this story.

The good news is that I reported it to drDave, the author of SpamKarma, and, once again, he was incredibly responsive. He has a solution in mind, and hopes to implement it very shortly.

So it seems that drDave is the other good guy in this story.

For the record, the SpamKarma Snowball filter writes the following text to the log file: “Commenter granularity (based on URL):”, followed by an inappropriately high level of karma. I only mention this so that people searching the web for this issue can find this description.

Comments

  1. As I told you, I’ve been meaning to address this issue for a while…

    The latest beta of SK2 now checks the domain against the list of greylisted domains (blogspot, blogger and others are greylisted by default; more can be added in the Blacklist panel). If the domain is greylisted, SK2 runs the URL Snowball test against the full URL instead.

    Feel free to install the beta and let me know how it fares for you:
    http://www.wp-plugins.net/sk2/sk23_beta.zip

  2. dr Dave, you’re a champion!

    I installed SK2.3 beta. I double-checked blogspot.com is in the greylist. I couldn’t see any other options that needed tweaking…

    Now, I guess we wait and see…

  3. Today, a nasty spammer tried to refer you, Good Reader, to a blogspot.com blog advertising an on-line casino. Spam Karma 2.3 Beta saved you from being distracted by it.

    This post is a quick check against false positives. It was written by Julian, but with false author information.

  4. Amazing that all through this, I noticed none of it 🙂

    When I wrote that comment I actually didn’t know that spammers regularly used blogspot accounts. From recently having to… errr… “investigate” the seedy underbelly of the internet, I noticed the spam accounts. I’d completely forgotten about the comment I made, but now that you’ve drawn the line between the spam accounts and my own blog, it gets me perturbed.

    See, I think the spammer blogspot accounts are a recent thing. Maybe a year or two, but not more. I’ve been happy with the fact that I’ve been in the company of idiots, but never someone malicious. Seeing this possibility has me considering a better option, seeing as I don’t really want my domain to be associated with malice. I’ve never considered web hosting like geocities or angelfire mainly because of this.

    However, in another twist of fate, my own site lost all its pictures, which I hosted on a friend’s (paid-for) domain, because the web space was removed without adequate notice. It’s yet another reminder of all the work I’ll have to do to achieve pretty much nothing, and of the kind of data loss that’s possible.

    Sorry about the flurry of activity I caused as a result.

  5. Sunny:

    “I think the spammer blogspot accounts are a recent thing. Maybe a year or two, but not more.”

    The phenomenon exploded in late fall 2005, prior to which it had been around but low-key. I’d wager that the splog concept is less than two years old, and really caught on less than 9 months ago.

    PS: Have I mentioned that “splog” is a really revolting neologism?
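drDave's fix, described in the first comment, can be sketched in a few lines of Python. This is an illustration of the idea he describes, not SK2.3's implementation. `GREYLIST`, `domain_of` and `commenter_url_key` are hypothetical names, `blogger.com` is an assumed spelling of the "blogger" entry, and the domain rule (last two host labels) is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for SK2.3's default greylist; the real list is
# longer and can be extended in the Blacklist panel.
GREYLIST = {"blogspot.com", "blogger.com"}


def domain_of(url):
    # Assumed rule: keep the last two labels of the host name.
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def commenter_url_key(url):
    """Key used to link comments to the same author for the URL test."""
    domain = domain_of(url)
    if domain in GREYLIST:
        # Shared host: many unrelated people live here, so compare full URLs.
        return url
    return domain


print(commenter_url_key("http://regular-commenter.blogspot.com/"))  # http://regular-commenter.blogspot.com/
print(commenter_url_key("http://www.example.com/about"))            # example.com
```

In this sketch, comparing full URLs errs on the side of not linking comments. A blogspot regular who leaves two different URLs is treated as two people, which costs some good karma but stops spammers on the same host from borrowing it.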


Web Mentions

  1. Banditry » Blog Archive » Spam filters and paranoia