How We Beat Comment Spam

Mikey 89 comments

Update: We are no longer using this method. After more than a year of successful use, it now seems to be beaten regularly.

We have been running this site for close to a couple of years now and as its popularity grew, so did the frequency of comment spam.

The past 3 to 6 months have been particularly bad, with me manually deleting 30 to 100 spam comments per day from the database.

We have tried all the usual methods with the exception of CAPTCHA, because I still maintain it is a burden on the user. It's hard enough to encourage people to leave comments as it is without forcing them to decipher a drunken alphanumeric sequence. And if they don't get it right the first time, there's a good chance you have lost them forever.

So it was back to the drawing board, and about two months ago we worked out something that has reduced comment spam to nil - zip - zero - nada. Well, to be truthful, technically the comment spam is still coming in, but we are simply hiding it from view.

Before I go on, let me make it clear we do not claim to be experts on spam-bots. We simply have learned what we know from our own experiences and reading about other people's experiences.

Let me also add that this method may not work for everyone, but as mentioned we have had 100% success with it for the past 2 months, which is long enough for us to be confident it works - at least for our web site. We do not use WordPress or anything like that. It's our own custom-built CMS, which has been around much longer and is always evolving.

Some of you may find flaws in this method or question its effectiveness. Or even suggest an improvement. We are always open to suggestion so please comment below.

Obligatory disclaimer: We have not come across this method before despite all our research. There may be others who have already implemented this or a similar method. We only take credit for figuring it out for ourselves.

In our early tests, we noticed identical spam across different articles, character for character, even though the source IP address was sometimes different. One of our original ideas was to log the IP of the spammer and simply block it. But we soon realised that blocking the IP address only stopped them for a day or so, until the same spam was eventually re-posted from a completely different IP (I told you we were new to this). So that idea went belly up.

But we did notice an obvious trend with the spam: it always fills in every field on our comment form, regardless of what we name the fields or how many fields we add or remove. As spam-bots do not physically see forms the way humans do, the solution was very simple.

We introduced a physically "hidden" field which we called "lastname", and when I say "hidden" I don't mean:

<input type="hidden" name="lastname" />

...but rather, a regular field with a class...

<input name="lastname" type="text" class="lastname" />

...which is hidden from physical view with some CSS:

.lastname {
    visibility: hidden;
}

(Note: We called the field "lastname" because we think it conceivable that some spambots might be designed to look for field names that are considered 'safe' - not a trick. If the field was called 'spam' or similar, it might deliberately leave it empty, and our method would fail.)

The only thing that can populate this field is a comment from a spam-bot, because humans can't see it and can't even tab into it. All comments get stored in our database regardless of authenticity (spam or human), but comments posted that filled in the "lastname" field (spam-bots!) get flagged in the database for filtering. We flag them as "IsSpam" for easy visual identification.
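
As a rough illustration of that flagging step, here is a minimal Python sketch. It is not the CMS code described in this article; the function name `flag_comment` and every form key other than `lastname` are assumptions made up for the example.

```python
def flag_comment(form):
    """Build a record for storage, flagged as spam if the honeypot was filled.

    `form` is a dict of submitted field names to values. Only the
    "lastname" honeypot comes from the article; the other keys are
    hypothetical.
    """
    honeypot = (form.get("lastname") or "").strip()
    return {
        "name": form.get("name", ""),
        "comment": form.get("comment", ""),
        # A human never sees the field, so any content suggests a bot filled it.
        "isSpam": 1 if honeypot else 0,
    }
```

Every submission is still stored, as described above; only the `isSpam` value differs.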

Now, with a small addition to the code that pulls the comments from the database when the page is rendered, we simply instruct it to not include comments that have been flagged as "IsSpam". Something like this:

sql = "SELECT commentID, comment FROM article_comments WHERE active = 1 AND isSpam <> 1"
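
To try the filtering end to end, here is a small self-contained Python sketch. It uses an in-memory SQLite table as a stand-in (an assumption; the article does not say which database or language the CMS uses), and the column types and sample rows are invented for the demonstration.

```python
import sqlite3


def visible_comments(conn):
    """Return (commentID, comment) rows that are active and not flagged as spam."""
    cur = conn.execute(
        "SELECT commentID, comment FROM article_comments "
        "WHERE active = 1 AND isSpam <> 1 ORDER BY commentID"
    )
    return cur.fetchall()


# Throwaway in-memory table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_comments ("
    "commentID INTEGER PRIMARY KEY, comment TEXT, active INTEGER, isSpam INTEGER)"
)
conn.executemany(
    "INSERT INTO article_comments (comment, active, isSpam) VALUES (?, ?, ?)",
    [("Nice article", 1, 0), ("Cheap pills", 1, 1), ("Hidden by admin", 0, 0)],
)
rows = visible_comments(conn)  # only the first sample row qualifies
```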

We could take it a step further and simply reject comments that had the "lastname" field filled in, which we will probably implement soon now that we don't have a need to examine the spam anymore. Our assumption is that if a spam-bot is able to successfully submit the form, then it will be satisfied and move on, so to speak. As far as the spam-bot is concerned it has done its job and has no reason to immediately try and re-spam using a different approach.

So the most obvious problem: with our spam comments outnumbering our legitimate comments 50 to 1, won't our database start to fill up rather quickly? Luckily we have plenty of disk space so that is not an issue, but we do like to keep things neat back there. So once a week or less often, someone simply goes into phpMyAdmin, filters the comments by "IsSpam" and deletes them all. Alternatively, if you know how to set up a cron job, you can delete the spam automatically.
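
For the automated route, a scheduled script only needs to run one DELETE. The following is a hedged Python sketch using SQLite as a stand-in database (an assumption); the function name `purge_spam` and the sample rows are hypothetical.

```python
import sqlite3


def purge_spam(conn):
    """Delete every comment flagged as spam and return how many rows were removed."""
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute("DELETE FROM article_comments WHERE isSpam = 1")
    return cur.rowcount


# Demonstration against a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_comments ("
    "commentID INTEGER PRIMARY KEY, comment TEXT, active INTEGER, isSpam INTEGER)"
)
conn.executemany(
    "INSERT INTO article_comments (comment, active, isSpam) VALUES (?, ?, ?)",
    [("Nice article", 1, 0), ("Cheap pills", 1, 1), ("Poker site", 1, 1)],
)
deleted = purge_spam(conn)
remaining = conn.execute("SELECT COUNT(*) FROM article_comments").fetchone()[0]
```

A cron job would simply call a script like this on whatever schedule suits the site.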

So this method does nothing to actually combat spam-bots, or anything that might resemble an effort to prevent them from coming back. In fact it does quite the opposite in allowing them to think they are fulfilling their purpose. But it does stop our site from having Viagra and PrOn adverts mixed in among legitimate discussion.



Not a Member!

Forrest

Tuesday 30th January 2007 | 05:52 PM

Ok, I am impressed. I assumed it would be lame or software heavy, but the method is brilliant. I'm going to put it into practice on my site. Thanks

Not a Member!

Alexa

Tuesday 30th January 2007 | 06:11 PM

That is so freaking obvious. Why hasn't anyone thought of it b4?

Not a Member!

Michael

Tuesday 30th January 2007 | 06:25 PM

Glad you approve, Forrest. Let us know what level of success you have with this method. As this is the only site we have tested it on, we would love to know how well it works on another.

Not a Member!

Andy Murdoch

Tuesday 30th January 2007 | 09:21 PM

I implemented something similar on my site, where comments are powered by the excellent YaBB forum software. I found that rejecting any comments from spam bots immediately worked fine - there was no need to pretend that their attempt had succeeded.

Full details are here:

http://www.andymurdoch.com/Stuff/yabbspam.shtml

Not a Member!

Matt S

Wednesday 31st January 2007 | 02:54 AM

What about the visually impaired or other non-graphical browsing?

Not a Member!

Keith Gaughan

Wednesday 31st January 2007 | 04:54 AM

There's a name for this already: it's called a honeypot, and is already in common use.

Not a Member!

Rich

Wednesday 31st January 2007 | 06:39 AM

Nice salvo in the spam fight. It will work until enough websites implement the filter this way that it makes economic sense for the spam-bots to understand rudimentary CSS and decline to fill in any form fields with the :hidden attribute. This may take quite some time, and is actually a pretty smart way of doing it.

Not a Member!

Jared

Wednesday 31st January 2007 | 06:54 AM

It's a very simple, albeit effective, solution. The problem with this simple solution is that there is a simple workaround: all the spammers need to do is add a simple check to see if a particular field is invisible and skip it (about 8 lines of JavaScript to do this). Couple that with what Matt S said and I would say this is a non-viable solution for many.

Not a Member!

Mr Angry

Wednesday 31st January 2007 | 09:18 AM

That's a very clever implementation. I tend to agree with Rich, who suggested that if/when this approach becomes widespread, spammers will implement a workaround. I use Akismet, which has a 99+% success rate with comment spam (no idea what their actual method is).

One question: does your approach work with pingbacks (a weak point for Akismet)?

Not a Member!

Michael

Wednesday 31st January 2007 | 10:18 AM

Sorry for the late response. If I may answer a few questions all at once...

"What about the visually impaired or other non-graphical browsing?"
This was not a consideration I am afraid to say.

"does your approach work with pingbacks"
To be honest I am not entirely sure, but I cannot see why not. But then I am no pingback expert.

Regarding some of the other comments, I would agree this solution is not for everyone. And at first I was hesitant about even releasing the information because spammers might make the necessary adjustments. But it's more important that people know so it can perhaps be improved.

I had not heard of the honeypot method Keith mentioned, but after doing some quick research (http://spamlinks.net/track-trace-honeypot.htm#fighting) I believe that method is a lot more complex, whereas our solution is simple. Unless I am mistaken.

If anyone implements our method it would be great to know the results.

Not a Member!

Mark

Wednesday 31st January 2007 | 03:59 PM

To attempt to address Rich's potential problem, you could write the CSS in an encrypted JavaScript file (i.e. use document.write() to output a link to the CSS file that hides the spam field).

I have no real knowledge of how spambots work, but I assume they just directly read the source of a page and don't have access to the DOM to dynamically work out what fields are visible. To see the DOM, the spambot would actually have to act like a web browser and properly render the page. If they do this then a different approach would be necessary for Rich's problem.

Starts to make a simple solution like this a little convoluted though...

Not a Member!

Larson

Wednesday 31st January 2007 | 08:22 PM

Mark's idea is a good addition to making this a more sound solution. It's not too difficult for a spambot to be configured to check for a 'hidden' class in a style sheet, so hiding it in this way might work. I am noticing a trend here - hiding everything.

Not a Member!

Keith Gaughan

Wednesday 31st January 2007 | 08:47 PM

Honeypots vary in complexity, but all a honeypot is is "a trap set to detect, deflect or in some manner counteract attempts at unauthorized use of information systems" (quoting from the Wikipedia article on honeypots there, which is quite good), which is just what your post describes.

Not a Member!

Keith Gaughan

Wednesday 31st January 2007 | 08:48 PM

BTW, here's yet more methods, including the one mentioned here: http://www.nedbatchelder.com/text/stopbots.html

Not a Member!

Michael

Wednesday 31st January 2007 | 09:10 PM

Thanks Keith. I checked out that URL and you are indeed correct. I am just so surprised we never found this or anything like it during our research. Makes me kinda proud we figured it out on our own.

But it does have some potential weaknesses (if it becomes too popular) as some have mentioned above, but Mark's idea of building the code with document.write() might actually work. I am going to implement that soon. It is simple and can't do any harm.

Not a Member!

RT Cunningham

Friday 2nd February 2007 | 01:37 AM

I never thought of this technique. I wonder how I could implement this on WordPress. Perhaps there's a plugin... I doubt it though.

Good job!

Not a Member!

Bacardi G.

Saturday 3rd February 2007 | 06:40 AM

In my opinion this is a great simple solution for a novice designer to use to help combat against spam-bots. The author has stated several times that this is not a perfect solution but it does help reduce the amount of spam the site displays.

This is a very clever technique which can help out immensely to smaller sites. Thanks for taking the time to share this information, it is greatly appreciated.

Not a Member!

nickk

Tuesday 6th February 2007 | 03:02 AM

I hope this really works!!
By the way, this is a very cool site!! Love the design.

Not a Member!

ponomar

Tuesday 13th February 2007 | 04:34 AM

Very nice site. Please keep updating it. Your site is exactly the kind of site that makes surfing the net so interesting.

Not a Member!

Klughman

Tuesday 13th February 2007 | 05:30 PM

So since you 'spilled the beans' on your secret, have any spam-bots made the appropriate changes to beat you?

Not a Member!

Michael

Saturday 31st March 2007 | 10:53 AM

Hi Klughman. No, not yet! Fingers crossed. Since we originally set this up, we have not had a single, solitary spam comment on the site. We did have a false alarm once, but it turned out to be someone manually spamming a comment, which is unavoidable as you would know.

Not a Member!

Sweetness & light

Saturday 31st March 2007 | 11:37 AM

This is also known as a honeypot and a similar thing has been done before, except you seem to be having success with it. Might be worth a try I think. Anything is better than captcha.

Not a Member!

Gordo

Saturday 31st March 2007 | 12:17 PM

This can be coded around easily. It would take maybe 3 lines of code to check if an element is hidden via CSS.

The better thing to do is find a common ground with all comment spam, and block it. What is that you ask?

[url
http
www
href

Block those, and you'll never get spam again unless it's a human doing the spamming. If anyone wants to post a link, you can set up something to allow that, such as a special tag or symbol to put around links to allow them, and can even have it change randomly if needed.

I personally just block the strings in the 4 lines above and I haven't had any comment spam since. For a while I had my blog script email me the blocked comments, along with the IP, UserAgent, form details, etc. After a few months of no false positives, I turned off the logging. I caught over 9,000 comments in a few months. I did not save them to the database since I'm a neat-freak, but instead had the comments emailed to me and I set up a filter to put them in their own folder. I'm constantly checking my email a few times per hour, so I was able to keep up with the influx of emails.

All in all... No single solution is perfect. You're either going to get false positives, or false negatives. I personally prefer false positives. When I catch a spammer, my blog script lets the user know, so if it really is a user, they know why their comment was rejected, and they can revise it.
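
A minimal Python sketch of this kind of string filter follows. The function name and the case-insensitive matching are assumptions for illustration, not details of Gordo's blog script.

```python
# The four strings listed in the comment above.
BLOCKED_STRINGS = ("[url", "http", "www", "href")


def looks_like_link_spam(comment):
    """Return True if the comment contains any of the blocked strings."""
    lowered = comment.lower()
    return any(marker in lowered for marker in BLOCKED_STRINGS)
```

Plain substring matching is what produces the false positives mentioned above: an innocent "Awww, cute" contains "www" and would be rejected, so telling the user why the comment was refused matters.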

Not a Member!

Crafter

Sunday 1st April 2007 | 03:32 AM

Hurray for the "HerbOtt method".

You don't have to hide the input field. Just put the field in a div and hide the DIV.

That, and the filtering as suggested by Gordo, will give the spammers some sleepless nights.

Not a Member!

Emil Stenström

Sunday 20th May 2007 | 08:25 PM

As Crafter (almost) says there's an accessible way to do this. Just put a label in front of the textfield that says "Please don't type anything in this field". Easy and accessible.

Not a Member!

Mikey

Sunday 20th May 2007 | 09:11 PM

@ Emil Stenström:

Thanks for that. We are gearing up to launch the new design and I will implement that idea.

Not a Member!

rick (shobuz99)

Tuesday 3rd July 2007 | 11:56 AM

Read your suggestions but haven't tried them yet.
My problem appears to be 'bot'; but may not be.
Reason? Because I am using a PHP form with a "captcha" type of process. The spam is coming in at a rate of 10 per hour. Not too aggressive, but it goes to an email instead of a DB. That email gets filled quickly and monitoring is not practical 24/7. So... will your solution work in combination with the PHP captcha form? If not, do you have any suggestions?

BTW.. I found your site through a thread on Webmaster-Forums.net. Glad I found it, too!
Shobuz99

Not a Member!

Michael

Tuesday 3rd July 2007 | 01:44 PM

Hi Rick. Are you saying spam is getting past your captcha?

Not a Member!

rick (shobuz99)

Tuesday 3rd July 2007 | 11:11 PM

Michael,

Thank you for the reply.
Yes. SPAM is getting past the Captcha in the Php comments form.
I'm using a form mail program by dagondesign.com (older version)

It's been working great; eliminating SPAM bots, etc. since I installed it last year....... until yesterday.

The SPAM has all the earmarks of a bot, but it also happens at a rate that a human would post. Like between 10 and 40 an hour. It fluctuates and then stops altogether. I didn't get any from last night around midnight (EDT) until this morning.... so far none. They came hot and heavy in the late afternoon, yesterday. It's a mystery to me.

Someone else suggested that there may be a hole or back door for the bot to get in.. if so, do you know, approximately, how that could happen?
Thanks
Rick (Shobuz99)

Not a Member!

Michael

Tuesday 3rd July 2007 | 11:26 PM

Hi Rick.

That is strange and I am not sure how that could be happening, unless it's an older captcha that spam bots have been tweaked to circumvent. I am no captcha expert so I may be talking outa my backside there.

As an example, when we first made this honey-pot I named the hidden field (hidden by CSS) 'spam'. Needless to say it didn't work, and we guessed it was because spam bots had been tweaked to look for obvious fields like that and avoid them. As soon as I renamed the field to something that a bot would consider legitimate (lastname), it worked, and it has worked flawlessly since, because the bot fills in this field, which gives us our filtering.

I would suggest adding a honey-pot and seeing what happens. I would not be surprised if you see a significant reduction in your spam.

Since I published this article I have made a small change - we no longer store the spam in the database. But that makes no difference to the prevention anyway.

If you do implement this, let me know how it goes. I still get the occasional email from people saying that it works for them.

Not a Member!

Rodney

Tuesday 3rd July 2007 | 11:42 PM

I think the entire online world should congratulate Mike on this one. It's such a simple idea and yet it's so effective. I'm surprised it's not more commonly deployed.

I've tried it on other systems, calling the field 'email' and it's worked very well. Perfectly, in fact.

The only thing I suggest adding to it, is the capacity to track spamming IPs. If the same IP offends enough, block it.

If you've got 'root', you can even use my personal favourite: the tarpit. This is the sweetest thing ever because if a worm or bot attaches to a tarpit, it grinds to a halt. Basically, this works by setting the data limit of the connection to zero bytes but the timeout window to a couple of minutes, effectively causing the bot to pause idly on your site. If it's monitored at all, they'll remove your IP from the list. The idea has been used in email anti-SPAM systems for quite a while now.
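
The IP-tracking suggestion could look something like the Python sketch below. It is an in-memory illustration only; the class name, the threshold of 3, and the idea of counting honeypot hits per address are assumptions, and a real site would need persistent storage.

```python
from collections import Counter


class OffenderTracker:
    """Count spam attempts per IP and report which IPs have reached a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.hits = Counter()

    def record(self, ip):
        """Record one spam attempt; return True if the IP should now be blocked."""
        self.hits[ip] += 1
        return self.is_blocked(ip)

    def is_blocked(self, ip):
        """True once the IP has offended at least `threshold` times."""
        return self.hits[ip] >= self.threshold
```

As the article notes, spammers change IP addresses often, so a block list like this is a supplement to the hidden field rather than a replacement.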

Not a Member!

Rick (shobuz99)

Thursday 5th July 2007 | 12:57 AM

Thanks very much for all your help and advice.
I will consider using Honey-pot.
Is Honey-pot strictly a "Captcha" or are there form fields that can be constructed; similar to the ones I already have?
If so, I think it could be better than what I'm using.
The posts seem to have stopped. Haven't gotten one in two days.. However, I'm not assuming it's over.
I am starting to think it is a human; since the poster is getting no satisfaction from doing it because I don't post the SPAM. He has no way of knowing for sure if it is annoying me or disrupting the web site. I guess we'll see what happens..
Rick (Shobuz99)

Not a Member!

Michael

Thursday 5th July 2007 | 07:55 AM

I guess Honey-pots can't be considered captchas because they are not an actual test that is presented to the user.

Be sure to log the IP address as well; that way, if it is human, you can at least block the IP (if you have access to cPanel or similar you will have an 'IP Deny Manager' in there).

Good luck with it. Let us know how it works out.

Not a Member!

Gonzague Dambricourt

Wednesday 18th July 2007 | 08:53 PM

Awesome method. should someone develop a wordpress plugin ? :-)

Not a Member!

Michael M.

Sunday 29th July 2007 | 12:59 PM

What happens when the spam bots start checking if their spam is visible after posting, and if not, start algorithmically varying fields?

Just playing devil's advocate here, don't flame.

Not a Member!

Mikey

Sunday 29th July 2007 | 02:18 PM

Hi Michael M. Good thinking. Although to date the method hasn't failed us.

So what would happen if the spam-bot checks to see if the spam was posted? Would it try again using a different method? We are always looking for ways to improve this (we have been working on our v2 method recently in fact) so I would be interested in anything you have to offer.

Incidentally, the new version I mentioned will also be accessibility friendly.

Not a Member!

Rodney

Sunday 29th July 2007 | 03:32 PM

Recently the captcha used to create Hotmail accounts was "beaten", with thousands of Hotmail accounts created and used for spam in a single day.

After the initial panic was over, it turned out the captcha wasn't beaten; rather, spammers are employing people in 3rd world countries to manually create the accounts, much like "farming" in WoW. I guess there's not much you can do about that kind of thing...

Not a Member!

Z-Man

Sunday 29th July 2007 | 07:45 PM

Can I have egg bacon spam and sausage without the spam?

Not a Member!

Flatline

Sunday 29th July 2007 | 07:50 PM

That method is awesome!

By the way, best poker site on the net:
WWW.[SPAM].COM

j/k

Not a Member!

Michael

Sunday 29th July 2007 | 08:14 PM

LOL thanks Flatline.

Not a Member!

Ace

Monday 30th July 2007 | 08:08 AM

ahh balls flatline, i just read all the way down to make sure no one else had bust that joke so i could. dammit. good method though chaps, keep it up.

Not a Member!

Anonymous

Monday 30th July 2007 | 02:21 PM

An interesting method, to be sure.
Note that if you've disabled CSS on your browser you can (as of now) see "Please leave the following spam prevention field empty." More likely to fail to trick bots in that format, but a response to the visually-impaired/non graphical browsing problem, I guess.

Not a Member!

Michael

Monday 30th July 2007 | 04:00 PM

Hi Anonymous. Yes that is there for accessibility purposes. We are currently working on an improvement.

Not a Member!

Chris

Monday 30th July 2007 | 06:59 PM

I just thought I might ask this, as it is possible you may have overlooked the following:

Browsers these days have an "autofill" feature that tends to add known details about a person into fields that are loaded as part of a web page, particularly through the use of Google toolbar and similar.

What is stopping a valid user from autopopulating the "lastname" field (which is likely to be listed as an autofill item) and thereby removing those user comments?

Other than that, it does seem like a good method, although it wouldn't take a competent coder much longer to take into account CSS. Scripting the visibility would work far more effectively, but that is disabled in a lot of browsers now anyway, which would alienate other users.

Nothing is perfect, there will always be false positives or false negatives.

Cheers, Chris.

Not a Member!

Dustin

Tuesday 31st July 2007 | 12:53 AM

I've found with WordPress that all of my comment spam comes from trackbacks being enabled... disable those and enable a CAPTCHA and I haven't had spam in months. This is definitely an innovative approach, however. :)

Not a Member!

Sergey

Tuesday 31st July 2007 | 08:20 AM

Please remove my other comments. Code was invisible there...

Your design is great.
Thank you for sharing useful information in such an interesting way.

I use another way of protection.
Just rename textarea 'message' field into 'mezzaj'.

It's working for 1 year for several sites I did. No more comments from any bots.

Regarding...
"What about the visually impaired or other non-graphical browsing?"
Matt, you can use 'birthday_year' as name for the field,
asking visitors to enter name:


Your name:
<input name="birthday_year" type="text" />



Then just check if birthday_year is numeric, then mark it as spam.
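
A small Python sketch of that check, assuming the submitted form arrives as a dict; the function name is hypothetical.

```python
def is_bot_by_decoy_name(form):
    """Flag the submission if the 'birthday_year' field holds only digits.

    The visible label asks for a name, so a human types letters, while a
    bot guessing from the field name is expected to send a year.
    """
    value = (form.get("birthday_year") or "").strip()
    return value.isdigit()
```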

Thanks for the site.

Not a Member!

Andy

Tuesday 31st July 2007 | 10:31 AM

Sorry but this method is nothing new and is totally pointless. All the spambot has to do is check the DOM to see whether any form fields are hidden.

I have a Mozilla (gtkmozembed) application that could easily subvert such a defence.

The best solutions are:

# Use a service like Akismet (Bayesian filtering).
# Use CAPTCHAs that have cultural meaning.
# Require login - ubiquitous OpenID will lessen the taboo of asking posters to do this.

Andy.

Not a Member!

Rodney

Tuesday 31st July 2007 | 12:45 PM

Andy I would disagree completely. If this method is pointless, why is this page not filled with comment SPAM?

This page is great because anyone can fire off a comment without having to have a log in or moderation or messing around with captcha.

The method has clearly worked well, thus far...

Not a Member!

Edward Meshuris

Tuesday 31st July 2007 | 04:52 PM

CAPTCHAS suck!
Nice idea, I just bookmarked it. However, your assumption about the bot moving on may solve the spam problem but not a malicious attack.
My variation would check before writing to the table using regular expressions.
Just my 2 cents, Thanks for the writeup.

-Edward

Not a Member!

Sean

Tuesday 31st July 2007 | 05:07 PM

Or you could just create a checkbox called accept or agree, since those are likely to be the names people use for "Do you agree to the terms?" Then put "Do not check this box" beside it, couldn't you?

Not a Member!

sir West

Wednesday 1st August 2007 | 05:19 AM

The idea itself is good, but first of all I like the div improvement. The good thing is that you can put a div anywhere, and you basically can't link it to anything if you can't visually see it. Or, besides randomizing the hidden input box name on each load(!), you could also randomize the placement of that box on each load, such as before the name field or after the message field, so that the bot searches for it in one place and does not find it.
Also, as I have done on my own website, you could use some special text written in your native language (of course, only if it's not a widely known one like English or German), so only if you understand the text will you get the right word to put in the check field.
I can honestly say out loud that I'm Estonian (in Europe) and you couldn't even find an automatic translator for this language :D
Even better, the text basically apologizes to all visitors, saying it had to be done in order to avoid spam, and the word is actually always the same, which makes it even easier to remember once you've read the text.
OK, so it's a really good approach, but actually only if you use some foreign language like me; otherwise it's actually useless. My own site hasn't had any spam since then, and that's enough for me.
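
The idea of randomizing the hidden field's name on each load might be sketched in Python like this. The helper names, the "f_" prefix, and keeping the generated name in the visitor's session are all assumptions for illustration.

```python
import secrets


def new_honeypot_name():
    """Return a fresh random field name to render into the form for one page load."""
    return "f_" + secrets.token_hex(8)


def is_spam(form, honeypot_name):
    """True if the field that was hidden on this page load came back filled in."""
    return bool((form.get(honeypot_name) or "").strip())
```

The server would remember the generated name (for example in the session) and pass it to `is_spam` when the form is submitted. One trade-off: the article reports that bots avoided a field named 'spam' but filled in 'lastname', so a random-looking name may be less tempting to them.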

best cheers to all of you guys, the ideas are really one of the best i've seen!
sir West

Not a Member!

Andy

Wednesday 1st August 2007 | 05:57 AM

Rodney - I've noticed more and more people adopting this method - the spammers will soon learn and adapt to it. It won't take much work for them to beat this test.

If it's working for you then that's good - I'm just saying that it won't work for very much longer.

Andy.

Not a Member!

Ryan

Wednesday 1st August 2007 | 06:52 AM

If you say instead
and have the css display it 300px off the top of the page, you can let screen readers and text-based web browsers know why they're not getting comments posted on your site.

Not a Member!

eh..

Wednesday 1st August 2007 | 06:54 AM

by the way, check out the source of Joe's comment.... tsk tsk

Not a Member!

Michael

Wednesday 1st August 2007 | 09:37 AM

Thanks for highlighting that 'eh'.

Not a Member!

Rodney

Wednesday 1st August 2007 | 11:38 AM

Andy rightfully points out this system will be beaten eventually. That's part of the evolution of warfare. The spammers get smart, we have to respond. This doesn't mean you just don't do anything because one day it won't work.

When some bot beats this system (and that's even more likely due to this discussion, telling them in detail how it works), the system will just need to evolve as well.

I wonder though, if you could port comments through spamassassin? After all, it's just Perl?

Not a Member!

Mikey

Wednesday 1st August 2007 | 12:13 PM

I concede this method can be overcome, but the simple fact remains I was once manually deleting 30 - 100 spam comments per day before implementing the honey pot over 6 months ago - and now we get zero spam.

We are working on a version 2 method which will also satisfy accessibility concerns. It is actually finished but not yet implemented, as we would like some feedback first.

If you are a web developer and you would like to assist by letting us know of ways you think the new version could be beaten, and offer work-arounds etc., please contact us (https://rustylime.com/show_article.php?id=432) and I will put you in the loop.

My belief is we can overcome this without resorting to captcha. But as Rodney and others have pointed out it is just another stage in a war that will no doubt be countered eventually.

Mikey.

Not a Member!

eh..

Wednesday 1st August 2007 | 05:00 PM

document.write('just checking');

Not a Member!

Ryan

Wednesday 1st August 2007 | 05:03 PM

Sweet. I got the last tag'd comment. Actually the last two, but then I told you about it.. Sorry for all the comment-spam :-)

Not a Member!

Mikey

Wednesday 1st August 2007 | 05:16 PM

Think nothing of it :-) Actually you helped me plug a hole that could have caused me a world of pain.

Not a Member!

Travis McCrea

Wednesday 1st August 2007 | 07:34 PM

I was working on something just like this for WordPress. My codename for it was "Capatus", but alas I never had the time to do it.
I never knew anyone else had researched this method... you can see on my site (a few blog posts back) how I wanted to do the same thing.

Not a Member!

Travis McCrea

Wednesday 1st August 2007 | 07:39 PM

A few problems I was running into that you may have over looked:
1) Autofillers - Some people have systems that fill in all the fields on a page, and last name would be a field it would see as recognizable if the person has entered their last name before.
2) Screen Readers - The visually impaired may still get this if they are using a fancy blind/visually impaired browser.

For now I have created a simple captcha program that asks "are you a human, the answer is yes if you are" and then they fill out 'yes' in the field. This works unless a spambot found my site and was targeting only it, but in the time it would take them to script it to fill out yes, they could easily attack 100 more sites.

Not a Member!

Travis McCrea

Wednesday 1st August 2007 | 07:42 PM

Sorry for the three posts, but I just wanted to point out one more thing... if they have CSS disabled, they MAY for some weird reason have the field filled out and would never know. Even though it says "please leave the following field empty", it's still there, and once again an auto-filler could fill it out.
It's a great system though... and you might only lose 1 person compared to the 10 you might lose with captcha, but you would still lose people... I am on the hunt for a foolproof way.
And we are not alone: after doing some more research, you can find similar attempts if you google "capatcha prove your a computer"

Not a Member!

Mikey

Wednesday 1st August 2007 | 07:52 PM

Hey Travis no need to apologise for the multiple comments.

The auto-fill works by reading the field name, so 'lastname' would cause a problem. We can solve this by naming it something completely absurd.

I am composing a new article with our v2.0 implementation, which is intended to satisfy accessibility requirements as well, and I hope to get feedback from people like you on how it can be improved.

Stay tuned.

Not a Member!

Andy

Wednesday 1st August 2007 | 11:55 PM

Sergey - developing forms that require CSS, cookie, or Javascript support is not going to work, because spambots will move inside the browser - everything can be mechanised. You cannot differentiate between people and computers from their feature sets, barring intelligence.

I've already written a bridge between Ruby and gtkmozembed (Gecko) - allowing me to interact with the Javascript engine directly from my Ruby application.

Rodney - I thought about Bayesian too, but comments and e-mail are different problem domains. For example, it's quite common and legitimate for a comment to consist solely of a link, with no text body. With e-mail, we expect that the sender has actually written something of substance.

Akismet has the advantage of working with a large corpus of data from many sources.

If you wrote your own Bayesian filter then you'd have to impose a few rules, like disallowing links unless there are a few sentences of language to accompany them.

But this could be even more inconvenient to posters than putting them through a CAPTCHA test or requiring OpenID login.


Andy.

Not a Member!

Andy

Thursday 2nd August 2007 | 04:59 AM

Sergey - don't worry, I am not in the business of selling spam bots. :-) I didn't quite understand what you meant about parsing text around forms to create a new search engine though?

BTW, I forgot to mention that another big difference between e-mail spam and comment form spam is that e-mail has headers. These go through the filter with the subject and body, and are incredibly valuable.

Andy.

Not a Member!

James

Thursday 2nd August 2007 | 09:38 AM

It's nice to see that this works... The way I can tell is that RoboForm put my last name in the field at the bottom!

Not a Member!

Steve

Friday 3rd August 2007 | 01:39 AM

Nice!

We implemented something similar at our support boards. Here are some notes we made about spam bots while coming up with a solution, which others might find helpful.

Spambots are lazy. They will only load the HTML. They don't load the graphics on the page or the supporting CSS files. I guess this is a bandwidth-saving measure.

Spambots don't run JavaScript. I suspect these bots are just sophisticated regex engines.

Spambots from different IPs do seem to be able to communicate with each other. One will make a post and days later another will try to find that post.

Spambots tend (95% guesstimate) to flood their requests as fast as the server can respond, faster than any human can type.
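
The flooding behaviour suggests a simple timing check. Here is a minimal sketch in PHP, assuming the server records when the form was rendered. The function name `submitted_too_fast` and the 5 second threshold are made up for illustration, not something measured on our boards:

```php
<?php
// Hypothetical helper: flags a submission that arrives sooner after the
// form was rendered than a person could plausibly type a comment.
// Both arguments are Unix timestamps in seconds.
function submitted_too_fast($renderedAt, $submittedAt, $minSeconds = 5)
{
    return ($submittedAt - $renderedAt) < $minSeconds;
}
```

One way to use it would be to store time() in the session when the form is shown and compare it with time() on submit. The threshold would need tuning so that fast typists and autofill users are not caught.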

-------------------------------------

Hope this is helpful.

Not a Member!

Sergey

Saturday 18th August 2007 | 02:30 AM

Spam bots keep getting better and better.
Bots already open sites with IE, and they wait before submitting.

They are not looking at the field names. Bots submit the form in all combinations (they try email, name, year across all available inputs, then name, email, year) and put the message in the textarea field.

I am proud to introduce ActiveSpamProtection. My friend Victor uses Flash to protect sites from spammers, and now you can see it on his new site.

It would take too much time for any programmer to write a bot that can work with Flash (a popular ActiveX application). Programmers are too lazy. So by putting a 'Flash Form' on your page you will protect your blog for several years.

Also, ActiveSpamProtection is so nice looking!
http://dogs.triwe.net/dog-care1.php

No more stupid CAPTCHA.
No more additional questions.

For example, on the page I'm showing you:
if you do not fill in the textarea, the dog will remind you to fill it.

Please let me know (in comments on this blog) what you think about it.

Not a Member!

Mikey

Saturday 18th August 2007 | 09:19 AM

Hi Sergey,

It's good to see others tackling the comment spam issue as well. I have some questions.

What about accessibility? Does it degrade gracefully when Flash is not present in the browser? How does a screen reader handle it? I can make a Flash form myself quite easily, but I never wanted to for the aforementioned reasons.

That said, most people probably won't care as much as I about that sort of thing.

Not a Member!

renlo

Sunday 19th August 2007 | 02:38 AM

am I the spam?

Not a Member!

Coded Horror

Sunday 19th August 2007 | 10:21 AM

Over on codinghorror.com, they use a CAPTCHA that is actually always the same word, and it's not written in some disguised, jumbled font. Presumably, most spam engines don't actually try to decode CAPTCHAs, so you don't need to make it random or difficult to read.

Not a Member!

cam

Sunday 19th August 2007 | 11:12 PM

If you are storing the spam in a DB, you could presumably also store the IP, and serve the spam back to the IP it came from. This would pass the basic "check if submitted" test.
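
A rough sketch of that idea in PHP, assuming each stored comment row carries the poster's IP and a spam flag. The function name and the array keys `ip` and `is_spam` are hypothetical, not the site's actual schema:

```php
<?php
// Hypothetical sketch: $comments is a list of rows such as
// array('body' => '...', 'ip' => '...', 'is_spam' => true or false).
// A flagged comment is shown only to the IP address that posted it.
function visible_comments(array $comments, $viewerIp)
{
    $visible = array();
    foreach ($comments as $comment) {
        if (!$comment['is_spam'] || $comment['ip'] === $viewerIp) {
            $visible[] = $comment;
        }
    }
    return $visible;
}
```

Everyone else sees only the clean comments, while the spammer's own address still sees its post as if it had been accepted. Shared or rotating IPs would weaken this.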

Not a Member!

ILUsion

Monday 20th August 2007 | 03:55 AM

I have a bit of experience writing PHP, working with forum software etc. and we also faced a spam problem. Our solution to this was a little different but equally simple to implement.

We made another input field for everybody to see (so this will work for terminal browsers and visually impaired users too). As the description of this field you put a question only a human will be able to answer, and you simply discard posts that lack a decent answer. For a guestbook without any log-in mechanism this works perfectly. Rotating the questions is even better, but it isn't necessary, as I haven't seen any spam bot pass those checks. My question is "What is the square root of 25?" (which isn't that difficult, considering my audience is mostly scientists and engineers). Another possible question is "Our webdomain is ___lime.com ". Every human will be able to pass it, but a bot will certainly fail for the next few years.

If you are using forum/blog software that is part of a large project (phpBB, WordPress, ...), it usually isn't enough to put up a CAPTCHA (the phpBB CAPTCHA is terrible, and OCR is quite easy to do), but I haven't seen my system fail once yet, and if you make the question very easy no user will complain.

On our forum (phpBB) we used this same system for registrations (so we only needed to annoy our members once). But we also needed to block spammers from guest-posting (our forum is open to the public, as a lot of the guests just ask one question about our subject and then leave). To do this, the forum rejects all guest postings containing URLs. While this does limit the guests' capabilities, we found that less of a burden than scanning the whole forum every day. And if a human user really needs to post a URL, he can edit his post to split the URL so it can be displayed (or one of the admins could take care of that for him).
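
The two checks described above could be sketched in PHP roughly like this. It is only an illustration: the function names, the accepted answers and the URL pattern are assumptions, not the code running on our forum:

```php
<?php
// Hypothetical helpers; the question and the URL pattern are illustrative.

// True when the visitor answered "What is the square root of 25?" correctly.
function answer_is_correct($answer)
{
    $answer = strtolower(trim($answer));
    return $answer === '5' || $answer === 'five';
}

// True when the text contains something that looks like a URL.
function contains_url($text)
{
    return preg_match('~(https?://|www\.)\S+~i', $text) === 1;
}

// A guest post is accepted only with a correct answer and no URL.
function accept_guest_post($answer, $text)
{
    return answer_is_correct($answer) && !contains_url($text);
}
```

A guest who splits a URL (for example "example dot com") still gets through, which matches the workaround mentioned above.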

For everyone using phpBB, this is a great resource: http://bbantispam.com/howto/
Please do note that though it states that you can buy a license, you can use everything for free (and no, I didn't write those scripts, if you like them you can always support the author by buying a license).

Not a Member!

Lynx User

Sunday 26th August 2007 | 07:49 AM

I think that this is a very nifty solution of yours. I like how you even kindly ask for that field to be left blank in browsers that ignore CSS. Keep up the original thinking!

Not a Member!

Richard

Tuesday 28th August 2007 | 04:03 AM

That's so simple. I'm amazed it wasn't discovered years ago.

Not a Member!

Rodney

Sunday 9th September 2007 | 01:45 PM

I run another blog based site elsewhere, which uses a double captcha system (two words).

I am now sadly pretty confident that CAPTCHA has been beaten, as the site is now getting several spam comments a day past the CAPTCHA.

Plus it's that really bloody annoying kind of spam, which uses garbage words like "fjsdhfkjsdf" then "best casino" or something... but with no URL or identifier of any kind... so if they're advertising their crappy casino, how the hell could anyone find it anyway?

Not a Member!

badcam

Wednesday 23rd January 2008 | 06:38 PM

I have just created a blog using WordPress and would love to implement your method. How easily can I implement your suggestion for my site? I can find my way around WordPress enough to install it under my own domain, but coding is not something I'm that good at. Could you please advise in a simple way how I can implement this on my own site? Or, even easier, is there a plugin for this now? Thank you.

I love the site by the way. I first stumbled across it when I was considering buying a Polyview 19" LCD screen. That was a great review. I didn't end up buying one sadly as I needed an urgent replacement.

Not a Member!

Mikey

Wednesday 23rd January 2008 | 06:46 PM

Hi BadCam

No plug-in, I'm afraid, as this site doesn't use any of the popular open source CMSs out there. We hand coded this baby from scratch. I haven't had a look at WordPress, but I am asking some of my trusty compadres if they can shed any light for you. Stay tuned.

PS: Thanks for the kind comment about this site.

Not a Member!

badcam

Thursday 24th January 2008 | 09:46 PM

Thanks Michael.

I look forward to hearing more. I also meant to say in my previous comment that just because I didn't buy a Polyview, it didn't stop me coming back here to this site. Best of luck for your future.

Not a Member!

Mikey

Sunday 27th January 2008 | 12:05 PM

Hi badcam,

I should point out, as I think someone else already has, that this method won't work if someone writes a bot designed specifically to beat your site. Since the implementation we have had a 99.9% reduction in comment spam. The only comment spam we get now is on this article, never on any others. And even then we only get 1 or 2 every second day. Sometimes only a few per week.

Not a Member!

Michael N

Thursday 14th February 2008 | 05:55 AM

If you are not willing to give something display:none or visibility:hidden, simply put something on top of the element, or position it absolutely underneath something using z-index. One could perhaps detect z-index usage and stop it; however, something like another form field later in the source code, or a label element, with a negative margin or similar would be much more difficult to detect. The spam software would not only have to implement the DOM and CSS parsing, but would also have to calculate where everything on the page is and then somehow determine whether some element appears over the top of it.

Not a Member!

Michael N

Thursday 14th February 2008 | 05:59 AM

Before I forget: if you have a CAPTCHA image you do not wish to remove, simply modify its source code and apply a few custom changes, so that the spam software either incorrectly identifies it and picks the wrong CAPTCHA-defeating algorithm, or no longer works on it at all.

Using embossed font filters in PHP, for example, has excellent results. One really needs to use their eyes to read the letters, since OCR technology sees many embossed letters in many fonts as incoherent blobs.

Not a Member!

jason

Tuesday 15th April 2008 | 12:14 PM

I know very little HTML, CSS, or PHP. I was asked to create a website for my youth group and I am learning as I go. Thank you so much for this.

However, my site has a contact form on the home page for feedback. It submits by sending an email, not by writing to a database (I know nothing about working with databases, so that's a good thing). After a whole night of trying to figure this out after reading this page for the first time, here is the workaround I came up with on my late night run to Taco Bell.

When someone clicks submit on my form, a PHP file is called to handle the info. I created the hidden field as you said, then in the PHP file I made an if/else statement where, if the field does not equal null, the script dies and does not finish. But if the field is empty, the script continues and emails the info. Could someone please proofread this, since I don't know exactly what I'm doing? Thanks.

btw "thisbox" is the hidden field to catch spambots


php
$nothing = 'null';
if ($thisbox === $nothing ) {
exit("your message is marked as spam and not sent. please do not enter anything in the thisbox field.");
} else {
$name = $_POST['name'];
$email = $_POST['email'];
$description = $_POST['description'];
$to = "my email address";
$from = "From: $name ";
$subject = "web form";
$message = "Name: $name\nEmail: $email\nBrief Description: $description\n";

mail($to,$subject,$message,$from);
}


So what do you think? I have tested this by using the form myself and I get the email, so I know it will work for actual people visiting, but will it work to stop bots? I guess only time will tell.

Not a Member!

jason

Tuesday 15th April 2008 | 08:25 PM

Did not work. Here is a correction that I think will.


php
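$thisbox = $_POST['thisbox'];
// Stop if the hidden field is not NULL, i.e. something was entered.
if ($thisbox != NULL) {
    exit("your message is marked as spam and not sent. please do not enter anything in the thisbox field.");
} else {
    // ... rest of the script as before ...
}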

Not a Member!

aaron

Tuesday 15th April 2008 | 11:38 PM

jason:
$_POST['thisbox'] won't be null; it will be a zero-length string. Try:

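// Default to an empty string so a missing field does not raise a notice.
$thisbox = isset($_POST['thisbox']) ? $_POST['thisbox'] : '';
if ($thisbox != '') {
    exit("your message is marked as spam and not sent. please do not enter anything in the thisbox field.");
} else {
    // ... build and send the email as in the original script ...
}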
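
The same check can be wrapped in a small function so it is easy to try out without a real form. This is only a sketch: the function name `is_probably_spam` is made up, and 'thisbox' is simply the hidden field name used in this thread:

```php
<?php
// Hypothetical helper: $post is the submitted form data (normally $_POST)
// and $honeypot is the name of the hidden field that humans leave empty.
function is_probably_spam(array $post, $honeypot = 'thisbox')
{
    // A missing field is treated like an empty one.
    $value = isset($post[$honeypot]) ? $post[$honeypot] : '';
    // Anything other than whitespace in the hidden field suggests a bot.
    return trim($value) !== '';
}
```

In the mail script the test would then read something like `if (is_probably_spam($_POST)) { exit(...); }`, with the email code in the else branch as before.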

Not a Member!

Shawn Hyde

Tuesday 27th May 2008 | 09:55 AM

The company I work for has been using a similar method for several years now, along with a phone number validator.

validationexpression="^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$"

We have never seen any bot spam come through the request form, ever.
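
For anyone on PHP rather than a validator control, the same expression can be used with preg_match. This is a sketch only: the function name `looks_like_phone` is made up, and as far as I can tell the pattern accepts 10-digit North American style numbers with an optional leading 0 or 1:

```php
<?php
// Hypothetical helper reusing the validation expression quoted above.
function looks_like_phone($input)
{
    $pattern = '/^[01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}$/';
    return preg_match($pattern, $input) === 1;
}
```

A bot that stuffs a URL or random text into every field fails this check, which is probably why it helps.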

Even better is the method used on 0spam's DNSBL project removal form. It is so simple, but no bots can figure it out. People don't even know it's there, so there is nothing for real people to worry about.

Not a Member!

Gina Squitieri

Monday 2nd June 2008 | 10:52 AM

Michael, you are so awesome.

I have seen approximately 4-5 porn adverts here in the last two days (like the two above I'm looking at now from "katie"). But they're wiped out within a matter of hours usually.

Kudos. :)

Not a Member!

shad

Tuesday 17th June 2008 | 01:01 PM

Very nice. I had a decent captcha to stop spammers, but the css class is so simply perfect. Nice and clean. Now with filtering tips I'll be set. Thanks!

Comments for this topic are no longer being taken. End of line.
