Mar 1, 2006

Spam and the CAPTCHA defence

It is difficult to imagine e-mail without spam these days. The vast majority of the world's electronic mailboxes are haunted by it, and an entire industry thrives on it. Companies and ISPs spend billions of dollars fighting it. Not only does spam clog networks with excessive traffic, but precious processor cycles and man-hours are spent every day keeping it out of our inboxes. It is no wonder that another industry, the one out to fight spam, is thriving as well. Spam is so much a part of our lives that 'checking mail' now means clearing out our mailboxes so that we don't lose genuine messages among the junk. So if everybody hates spam so much, what is being done to stop it?

Stopping Spam

The spam industry works on numbers, large numbers. Say that for every 100 e-mails the spammer sends, his client gets one response. (A 'response' here refers to a click on a link that promises a debt-free life, the woman of your dreams, eternal youth, or whatever.) To get 10,000 responses, of which only 10 may bring in any money, he has to send a million e-mails. Clearly, nobody can send that many e-mails by hand. The solution: automation. A software program, similar to a bulk mailer, sends the message to a database of addresses that the spammer has harvested from the Web or bought from another spammer. Take away the automation and the numbers no longer work. This is where the CAPTCHA defence comes in.
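The spammer's arithmetic is easy to check. Here is a minimal Python sketch using the post's own figures; the function names are just illustrative, and `Fraction` is used so the rates stay exact instead of picking up floating-point rounding.

```python
from fractions import Fraction
import math

# Rates taken from the post: 1 response per 100 e-mails, and only
# 10 paying customers out of every 10,000 responses.
RESPONSE_RATE = Fraction(1, 100)
CONVERSION_RATE = Fraction(10, 10_000)

def emails_needed(target_responses: int, response_rate: Fraction) -> int:
    """E-mails to send so the expected responses reach target_responses."""
    # Exact fractions avoid float rounding before ceil().
    return math.ceil(target_responses / response_rate)

def expected_returns(sent: int, response_rate: Fraction, conversion_rate: Fraction) -> Fraction:
    """Expected number of paying customers from `sent` e-mails."""
    return sent * response_rate * conversion_rate

sent = emails_needed(10_000, RESPONSE_RATE)
print(sent, expected_returns(sent, RESPONSE_RATE, CONVERSION_RATE))
```

Running it prints `1000000 10`: a million e-mails for ten paying customers. That volume is only practical for a program, not for a person.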

CAPTCHA, which stands for 'Completely Automated Public Turing Test to Tell Computers and Humans Apart', tries to determine whether the entity trying to send an e-mail or make a blog post is a human or just a robot. It supplies that entity with a graphic, such as the one shown below, and challenges it to type the characters shown in the graphic into a textbox.

If the typed text matches, the entity is assumed to be human, and the e-mail goes through, the blog post gets published, and so on.
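The check itself is a simple challenge-response exchange. Below is a minimal Python sketch of the server side, assuming the distorted image is rendered elsewhere (that step is left out). The class name `CaptchaStore`, the six-character length and the alphabet are hypothetical choices for illustration, not part of any CAPTCHA specification.

```python
import secrets
import string

# Hypothetical alphabet: letters and digits minus easily confused glyphs (0/O, 1/I).
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits if c not in "0O1I")

class CaptchaStore:
    """Keeps the expected answer for each outstanding challenge."""

    def __init__(self, length: int = 6) -> None:
        self.length = length
        self._pending: dict[str, str] = {}

    def issue(self) -> tuple[str, str]:
        """Create a challenge and return (token, text). The text would be drawn,
        distorted, into the image; the token travels with the form."""
        token = secrets.token_urlsafe(16)
        text = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._pending[token] = text
        return token, text

    def verify(self, token: str, response: str) -> bool:
        """True if the typed response matches; each challenge allows one attempt."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        # Ignore case and surrounding whitespace; compare in constant time.
        return secrets.compare_digest(expected.encode(), response.strip().upper().encode())

store = CaptchaStore()
token, text = store.issue()
print(store.verify(token, text.lower()))  # True: case is ignored
print(store.verify(token, text))          # False: the challenge was already used
```

Each challenge is removed from the store on the first attempt, so a bot cannot keep guessing against the same image; it has to read each new one correctly.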

CAPTCHA broken

The main problem with CAPTCHA is that it is just a computer program trying to beat another computer program, namely the spambot. CAPTCHA wins only as long as the spambot is too dumb to recognise the characters; it loses the moment the spambot begins to think like the CAPTCHA program.

How does the spambot 'think' like a CAPTCHA? It is quite simple. The spambot knows that the graphic contains a valid character sequence, and that the sequence, however distorted and deformed, was produced by a computer program following fixed rules. Trying enough combinations and permutations of the graphic can therefore recover the original sequence. That is exactly how CAPTCHA is broken: by identifying and learning the distortion patterns of the CAPTCHA program, the spambot turns the tables.
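One simple way a program can undo machine-made distortion is nearest-template matching: compare each distorted character against clean reference bitmaps and pick the closest. The toy Python sketch below assumes the characters have already been cut apart into tiny 3x5 bitmaps; the glyphs and function names are invented for illustration, and real CAPTCHA decoders are far more elaborate.

```python
# Toy 3x5 bitmaps ('#' = ink, '.' = background). These glyphs are invented
# for the example; a real decoder would first segment and normalise the image.
TEMPLATES = {
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "E": ("###", "#..", "##.", "#..", "###"),
    "H": ("#.#", "#.#", "###", "#.#", "#.#"),
    "L": ("#..", "#..", "#..", "#..", "###"),
}

def hamming(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognise(glyph: tuple[str, ...]) -> str:
    """Return the character whose clean template is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))

def read(glyphs: list[tuple[str, ...]]) -> str:
    """Decode a sequence of already-segmented glyphs."""
    return "".join(recognise(g) for g in glyphs)

# "HEAL" with one pixel flipped in each letter to mimic distortion.
noisy = [
    ("..#", "#.#", "###", "#.#", "#.#"),  # H, top-left pixel lost
    ("###", "#..", "##.", "#..", "##."),  # E, bottom-right pixel lost
    (".#.", "#.#", "#.#", "#.#", "#.#"),  # A, crossbar centre lost
    ("#..", "#..", "#.#", "#..", "###"),  # L, stray pixel added
]
print(read(noisy))  # HEAL
```

Every pair of these templates differs in at least three pixels, so a single flipped pixel still leaves each glyph closest to its own template. Heavier distortion needs more templates or a smarter distance, which is the 'learning' the paragraph above describes.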

A few interesting links:

The CAPTCHA project

Breaking a Visual CAPTCHA

PWNtcha - captcha decoder
