Breaking Gmail’s Audio Captcha
A week ago I came across an interesting post on the Websense blog; I guess everybody is already aware that a bot was spotted breaking Gmail's image captcha. According to the post, the success rate is about 20%, which from a spammer's point of view is really profitable and surely more than enough for their purposes. However, what caught my attention while reviewing the Gmail signup page was the audio captcha.
First off, it is worth noting the “cat&dog” Asirra captcha from Microsoft Research. It is a really good captcha that has kept the success rate of those who tried to break it (computer vision gurus) below 60%. Why? I think the problem with most captchas is that they use a complex solution to present a very simple challenge: obfuscated, deformed and distorted images that represent short alphanumeric sequences. On the other hand, “cat&dog” style captchas implement a simple solution to present a really complex challenge for automated agents: are you seeing cats or dogs in this perfectly clean picture? That question is too hard to answer if you are not human.
Gmail's audio captcha suffers from a similar error. It is a WAV file embedded within the webpage; once loaded, it plays a limited series of numbers. Twice. By the way, I don’t understand that alphanumeric obsession... Anyway, let's begin. In this post I am going to show how that captcha can be broken just by using Fourier analysis.
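For readers who have not used Fourier analysis before, here is a minimal sketch of the basic idea: take a short frame of audio samples, transform it to the frequency domain, and see which frequency carries the most energy. This is only an illustration, not the method used against the real captcha. The frame is a synthetic stand-in (a 1000 Hz tone plus a weaker 3000 Hz "background"), and the names `dft_magnitudes`, `dominant_frequency`, `SAMPLE_RATE` and `FRAME_SIZE`, as well as the 8000 Hz sample rate, are assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(N^2) discrete Fourier transform.

    Returns the magnitude of each frequency bin from 0 up to N/2.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def dominant_frequency(frame, sample_rate):
    """Return the frequency (in Hz) of the strongest non-DC bin."""
    mags = dft_magnitudes(frame)
    # Skip bin 0, which only holds the constant (DC) offset.
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(frame)

# Hypothetical values, chosen only for this illustration.
SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_SIZE = 256     # samples per analysis frame (assumed)

# Synthetic frame: a strong 1000 Hz tone over a weaker 3000 Hz one.
frame = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t in range(FRAME_SIZE)
]

print(dominant_frequency(frame, SAMPLE_RATE))
```

A real recording would be split into many such frames, and a fast FFT routine would replace the naive loop, but the principle is the same: in the frequency domain the louder component stands out from the background.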
You should play the captcha before continuing. Look for it on the signup page.
The first obvious error is the use of fixed patterns that clearly identify where the sequence begins and where it ends.
We can hear the numbers, with distorted voices in the background. Taking into account that human beings are visual creatures (this is why everybody can spot Wally in a crowded place, but only trained individuals can pick out a distorted tone while an orchestra is playing), my question was: “If you are still capable of easily distinguishing the numbers played in the captcha, why couldn’t an automated agent do so?”
So let’s try to find out the answer by taking a look at the waveform of a random Gmail audio captcha.
