Small Business Technology Blog

Tuesday, April 6, 2010

"Retype the word" or more properly CAPTCHA tests are designed to prove you're not a computer - and that difficult to read text is part of the test.

While it might feel like a "gotcha", the test is actually called a CAPTCHA, which is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart".

Yep, it's a "prove you're human" test.

And all that twisty, blurry, faded stuff you're complaining about? That's actually kinda the point.

That's the test.

The problem that CAPTCHAs address is actually an important one: computers doing things in an automated fashion - like creating millions of fake email accounts or posting comment spam to websites. By forcing a test that computers cannot (yet) pass, a site ensures that the protected activity can be performed only by a real, live person.

The limitation that these tests take advantage of is that computers can't read.

Now, technically that's incorrect - optical character recognition (OCR) has come a long way. OCR software can, with a very high degree of reliability, take a photograph or scan of text printed on a page and "read" it - turn it into the computer representation of the text that the page contains, as opposed to a picture of that text.

That's actually pretty cool, and very handy for many applications.

However, there are limits. Even with clear copies of the text, a computer has a difficult time with some characters (the letter 'l' versus the number '1' in many typefaces, for example), and thus can still get things wrong.

When things get blurry, twisted or faded, current computer algorithms try to figure out what those characters are and fail miserably.

You and I, on the other hand, can.

Usually.

So when we get the answer correct where a computer couldn't possibly, it "proves" we're human.

For now.

As computer technology advances, I'm sure techniques will be developed that allow a computer to correctly interpret today's CAPTCHAs. What happens then, I don't know.

A couple of random notes on CAPTCHAs:

One way that they're often defeated is by hiring real live humans - often cheaply, overseas - to solve them.

Another way that some are bypassed is by exploiting weaknesses in a particular implementation. For example, if one type of CAPTCHA always selects from one of 100 different scrambled words, then one need only have a real human interpret each one once, and then simply let the computer compare pictures - something it is good at.
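
To see why a small fixed pool is such a weakness, here is a minimal Python sketch. Everything in it is hypothetical: the byte strings stand in for image files, and the names `fingerprint`, `build_answer_table`, and `solve` are made up for illustration. It also assumes the site serves byte-for-byte identical images every time; pictures that vary slightly would need a fuzzier comparison than the exact hash used here.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """Reduce an image to a fingerprint so identical pictures compare equal."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_answer_table(labeled_images):
    """Map each image fingerprint to the answer a human typed for it, once."""
    return {fingerprint(img): answer for img, answer in labeled_images}


def solve(table, image_bytes):
    """Return the remembered answer, or None if this picture is new."""
    return table.get(fingerprint(image_bytes))


# Hypothetical stand-ins for a tiny pool of CAPTCHA images
# (the example in the text imagines a pool of 100).
pool = [(b"fake-image-1", "morning"), (b"fake-image-2", "overlooks")]
table = build_answer_table(pool)

print(solve(table, b"fake-image-1"))  # morning
print(solve(table, b"never-seen"))    # None
```

Once every picture in the pool has been labeled, no reading is required at all. A picture the table has never seen still stumps it, which suggests why generating each challenge fresh is the safer design.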

My favorite CAPTCHA, when I use one, is reCAPTCHA, which presents two words in random order: one is a real test, and the other is a word from a book digitization project. (Their about page has not only a good overview of CAPTCHA, but also an explanation of how they're using it in reCAPTCHA.)
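
The two-word idea can be sketched roughly in Python. This is not reCAPTCHA's actual code or API: the function `check_response`, the `votes` tally, and the sample words are all invented for illustration, and the idea that answers for the unknown word are collected as votes is an assumption of this sketch.

```python
from collections import Counter


def check_response(typed, control_index, control_answer, votes):
    """Pass only if the control word is right; record the other answer as a vote.

    typed          -- the two words the user typed, in the order shown
    control_index  -- which position (0 or 1) held the real test word
    control_answer -- the known answer for the test word
    votes          -- running tally of guesses for the unknown word
    """
    if len(typed) != 2:
        return False
    if typed[control_index].strip().lower() != control_answer.lower():
        return False
    guess = typed[1 - control_index].strip().lower()
    if guess:
        votes[guess] += 1
    return True


votes = Counter()
# Fixed here so the demo is repeatable; a real challenge would pick
# the position at random, e.g. with random.randrange(2).
control_index = 1

print(check_response(["upon", "Morning"], control_index, "morning", votes))   # True
print(check_response(["upon", "mourning"], control_index, "morning", votes))  # False
print(votes["upon"])                                                          # 1
```

Because the user can't tell which word is the real test, the only reliable way to pass is to read both carefully. A guess is counted only when the control word was right, so failed attempts add nothing to the tally.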

CAPTCHAs can have problems - specifically for people with poor or no eyesight. In most cases, an audible CAPTCHA equivalent is made available where you type in what you hear spoken.

Even in normal cases, as you're seeing, sometimes CAPTCHAs are too hard, too blurry, or too unreadable even for humans. Fortunately, most also include some kind of "show me something else" alternative.

But unfortunately, the bottom line is that the blurriness and the difficulty are indeed the point.


And CAPTCHAs or something much like them will be around for quite some time - probably as long as there are spammers and those who would do other malicious things en masse, given the opportunity to automate the process.
