
reCAPTCHA (recently acquired by Google) is a free CAPTCHA service that webmasters use to prevent comment spam and other form-submission spam. It also helps digitize books, newspapers and old-time radio shows. Bet you didn’t know that.  I thought it was just there to annoy me.  So wrong.

A CAPTCHA is a program designed to tell whether a user is a human or a computer. You’ve probably seen them: colorful images with distorted text at the bottom of Web registration forms. CAPTCHAs are used to prevent automated comment spam and other types of form spam.  Computer programs can’t read distorted text as well as humans can, so bots have a hard time getting through sites protected by CAPTCHAs.

Stop Spam. Read Books.

In fact, computers do such a poor job of reading distorted text that even software designed to read scanned words and digitize them can’t do it very well.  Known in the biz as “OCR” (optical character recognition), these algorithms read scanned content and convert it into digital text.  OCR has come a long way over the years, but it still has a hard time with very small fonts, poor contrast (such as a scanned newspaper) or handwritten content.

So, the idea behind reCAPTCHA is pretty simple.  Create a system that asks a user to read two distorted words and enter them in a verification field.  That means far less automated form spam, but it also means that users all over the world are actually helping to turn old scanned texts into digital format.  It’s a win-win.

Here’s how reCAPTCHA explains it:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

OK, but if computers can’t read the distorted text, how does the system know you’re right?  Here’s how it works (again, right from reCAPTCHA’s site):

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
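
To make that a little more concrete, here is a minimal Python sketch of the paired-word idea. It is only an illustration of the description quoted above, not reCAPTCHA’s actual code: the class name, the `votes_needed` threshold and the sample words are all made up for the example.

```python
from collections import Counter


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


class PairedWordChallenge:
    """Toy model: one word with a known answer, one word OCR could not read."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = normalize(control_answer)
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.votes = Counter()            # guesses collected for the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; record their guess for the unknown word."""
        if normalize(control_guess) != self.control_answer:
            return False  # failed the known word, so the other guess is discarded
        self.votes[normalize(unknown_guess)] += 1
        return True

    def accepted_reading(self):
        """Return the unknown word's reading once enough users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


# Two users pass the known word and agree on the unknown one; one user fails.
challenge = PairedWordChallenge("overlooks", votes_needed=2)
challenge.submit("overlooks", "inquiry")
challenge.submit("wrong", "spam")
challenge.submit("Overlooks", "Inquiry")
print(challenge.accepted_reading())
```

The point of the sketch is that a guess for the unreadable word only counts when the same user got the known word right, and that no single guess is trusted until other people confirm it.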

Currently, we are helping to digitize old editions of the New York Times and books from Google Books (because Google owns reCAPTCHA).

So if your site is suffering from spam, consider rolling out reCAPTCHA.  For the big CMS platforms (like WordPress and Joomla), there are pre-built, free plugins you can use.
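
If you are not on one of those platforms, the server-side half of the job is small. Here is a rough Python sketch, assuming Google’s `siteverify` endpoint, which takes your secret key and the token the widget adds to the submitted form and answers with JSON containing a `success` flag. The function names and the swappable `post` parameter are my own inventions for the example, so check Google’s current documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint (assumed here; confirm against Google's documentation).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def http_post(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_human(secret_key, user_token, post=http_post):
    """Return True only if the verification service reports success."""
    if not user_token:
        return False  # form was submitted without solving the challenge
    result = post(VERIFY_URL, {"secret": secret_key, "response": user_token})
    return bool(result.get("success"))
```

Your form handler would call `is_human(...)` with the token from the submitted form and reject the submission when it returns `False`. The `post` argument exists only so the check can be tested without hitting the network.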

Stop spam and read more books with reCAPTCHA.  It’s effective.  It makes you feel warm and fuzzy. It’s a win-win.

UPDATE: “noCAPTCHA”

Google has recently announced a new reCAPTCHA. Because so many people had such a hard time actually reading the CAPTCHA, the new system simply asks you whether you’re a human or not.

Google reCaptcha Demo

They’re calling it the No CAPTCHA reCAPTCHA. Sound weird?  See for yourself.
