bad, worse, worst, CAPTCHA, reCAPTCHA

Think what you will of Google products. I personally try to avoid them for privacy reasons.

But there’s one Google product that is at the same time the crappiest and most ubiquitous Google product ever conceived. It’s called reCAPTCHA. The stated intent of CAPTCHAs, and allegedly also of reCAPTCHA, is to tell humans apart from bots. Any needy third-party website that isn’t afraid of the internet search giant (aka privacy black hole) and lacks the creativity, expertise or time to come up with alternatives that work for real humans will slap some reCAPTCHA crap onto its pages … of course embedded in an iframe, because those are oh-so-modern (like 1990s-modern or something funky like that).

How does it tell humans apart from bots? Well, in the past you were asked to read some garbled text and were allegedly helping Google with the OCR of books they had scanned into digital form. These days, however, reCAPTCHA is all about identifying mountains, rivers and lakes, buildings, store fronts, street signs or similar stuff in photos.

Alas, all of the stuff these “geniuses” at Google ask about is culture-specific. I am supposed to tell a store front apart from a house front when the facade is plastered with glyphs I can’t read, let alone understand.

Suddenly, according to Google, towers and churches are not buildings. How am I to tell a lake from a river if you show me just a single shoreline? Oh, and of course they won’t tell you whether you failed. As a human you may just as well give up after the fifth reCAPTCHA dialog, or switch to the audio version only to receive confirmation that you have been (wrongly, but very confidently) recognized as a bot.

Wow. Just wow. Using that piece of bovine feces means running the gauntlet. It fails at the single thing it’s meant to do: telling computers and humans apart.

I have wasted so much of my lifetime on this crap that I hope some Google folks run across this post some time in the future, or perhaps some of said third-party website owners looking for something better than the ridiculously stupid reCAPTCHA method. But I won’t hold my breath, especially given that Google dropped its original motto “don’t be evil”, and wasting other people’s lifetime clearly has an ethical component to it.

That said, here’s a method I’ve been using successfully for quite some time for sign-up and sign-in forms. You need some piece of data from the client; the IP address, for example, will do. You also need a salt value. Mix and match as needed, and be creative. Just be aware that if you include the current date in the mix, you (well, actually your users) may run into issues around midnight: a form rendered just before midnight and submitted just after it will no longer produce the same hash.
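To make the midnight caveat concrete, here is a minimal sketch that already uses the hashing step described in the next paragraph. It assumes SHA-256, the client IP and the ISO date as inputs; `SALT`, `field_name` and `candidate_names` are hypothetical names, not part of the method as described. Accepting both today’s and yesterday’s date is one possible workaround, my assumption rather than something the post prescribes.

```python
import hashlib
from datetime import date, datetime, timedelta
from typing import Set

# Hypothetical server-side secret; the post only says "some salt value".
SALT = "replace-with-a-long-random-secret"


def field_name(original: str, client_ip: str, day: date) -> str:
    """Derive an obfuscated field name that also depends on the calendar date."""
    material = "\x00".join([SALT, client_ip, day.isoformat(), original])
    return "z_" + hashlib.sha256(material.encode("utf-8")).hexdigest()


def candidate_names(original: str, client_ip: str, now: datetime) -> Set[str]:
    """Assumed mitigation: accept names derived from today's AND yesterday's date."""
    today = now.date()
    return {
        field_name(original, client_ip, today),
        field_name(original, client_ip, today - timedelta(days=1)),
    }


rendered_at = datetime(2024, 5, 1, 23, 59, 30)   # page rendered just before midnight
submitted_at = datetime(2024, 5, 2, 0, 0, 30)    # form submitted just after midnight
sent = field_name("email", "203.0.113.7", rendered_at.date())

# False: the date rolled over, so a single-day check rejects a legitimate user.
print(sent == field_name("email", "203.0.113.7", submitted_at.date()))
# True: the two-day window still recognizes the field.
print(sent in candidate_names("email", "203.0.113.7", submitted_at))
```

The trade-off is that each name stays valid for up to two days instead of one.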

Now, to obtain the name and the ID of a form field, send your selected piece of data from the client, along with your salt and anything else you deem reasonable, through a cryptographic hash function; the name and the ID each get their own hash. Then prepend something like z_ to the hash. Since a hexadecimal hash may start with a digit, this ensures that the HTML element ID starts with a letter and is valid. The more form fields you treat this way, the better. If you can treat other elements on the website as well, it becomes nearly impossible to determine the name or ID of a form field without first rendering the page and effectively “looking” at it, which is harder for bots to do.
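A minimal sketch of this step in Python, assuming SHA-256 as the hash, the client IP as the piece of client data, and the field’s internal name plus a `purpose` string ("name" or "id") as the “anything you deem reasonable” part. `SALT`, `obfuscated_field` and `render_input` are hypothetical names for illustration only.

```python
import hashlib

# Hypothetical server-side secret; the post only says "some salt value".
SALT = "replace-with-a-long-random-secret"


def obfuscated_field(original: str, purpose: str, client_ip: str, salt: str = SALT) -> str:
    """Derive an obfuscated name or ID for one form field.

    `original` is the field's internal name (e.g. "email"); `purpose` keeps the
    name and the ID of the same field distinct from each other.
    """
    # Join with a NUL separator, which should not occur in these inputs,
    # so that different input combinations cannot concatenate to the same string.
    material = "\x00".join([salt, client_ip, purpose, original])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    # A hex digest may start with a digit; the "z_" prefix guarantees a leading letter.
    return "z_" + digest


def render_input(original: str, client_ip: str) -> str:
    """Render a text input whose name and ID are both obfuscated."""
    name = obfuscated_field(original, "name", client_ip)
    elem_id = obfuscated_field(original, "id", client_ip)
    return f'<input type="text" name="{name}" id="{elem_id}">'


print(render_input("email", "203.0.113.7"))
```

Because the output depends on the client’s IP and the secret salt, a bot cannot hardcode the field names; it has to fetch and parse the page for each client.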

When the user submits the form, the receiving script knows which form fields it is looking for. It also has the same circumstantial information about the client (e.g. the IP address) and knows the salt, so it can recompute the field names under which to look for a particular value. You can even obfuscate parts of the URL the form is submitted to using the same method.
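On the receiving side, a sketch could look like this, assuming the submitted form data arrives as a plain dictionary and the same hypothetical `SALT` and derivation as on the rendering side. `extract_fields` is a hypothetical helper: it recomputes each expected name from the client’s IP and rejects the submission if any field is missing.

```python
import hashlib
from typing import Dict, List, Mapping, Optional

# Hypothetical secret; must be identical to the one used when rendering the form.
SALT = "replace-with-a-long-random-secret"


def obfuscated_field(original: str, purpose: str, client_ip: str, salt: str = SALT) -> str:
    """Same derivation as on the rendering side."""
    material = "\x00".join([salt, client_ip, purpose, original])
    return "z_" + hashlib.sha256(material.encode("utf-8")).hexdigest()


def extract_fields(form: Mapping[str, str], expected: List[str],
                   client_ip: str) -> Optional[Dict[str, str]]:
    """Map obfuscated POST field names back to their internal names.

    Returns None if any expected field is missing, e.g. when a bot submitted
    guessed plain names like "email" or the request came from another IP.
    """
    result: Dict[str, str] = {}
    for original in expected:
        key = obfuscated_field(original, "name", client_ip)
        if key not in form:
            return None
        result[original] = form[key]
    return result


ip = "203.0.113.7"
submitted = {obfuscated_field("email", "name", ip): "user@example.org"}
print(extract_fields(submitted, ["email"], ip))
print(extract_fields({"email": "spam@example.org"}, ["email"], ip))
```

Note that tying the names to the IP means a user whose address changes between loading and submitting the page will be rejected as well.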

This, along with easy puzzles like “what is three times two as an integer?”, will go a long way toward preventing the most obnoxious automated and human spam. And yet it’s solvable regardless of the culture you hail from.
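Such a puzzle is easy to generate and check. The sketch below is one possible take, assuming single-digit operands spelled out as English words and only multiplication and addition; `make_puzzle` and `check_answer` are hypothetical names. Where the expected answer is stored between requests (e.g. in a server-side session) is left open here.

```python
import random
from typing import Optional, Tuple

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def make_puzzle(rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return a question like "What is three times two as an integer?" and its answer."""
    if rng is None:
        rng = random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    if rng.random() < 0.5:
        return f"What is {WORDS[a]} times {WORDS[b]} as an integer?", a * b
    return f"What is {WORDS[a]} plus {WORDS[b]} as an integer?", a + b


def check_answer(submitted: str, expected: int) -> bool:
    """Accept the answer only if it parses as exactly the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False


question, answer = make_puzzle()
print(question)
print(check_answer(str(answer), answer))
```

Asking for the result “as an integer” keeps the accepted format unambiguous: "6" passes, while "six" does not.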

I wish every Google engineer who “contributed” to reCAPTCHA as many boring and futile tasks in their daily routine as possible for the rest of their lives, just as payback for all the human lifetime they have wasted worldwide and are still wasting at the time of this writing.

// Oliver

This entry was posted in EN, Opinion, Thoughts.

6 Responses to bad, worse, worst, CAPTCHA, reCAPTCHA

  1. Christian says:

    Well, in the early days of captchas, they were also intended to decipher text in books that machines could not read completely. However, those days are over, and nearly all books can now be read by machines more accurately than by humans. So nowadays, the old captchas are just for making humans angry :D

    I must say that I really like the ‘I’m not a robot’ checkbox captchas. Just browse the page, and usually a click on the checkbox is all you need. Of course, Google collects all your actions to find out whether you are human. Well, it does this anyway. Still, sometimes I have to do these puzzles, and of course you never find out whether you got them right, or why you had to do two or four of them. At least I don’t have to give a blood sample… yet.

  2. Oliver says:

    I always thought the OCR was a useful side effect rather than the primary purpose, but I’m aware of it. The logic behind it seemed flawed, though, because in many cases I was convinced I had picked the right “translation” and yet failed the test.

    As for these “I’m not a robot” CAPTCHAs, that’s exactly the kind I mean. If you are concerned about your privacy, checking that checkbox subjects you to a whole bunch of these image CAPTCHAs. Perhaps if you use Google’s other services and have whitelisted their servers anyway, you won’t notice. But I avoid Google services, and reCAPTCHA is about the only thing I ever get to see from them. It also means that whenever I see an “I’m not a robot” CAPTCHA, I know I’ll have to solve some. Sadly, you often don’t even see them: because reCAPTCHA is embedded as an iframe, unless the third-party website includes an extra hint, you’ll simply see a login form with no indication that a CAPTCHA needs solving.

    What’s worse, it’s the one thing you usually can’t avoid, since it blocks access not just to some Google service but to the third-party service I was trying to use when reCAPTCHA interfered.

  3. John says:

    I hate reCAPTCHA with a vengeance. When offered the chance, I’ve taken to using the audio version, where it reads out numbers in a freaky foreign way. You know when it ends because there are 10 numbers. But that sucks too. If it fails to provide a number and you refresh, it thinks you are a bot and sends you to a help page. Everything Google touches, it f***s up. YouTube is a prime example.

    I urge website makers: if you want to keep out bots and the like, use something other than reCAPTCHA, and avoid anything Google. It sucks.

  4. Paul says:

    Today, when trying to buy a bundle of games from the Humble site, I had more than ten pages of “click the boxes that contain traffic signs” and the alternative that asks you to click the ones containing cars. Whilst I accept that I could have got some wrong, I do not accept that I could have got more than ten wrong. These things are driving me mad. I thought I had mastered them, recently usually only getting one or two, until today.
    My main issue is that I have in the past given up and taken my hard-earned money elsewhere, so doesn’t that mean they do not work and are not fit for purpose?
    Apart from the usual “are the signposts a part of the sign” question, I don’t see any difficulty in getting them right, but I suspect that is not the only criterion you need to meet. Are they assessing the way the cursor moves or something like that?

  5. Oliver says:

    Indeed, the movement of the mouse cursor is supposedly one of the inputs they use for their assessment. Since they’re routinely wrong, however, I think we can confidently state that Google fails at the one purpose a CAPTCHA has to fulfill: telling humans apart from bots automatically.

  6. Guy Gardner says:

    The latest “vanishing images” kind, where you click roads and cars, is the most horrible. I never get a CAPTCHA that does not involve a dozen skips, then 2-3 sets of vanishing images, then more skips despite a verify-image exercise. The images to click are usually very poor quality, unidentifiable crap. It’s punishment. It’s slow to load and at times hangs and forces a refresh. Just one verify exercise with 3 image clicks is enough; why do we need more of this crap?
