Think what you will of Google products. I personally try to avoid them for privacy reasons.
But there’s that one Google product that is at the same time the crappiest and most ubiquitous Google product ever conceived. It’s called reCAPTCHA. The stated intent of CAPTCHAs, and allegedly also reCAPTCHA, is to tell apart human from bot. Any needy third-party website not afraid of the internet search giant (aka privacy black hole) and lacking own creativity, expertise or time to come up with alternatives that work for real humans, will slap some reCAPTCHA crap onto their website … of course embedded in an
iframe, because those are oh-so-modern (like 1990s-modern or something funky like that).
How does it tell humans apart from bots? Well, in the past you were told to read some garbled up text and were allegedly helping Google with OCR of some books they had scanned into digital form. But these days reCAPTCHA is all about figuring out mountains, rivers and lakes, buildings, store fronts, street signs or similar stuff from photos.
Alas, all of the stuff these “geniuses” at Google ask about is culture-specific. I am supposed to tell apart a store front, from a house front where glyphs are plastered on the house front which I can’t read, let alone understand.
Suddenly towers and churches are no buildings according to Google. How am I to tell a lake from a river if you show me just a single shoreline? Oh, and of course they won’t tell you if you failed. As a human you may just as well stop after trying to solve the fifth reCAPTCHA dialog, or check the audio version to receive the confirmation that you have been (wrongly, but very confidently,) recognized as a bot.
Wow. Just wow. It’s running the gauntlet with that piece of bovine feces. It fails at the single thing it’s meant to do, which is to tell computers and humans apart.
I have wasted so much lifetime with this crap, so I hope some Google folks run across this some time in the future or perhaps some of said third-party website owners looking for something better than the ridiculously stupid reCAPTCHA method. But I won’t hold my breath. Especially given that Google dropped their original motto “don’t be evil” and wasting other humans’ lifetime clearly has an ethical component to it.
That said, here’s a method I’ve been using successfully for quite some time for sign-up and sign-in forms. You need some piece of data from the client, the IP for example will do. You then also need some salt value. Mix and match as needed, and be creative. Just be aware that if you use the current time/date with this method, you – well, actually your users – may run into issues around midnight.
Now the use the name and ID of a form field, respectively, your selected piece of data from the client along with your salt and anything you deem reasonable is sent through a cryptographic hash function. Now prepend something like
z_ to the hash. This ensures that the HTML element ID is valid. The more form fields you treat with this, the better. If you can treat other elements on the website as well, it will make it nearly impossible to determine the name/ID for a form field without rendering the page first and effectively “looking” at that (which is harder to do for bots).
Now when the user submits, the receiving script will know the names/IDs of the form fields it is looking for. It also has the same circumstantial information about the client (e.g. IP) and it knows the details about the salt. So it can determine the field names to look for a particular value. You can even obfuscate parts of the URL used for submit, using this method.
This, along with easy puzzles like “what is three times two as an integer?” will go a long way in preventing the most obnoxious automated and human spam. And yet, it’s solvable independent of the culture you hail from.
I wish upon every Google engineer having “contributed” to reCAPTCHA to have as many boring and futile tasks in their daily routines as possible for the rest of their lives. Just as a payback for all the human lifetime they wasted worldwide and are wasting as of the time of this writing.