“I am a robot,” replied the researcher’s code as it bypassed the tool that turns us into image-tagging machines for Google
Google’s reCAPTCHA is one of the oldest bot-prevention systems on the market. It is meant to stop bots from visiting websites, harvesting information, and impairing their function. But an Israeli researcher has shown how the system can be deceived, and all it takes is Google’s own tools.
“I’m a robot,” and I bypassed you
Yair Mizrahi, an Israeli security researcher, tells us that he managed to bypass Google’s reCAPTCHA v2 with – wait for it – a robot (kind of). The researcher first found the weakness back in 2016, and the technology giant partially fixed it. Since then, Mizrahi has managed to exploit it again in what he calls Re-ReBreakCaptcha.
We are all familiar with the image recognition challenge, in which you (and the bots) have to point to fire hydrants, traffic lights, or trucks. Mizrahi agrees that cracking this challenge would be complicated. So he decided to challenge Google’s system through an option most of us are not familiar with: the voice challenge, which is usually intended for visually impaired users, for example. To get to the voice challenge, all you have to do is click the headphone icon, and the system then reads out four digits very clearly.
Mizrahi began by testing the method manually. He downloaded the sound file generated by the system, an MP3 with a 64 kbps bitrate, and converted it to a WAV file encoded with a 16-bit PCM codec; this format was important so that he could process the file later. The researcher then fed the file to Google’s own voice recognition API, which immediately cracked the voice challenge for him. Once he saw that the process worked manually, the way was paved for automation.
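To make the conversion step concrete, here is a minimal sketch of what turning an MP3 into a 16-bit PCM WAV might look like with pydub. This is not Mizrahi’s code: the function names and file paths are hypothetical, and the conversion function assumes pydub and ffmpeg are installed. The second function uses only Python’s standard wave module to check that the result really is 16-bit PCM.

```python
import wave


def mp3_to_wav_16bit(mp3_path, wav_path):
    """Convert an MP3 file to a 16-bit PCM WAV file (needs pydub and ffmpeg)."""
    # Imported lazily so the check below still works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_mp3(mp3_path)
    audio = audio.set_sample_width(2)  # 2 bytes per sample = 16-bit
    audio.export(wav_path, format="wav")


def is_16bit_pcm_wav(wav_path):
    """Return True if the file is an uncompressed WAV with 16-bit samples."""
    with wave.open(wav_path, "rb") as wav_file:
        uncompressed = wav_file.getcomptype() == "NONE"
        return uncompressed and wav_file.getsampwidth() == 2
```

Calling mp3_to_wav_16bit("challenge.mp3", "challenge.wav") and then is_16bit_pcm_wav("challenge.wav") would confirm that the file is in the format the article describes; both file names are placeholders.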
Mizrahi moved on to the proof-of-concept (POC) phase and wrote Python code that makes 100 attempts at a time to crack reCAPTCHA. He says that between steps, the code waits a random number of seconds so as not to arouse the system’s suspicion that it is a bot. “In the new version, I came across a more sophisticated system that detects bots, and I bypassed it by having the code sleep (a simple sleep) for a few minutes every certain number of iterations,” Mizrahi said in a conversation with us. According to him, at first the system would block him after 20-25 attempts, which is how he understood the need for methodical pauses to avoid being identified and blocked by reCAPTCHA.
The code uses Selenium, which allows the browser to be controlled automatically via its WebDriver (GeckoDriver in the case of Firefox). The code uses the identifiers and text of the various elements on the page to know what to click on. In addition, Mizrahi uses the pydub library to convert the voice challenge into a format that Google will accept.
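The pacing idea can be illustrated with a short sketch: a short random wait after most attempts, and a longer pause of a few minutes every fixed number of attempts. This is an assumption-laden illustration, not the researcher’s code. The function names, the 20-attempt interval, and the wait ranges are all hypothetical values chosen only to match the description above.

```python
import random


def pause_seconds(attempt, rng, long_pause_every=20,
                  short_range=(1.0, 5.0), long_range=(120.0, 300.0)):
    """Return how long to wait after the given 1-based attempt number."""
    # Every `long_pause_every` attempts, take a pause of a few minutes;
    # otherwise wait a short random number of seconds.
    if attempt % long_pause_every == 0:
        low, high = long_range
    else:
        low, high = short_range
    return rng.uniform(low, high)


def pause_schedule(total_attempts=100, seed=None):
    """Build the list of waits for a run of `total_attempts` attempts."""
    rng = random.Random(seed)
    return [pause_seconds(n, rng) for n in range(1, total_attempts + 1)]
```

In a real loop, each value would be passed to time.sleep() after the corresponding attempt. Building the schedule as a list first makes it easy to inspect the pacing without actually waiting.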
What are the actual risks of this weakness, in your opinion? Can it harm the end user at all, and if so, how?
“There is no risk to the end user, and it seems to me that they will be happy not to have to recognize images. On the other hand, site owners can be harmed if attackers target them with brute-force attacks on users’ passwords or scrape content from the site.”
Google is not excited
As mentioned, Mizrahi found the original weakness back in 2016 and reported it to the technology giant at the time. But Google was in no hurry to fix it, saying that after examining the issue with the reCAPTCHA team they are “aware of the limitations of the voice challenges” and are exploring ways to address them, without revealing if and how they would do so. In practice, Mizrahi says that there was a slight change in the voice challenge: Google eliminated the background noise it had placed throughout the recording, added noise at the beginning and end of the audio segment instead, and changed the text being read from numbers to phrases.
Now, having bypassed Google’s new system relatively easily as well, he reported it again, and once again Google did not seem excited: “We examined the (weakness) you submitted and decided not to track it as a security bug, because we are already aware of this weakness,” read the reply that was sent to Mizrahi and that also reached us.