Saved February 14, 2026
Do you care about this?
This project is a Python command-line tool that uses multimodal AI models from OpenAI and Google to solve several CAPTCHA types. It drives the browser with Selenium and records successful attempts as GIFs. Its modular design lets users add support for other CAPTCHA challenges and AI providers.
If you do, here's more
The project is a Python command-line tool designed to solve various CAPTCHA types using large multimodal models, specifically OpenAI's GPT-4o and Google's Gemini. It uses Selenium for browser automation, which lets it interact with web pages and attempt CAPTCHA challenges in real time. Successful attempts are saved as GIFs in the `successful_solves` directory, providing a visual record of which attempts worked.
The tool supports several CAPTCHA formats, including standard text CAPTCHAs, distorted text, reCAPTCHA v2, puzzle challenges, and audio CAPTCHAs. Users can easily set it up by cloning the repository, installing dependencies, and configuring their API keys in a `.env` file. Running the solver involves specifying the CAPTCHA type and preferred AI provider through command-line arguments. The script initiates a Firefox browser instance, navigates to the demo page for the selected CAPTCHA, captures images or audio, and sends them to the chosen AI model for analysis.
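The command-line setup described above could look roughly like the following. This is a minimal sketch, not the project's actual interface: the argument names (`captcha_type`, `--provider`), the choice strings, and the key name `OPENAI_API_KEY` are all assumptions made for illustration, and the small `.env` reader stands in for whatever loader the project really uses.

```python
import argparse

# Hypothetical choice strings, derived from the formats and providers named
# in the summary; the real tool may spell them differently.
CAPTCHA_TYPES = ["text", "distorted_text", "recaptcha_v2", "puzzle", "audio"]
PROVIDERS = ["openai", "gemini"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser taking a CAPTCHA type and an AI provider."""
    parser = argparse.ArgumentParser(description="CAPTCHA solver CLI (sketch)")
    parser.add_argument("captcha_type", choices=CAPTCHA_TYPES,
                        help="which CAPTCHA demo to run")
    parser.add_argument("--provider", choices=PROVIDERS, default="openai",
                        help="which AI provider to query")
    return parser


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines from the contents of a .env file.

    Blank lines, comments, and lines without '=' are skipped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Parse an explicit argument list instead of sys.argv so the sketch can run anywhere.
args = build_parser().parse_args(["audio", "--provider", "gemini"])
keys = parse_env('# API keys (assumed names)\nOPENAI_API_KEY="sk-placeholder"\n')
print(args.captcha_type, args.provider, sorted(keys))
```

Restricting both arguments with `choices` means an unsupported CAPTCHA type or provider is rejected before any browser session starts.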
Key files in the project include `main.py`, which serves as the entry point, and `ai_utils.py`, which handles interactions with the AI APIs. The modular design allows for easy expansion, enabling users to add support for new CAPTCHA types or AI models. There's also a benchmarking script to test the performance of different solvers, helping users understand which models work best under various conditions.
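One way to picture the modular design and the benchmarking script together is a registry keyed by CAPTCHA type and provider, plus a function that scores a solver against labelled cases. This is a sketch under stated assumptions: `SOLVERS`, `register`, `benchmark`, and the stub solver are hypothetical names, not taken from `main.py` or `ai_utils.py`, and the stub merely decodes its input rather than calling any model.

```python
from typing import Callable, Dict, Iterable, Tuple

# A solver takes raw challenge bytes and returns its answer as text.
Solver = Callable[[bytes], str]

# Hypothetical registry: (captcha_type, provider) -> solver function.
SOLVERS: Dict[Tuple[str, str], Solver] = {}


def register(captcha_type: str, provider: str) -> Callable[[Solver], Solver]:
    """Decorator that adds a solver to the registry under the given key."""
    def decorator(fn: Solver) -> Solver:
        SOLVERS[(captcha_type, provider)] = fn
        return fn
    return decorator


@register("text", "stub")
def echo_solver(payload: bytes) -> str:
    # Stand-in for a real model call: it only decodes the payload.
    return payload.decode("utf-8")


def benchmark(captcha_type: str, provider: str,
              cases: Iterable[Tuple[bytes, str]]) -> float:
    """Return the fraction of cases the registered solver answers correctly."""
    solver = SOLVERS[(captcha_type, provider)]
    case_list = list(cases)
    if not case_list:
        return 0.0
    correct = sum(1 for payload, expected in case_list
                  if solver(payload) == expected)
    return correct / len(case_list)


rate = benchmark("text", "stub", [(b"abc", "abc"), (b"xyz", "XYZ")])
print(f"success rate: {rate:.2f}")
```

With this layout, supporting a new CAPTCHA type or model means registering one more function, and the same `benchmark` call can compare providers on identical cases.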