Click any tag below to further narrow down your results
Links
Researchers at HiddenLayer found a flaw in the guardrails of popular AI models like GPT-5.1 and Claude. The EchoGram attack uses specific words to trick these safety systems, allowing harmful requests to bypass defenses or causing harmless requests to be flagged as dangerous.
The article discusses a benchmark report that highlights how Anthropic's Claude models excel in security compared to other large language models (LLMs). While most models struggle with vulnerabilities like jailbreaks and harmful content generation, Claude consistently demonstrates superior performance, indicating a significant gap in safety standards across the industry.
This article discusses a method for identifying software vulnerabilities by integrating large language models (LLMs) with static analysis tools like CodeQL. The authors highlight their tool, Vulnhalla, which filters out false positives and focuses on genuine security issues, illustrating the challenges of using LLMs in vulnerability research.
The article discusses experiments using Opus 4.5 and GPT-5.2 to generate exploits for a zero-day vulnerability in QuickJS. It concludes that the future of offensive cybersecurity may rely on token throughput rather than the number of human hackers, as LLMs prove effective in exploit development.
Current approaches to securing large language models (LLMs) from malicious inputs remain inadequate, highlighting significant vulnerabilities in their design and deployment. The article discusses the ongoing challenges and the need for improved strategies to mitigate risks associated with harmful prompts.
The article discusses the security vulnerabilities of local large language models (LLMs), particularly gpt-oss-20b, which are more easily tricked by attackers compared to larger frontier models. It details two types of attacks: one that plants hidden backdoors disguised as harmless features, and another that executes malicious code during the coding process by exploiting cognitive overload. The research highlights the significant risks of using local LLMs in coding environments.