Off-the-shelf AI models are getting good at hacking, too


Anthropic’s “superhacker” Mythos might not be in public hands, but a new study from Forescout has revealed that widely available, off-the-shelf LLMs could be just as dangerous. 

Despite everyone from UK bankers to the White House sounding the alarm over Anthropic’s latest model following the AI firm’s splashy announcement of its capabilities, Forescout’s testing revealed that other frontier AIs have made a quiet evolutionary leap and are already good enough at exploiting vulnerabilities to pose a real threat.

Last summer, the cybersecurity firm tested fifty AIs, including commercial, open-source, and “underground” models – those developed or trained outside mainstream institutions – and found that more than half (55%) failed basic vulnerability research, while almost all (93%) couldn’t complete any exploit development tasks.

Nine months on, every model Forescout tested can handle vulnerability research, with half capable of producing functional exploits autonomously. 

Though some LLMs performed poorly across the board – notably DeepSeek-R1-Qwen-32B and Qwen2.5-72B-Instruct, neither of which could identify vulnerabilities or generate a working exploit – others proved adept hackers.

Both Gemini 2.5 Pro Experimental and Gemini 3 Pro Preview were able to uncover some vulnerabilities and generate working exploits for them, though not every time – much the same as OpenAI’s o3-mini-high. GPT 5.3-codex, meanwhile, was good at finding flaws but not so hot at prying them open.

However, the most capable models, including Claude Opus 4.6 and Moonshot AI’s open-source Kimi K2.5, were able to find and exploit vulnerabilities “without complex prompts” from operators. 

Forescout said that using single prompts, coupled with the open-source RAPTOR agentic framework for cyber evaluation and the firm’s own extensions, its researchers discovered four new zero-day vulnerabilities in OpenNDS, the captive portal system used to control access on public or semi-public Wi-Fi networks.
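To make that concrete, here is a minimal sketch of what a single-prompt agentic run looks like in principle. It is not RAPTOR’s actual API – the agent loop, model name, and lab target below are hypothetical stand-ins built on the standard OpenAI Python client.

```python
# Illustrative single-prompt agent loop (NOT RAPTOR's real API).
# The model name, target URL, and loop structure are assumptions.
import subprocess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TARGET = "http://10.0.0.5"  # hypothetical, isolated lab instance

messages = [
    {"role": "system", "content": (
        "You are a vulnerability research assistant. Reply with exactly "
        "one shell command per turn, or DONE when finished.")},
    {"role": "user", "content": f"Audit the service at {TARGET} for flaws."},
]

for _ in range(10):  # cap the number of autonomous steps
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    command = reply.choices[0].message.content.strip()
    if command == "DONE":
        break
    # Execute the model's proposed command inside the sandboxed lab
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=60)
    messages.append({"role": "assistant", "content": command})
    messages.append({"role": "user", "content": result.stdout + result.stderr})
```

Broadly speaking, agentic evaluation frameworks wrap a loop like this in tooling and scoring; the single human-written prompt is the opening user message.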

Worse, while some of these tools are resource-heavy – cracking Forescout’s cyber benchmarks with Claude Opus 4.6, for instance, cost up to $25 per million output tokens – others are far more accessible.
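For a rough sense of what that pricing means in practice, here is a back-of-the-envelope estimate. The per-run token count and number of runs are illustrative assumptions, not figures from Forescout.

```python
# Back-of-the-envelope cost estimate at $25 per million output tokens.
# Token counts and run counts below are illustrative assumptions only.
PRICE_PER_M_OUTPUT = 25.00        # USD per million output tokens (article figure)

output_tokens_per_run = 200_000   # assumed: one long multi-step exploit attempt
runs = 50                         # assumed: retries across a benchmark suite

total = output_tokens_per_run * runs / 1_000_000 * PRICE_PER_M_OUTPUT
print(f"Estimated output-token spend: ${total:,.2f}")  # -> $250.00
```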

Moonshot AI’s Kimi K2.5 tops out at just $159 a month for its highest paid tier – and since the model is open-weight and open-source, users with the right hardware could run it for nothing. That’s an attractive prospect for opportunistic, low-skilled threat actors, who are apparently now forming communities around these tools.
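“For nothing” here means self-hosting the weights. The sketch below shows the usual pattern: point the standard OpenAI client at a locally hosted, OpenAI-compatible inference endpoint. The port and model identifier are assumptions, and serving a frontier-scale open-weight model demands serious hardware.

```python
# Querying a self-hosted open-weight model via an OpenAI-compatible
# endpoint exposed by a local inference server.
# The base_url, port, and model identifier are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server
    api_key="unused-locally",             # placeholder; no cloud billing involved
)

response = client.chat.completions.create(
    model="local-open-weight-model",      # hypothetical model identifier
    messages=[{"role": "user", "content": "Explain CVE triage in one paragraph."}],
)
print(response.choices[0].message.content)
```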

“Previously, underground forums featured advertisements for poorly performing underground AI models. Now, threat actors are more often sharing jailbreaks and adopting commercial or open-source models,” said Forescout.

“In the past, we observed scepticism. Now we see experienced members coaching newcomers on how to use these tools.”




With that in mind, Forescout recommended that its research be treated as a wake-up call for security pros, who must now assume their environments contain vulnerabilities that AI will quickly find.

AI’s speed of attack has changed the cyber landscape, the firm warned: it is no longer safe to wait months for coordinated disclosure, or to leave edge devices, OT, and IoT systems overlooked, unpatched, and vulnerable.

“We are entering a new phase of vulnerability research and of cybersecurity more broadly, in which finding and exploiting vulnerabilities are no longer the only challenges,” concluded researchers.

“The harder problem is what comes next: how to prioritise findings, patch affected systems, understand impact, and apply controls to reduce risk.”




