The myth of Claude Mythos crumbles as small open models hunt the same cybersecurity bugs Anthropic showcased


Anthropic has kept its Claude Mythos cybersecurity model on a short leash, pointing to capabilities it says no rival can match. But two new studies suggest that even small, openly available models can reproduce most of the vulnerability analyses Anthropic has put on display.

Through Project Glasswing, Anthropic has limited access to Claude Mythos Preview to a consortium of eleven organizations, citing the model’s offensive capabilities. Internal tests and an audit by the UK’s AI Security Institute found that Mythos can find software bugs, build working exploits on its own, and take over entire corporate networks in simulations, as long as the network is “small, weakly defended and vulnerable.”

Two independent replication efforts are now poking holes in that exclusivity story, without disputing the model’s overall performance.

The first comes from AISLE, a company that has been running its own AI-assisted bug hunting on open source software since mid-2025. AISLE says it has reported 15 vulnerabilities in OpenSSL and five in curl. Founder Stanislav Fort fed code snippets from Anthropic's public samples into a range of models to see how much smaller, partially open models could piece together on their own. The second study comes from Vidoc Security, which paired GPT-5.4 and Claude Opus 4.6 with the open coding agent OpenCode.

Small models catch the FreeBSD bug too

The FreeBSD NFS bug (CVE-2026-4747) that Anthropic spotlighted was pitched as a showcase for autonomous discovery and exploitation by Mythos. AISLE found that all eight models it tested caught the memory bug in the function in question. That included GPT-OSS-20b, a model with just 3.6 billion active parameters that runs at $0.11 per million tokens. Every model flagged the flaw as critical, though their estimates of the overwritable buffer size varied slightly.

Every model also came up with a plausible take on how to exploit the bug, working out why the operating system’s main protections don’t apply here. GPT-OSS-120b produced a gadget sequence that AISLE says comes close to the real exploit. Kimi K2 even figured out on its own that the attack could spread automatically from one infected machine to others, a detail Anthropic itself doesn’t mention.

Where things get harder is on the creative side. The real exploit has to squeeze a payload of more than 1,000 bytes into about 304 bytes of available space. Mythos pulled it off by splitting the payload across 15 separate network requests. None of the tested models landed on that exact trick, but they found other workable paths, the researchers say.
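The space constraint itself is easy to state in code. The sketch below shows only the generic chunking arithmetic behind the idea of spreading a payload across size-capped writes; it is not the actual Mythos exploit, and `chunk_payload` is a hypothetical helper (the real 15-request layout evidently involved constraints the article doesn't detail, since a bare split needs far fewer pieces).

```python
# Illustrative sketch only: splitting an oversized payload across
# several size-capped writes. Generic chunking logic, not an exploit.
def chunk_payload(payload: bytes, slot_size: int) -> list[bytes]:
    """Split `payload` into pieces that each fit within `slot_size` bytes."""
    return [payload[i:i + slot_size] for i in range(0, len(payload), slot_size)]

payload = bytes(1100)                 # a payload of "more than 1,000 bytes"
chunks = chunk_payload(payload, 304)  # "about 304 bytes of available space"
# Reassembled in order, the chunks recover the full payload.
```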

A jagged capability landscape

The OpenBSD bug is a different story. It calls for a mathematical grasp of integer overflows and list states, and results are all over the map. AISLE says GPT-OSS-120b reconstructed the full publicly described exploit chain in a single run and essentially proposed the actual OpenBSD patch as the fix.
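The bug class at issue can be illustrated in a hypothetical form (this is not the actual OpenBSD code): a bounds check that 32-bit wraparound defeats, next to the overflow-free rewrite that patches it.

```python
# Hypothetical illustration of the bug class, not the OpenBSD code itself.
U32 = 0xFFFFFFFF  # mask to emulate 32-bit unsigned arithmetic

def unsafe_fits(offset: int, length: int, buf_size: int) -> bool:
    # Vulnerable pattern: offset + length wraps modulo 2**32,
    # so a huge `length` can slip past the bounds check.
    return ((offset + length) & U32) <= buf_size

def safe_fits(offset: int, length: int, buf_size: int) -> bool:
    # Patched pattern: rearranged so no addition can overflow.
    return length <= buf_size and offset <= buf_size - length

# With a near-maximal length, the sum wraps to a tiny value and the
# unsafe check passes while the safe check correctly rejects it.
wrapped_ok = unsafe_fits(16, U32 - 8, 4096)
patched_ok = safe_fits(16, U32 - 8, 4096)
```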

Qwen3 32B, which had held its own on the FreeBSD bug, declared the OpenBSD code “robust to such scenarios.” Vidoc ran into a similar wall: Claude Opus 4.6 reproduced the vulnerability in three out of three runs, while GPT-5.4 missed it every time. Fort calls this “the jagged frontier,” a broken, uneven capability boundary. There’s no single best model for cybersecurity, and the rankings shift sharply from one task to the next.

When small models beat the big ones

One of the more revealing tests uses a simple code sample that looks like a textbook security hole at first glance. User input seems to flow unfiltered into a database query. But a few lines down, that input is actually discarded, so the vulnerability isn’t real.
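A hypothetical reconstruction of that kind of trap sample (the article does not publish the actual test code) might look like this:

```python
import sqlite3

# Hypothetical trap sample: the "vulnerability" a shallow scan would flag
# is never actually executed.
def find_user(conn: sqlite3.Connection, username: str):
    # Looks like textbook SQL injection: user input lands in a query string...
    query = f"SELECT name FROM users WHERE name = '{username}'"
    # ...but a few lines down that string is discarded, and only a safe,
    # parameterized query ever reaches the database.
    query = "SELECT name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
```

Tracing the data flow forward shows the tainted string is overwritten before use; tracing it the wrong way, as some models did, yields a confident false positive.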

Of the 13 models tested, Anthropic's Opus 4.6 clearly got it right, while Sonnet 4.6 and Opus 4.5 landed as borderline correct. The full table marks Opus 4 as partially correct and Opus 4.1 as borderline. Claude Sonnet 4.5 confidently traced the data flow the wrong way.

On the OpenAI side, o3 was consistently correct, o4-mini only partially, and GPT-OSS-20b is listed as correct. All GPT-4.1 models and most GPT-5.4 models came up short. Other small open models like Deepseek R1 and Kimi K2 nailed it every time.

What happens when the fix is already in

Fort added an important caveat later. While every model reliably flagged the unpatched FreeBSD code as vulnerable, only GPT-OSS-120b—and, to a limited extent, Qwen3-32B—recognized the patched version as safe.

GPT-OSS-20b, Kimi K2, and Deepseek R1 got it wrong in every run and invented reasons why phantom vulnerabilities still existed. Fort doesn’t see this as a hit to his argument. If anything, he says, it confirms that the testing and sorting layer around the model is the critical piece.

The real edge is in the full system

Vidoc also tested cases beyond classic memory bugs. The Botan case involved a flaw in certificate validation that let a forged certificate pass as trusted. Both Claude Opus 4.6 and GPT-5.4 caught the logic gap in three out of three runs. For wolfSSL, tested in parallel, both models zeroed in on the right part of the code but misread the underlying cryptographic rule. The cost per scanned file came in under $30.

Both studies argue that the real advantage lies less in any single model than in the system built around it: validation, prioritization, and workflow. That covers the full pipeline: picking targets in the code, running step-by-step analysis, checking the results, and separating real hits from false ones.
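That pipeline shape can be sketched as a few composable stages. Every name below is a hypothetical stand-in, not any vendor's API: cheap models scan broadly, and a validation layer filters their findings before anything is reported.

```python
# Minimal sketch of a scan-then-validate pipeline; all stages are
# hypothetical stand-ins for model calls and checking logic.
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    claim: str
    confirmed: bool = False

def pick_targets(files):
    # Target selection: e.g., prioritize code likely to parse untrusted input.
    return [f for f in files if f.endswith((".c", ".cc"))]

def scan(file):
    # Stand-in for a cheap model pass over one file.
    return [Finding(file, "possible out-of-bounds write")]

def validate(finding):
    # Stand-in for the checking layer (re-analysis, patched/unpatched
    # comparison, or an attempted proof of concept).
    finding.confirmed = "out-of-bounds" in finding.claim
    return finding

def pipeline(files):
    findings = [f for target in pick_targets(files) for f in scan(target)]
    return [f for f in map(validate, findings) if f.confirmed]
```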

AISLE goes further, arguing that small, cheap models are good enough for most of the discovery work, which makes broad scanning a viable strategy. “A thousand adequate detectives searching everywhere will find more bugs than one brilliant detective who has to guess where to look,” Fort writes.

Both reports leave open the possibility that Mythos still has an edge in building deployable exploits but suggest that gap will likely close as tools improve and models gain more autonomy. Together, they point to a line between frontier and publicly available models that’s far more porous than Anthropic’s messaging lets on, at least when it comes to finding vulnerabilities.

Critics have accused Anthropic of fearmongering, arguing the company is drumming up media attention until it has the compute to open Mythos up to a broader audience. There may be something to that. According to the Financial Times, which cites “multiple people with knowledge of the matter,” Anthropic is holding the model back until it has enough compute capacity to serve customers.
