China’s AI Now Matches Anthropic Mythos in Cybersecurity


Two Chinese AI systems are performing on par with Anthropic’s restricted Mythos model in cybersecurity vulnerability detection, according to independent evaluations reported by The Wall Street Journal. Zhipu AI’s open-weight GLM-5.2, released on June 13 under an MIT license, matched or exceeded leading US models on specific security tasks. Separately, 360 Security Technology unveiled an agent-based tool called Tulongfeng at the ISC.AI 2026 conference in Beijing, claiming equivalent capability through an entirely different approach.

Both systems are freely accessible or openly available. Mythos remains restricted to a small number of US-vetted partners after the Commerce Department issued an export control directive on June 12, one day before GLM-5.2 launched. The gap between what the US government tried to contain and what China has made publicly available has effectively collapsed in one narrow but consequential domain.

What the Benchmarks Show

Independent testing by cybersecurity firm Semgrep placed GLM-5.2’s IDOR vulnerability detection at an F1 score of 39%, surpassing Claude Code’s 32 to 37% on the same evaluation, according to Axios. A separate assessment by security analytics firm Graphistry found GLM-5.2 matched Claude Opus 4.8 on a capture-the-flag style security benchmark. Graphistry called it the first open-weight model suitable for what it described as a “frontier-like” cybersecurity experience.
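An F1 score is the harmonic mean of precision and recall. Precision is the share of flagged issues that are real vulnerabilities. Recall is the share of real vulnerabilities that get flagged. The short Python sketch below shows the calculation. The source does not report Semgrep's underlying precision or recall, so the inputs are hypothetical. They were chosen only to show that quite different trade-offs can produce the same 39% figure.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical detectors: these numbers are NOT from Semgrep's evaluation.
# Detector A: half of its findings are real, and it catches 32% of real bugs.
print(round(f1_score(0.50, 0.32), 2))  # 0.39
# Detector B: balanced at 39% precision and 39% recall.
print(round(f1_score(0.39, 0.39), 2))  # 0.39
```

Both hypothetical detectors reach the same F1 score. A single F1 number therefore does not show whether a tool produces many false positives or misses many real bugs.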

The cost gap is equally significant. GLM-5.2 finds vulnerabilities at roughly $0.17 per bug, approximately one-sixth the cost of comparable Claude-based workflows.

These results come with important caveats. GLM-5.2 still trails Anthropic and OpenAI systems on general-purpose benchmarks. The cybersecurity parity applies to specific vulnerability detection tasks, not broad AI reasoning. Semgrep’s researchers noted their evaluation covered one dataset and one task. But in a field where incremental performance improvements determine whether a vulnerability gets found before or after an attacker exploits it, domain-specific parity carries significant weight.


Two Paths to the Same Result

What makes this development structurally significant is that two Chinese organizations reached Mythos-level cybersecurity performance through fundamentally different methods.

Zhipu AI built GLM-5.2 as a 753-billion-parameter general-purpose model with strong coding and reasoning capabilities. Its cybersecurity performance appears to be a byproduct of scale and training quality rather than domain-specific optimization. The model is fully open-weight, meaning anyone can download, modify, and run it without API restrictions.

360 Security took the opposite route. Founder Zhou Hongyi acknowledged to Reuters that his company's base model trails US systems by 20 to 30% in capability. Instead of closing that gap directly, 360 layered AI on top of proprietary vulnerability databases, two decades of security expertise, and automated attack-and-defense pipelines. The resulting tool, Tulongfeng, operates as a swarm of specialized agents rather than a single general-purpose model.

“If the US route is to cultivate a genius hacker, 360’s route is to organise a professional attack-and-defence team,” Zhou said at the Beijing conference.

360 claimed Tulongfeng found 3,432 software vulnerabilities, with 105 confirmed by the Chinese government. Reuters noted those figures could not be independently verified. An April 2026 analysis by Eugenio Benincasa at ETH Zurich’s Center for Security Studies concluded that 360’s AI capabilities were significant but fell short of the autonomous reasoning demonstrated by Mythos. A closer comparison, Benincasa suggested, was Google’s Big Sleep, which speeds up specific phases of vulnerability research rather than operating fully autonomously.

This suggests that cybersecurity AI parity may be structurally easier to achieve than general-purpose AI parity. Vulnerability detection is a bounded domain where specialized architecture and curated data can compensate for weaker foundation models. If that assessment holds, export controls designed to contain general-purpose model capability may not prevent cybersecurity-specific convergence.


Why US Export Controls Made the Problem Worse

The timing tells its own story. The Commerce Department banned exports of Fable 5 and Mythos 5 on June 12, citing national security concerns over a jailbreak vulnerability. GLM-5.2 launched the following day, freely downloadable by anyone on Earth.

The ban restricted US defenders while a comparable Chinese alternative became globally available. The NSA lost access to Mythos for roughly two weeks during the restriction, even though the agency had been testing the tool and found it effective in security trials.

Saif Khan, a technology fellow at the Institute for Progress who worked on export restrictions under the Biden administration, put it bluntly in comments to The Wall Street Journal: “Banning Fable while selling chips China needs to develop its own version is a gift to China.”

The contradiction is hard to overlook. The US restricted its own frontier cybersecurity model while continuing to approve chip exports that help Chinese labs train competitive alternatives. Khan argued that the US should be maximizing Mythos deployment to harden domestic cyber defenses while it still holds a broader AI lead, rather than limiting access.

The Disclosure Asymmetry Most Reports Miss

A less discussed factor, and arguably a more important one than benchmark scores, is what happens after vulnerabilities are found.

China’s vulnerability disclosure regulations, enacted between 2017 and 2021, require all discovered software flaws to be reported to the Ministry of Industry and Information Technology within 48 hours of discovery, before the affected vendor is notified. This obligation applies to all Chinese companies and researchers, including those using AI-assisted tools, according to Tech Times.

Every confirmed zero-day vulnerability that Tulongfeng finds is legally required to become a Chinese government asset within two days. The government then decides whether it gets disclosed to the software maker, retained for intelligence purposes, or used for offensive operations.

Zhou framed his tools using nuclear deterrence logic, arguing that mutual vulnerability-finding capability prevents conflict. But nuclear deterrence relies on symmetry, where neither side discloses its weapons. Mandatory government-first vulnerability disclosure creates an inherent asymmetry that the deterrence framing does not address. The US model finds bugs and patches them. The Chinese model finds bugs and, by law, reports them to Beijing first.

What to Watch Next

The Five Eyes intelligence alliance warned this month that AI-driven cyber threats could materialize within months, not years. Freely available models matching Mythos-level vulnerability detection make that timeline more concrete.

Several factors will shape what happens from here.

  • Whether the US government restores full Mythos access to defenders or maintains restrictions that weaken domestic cybersecurity teams
  • How quickly additional Chinese open-weight models with cybersecurity specialization follow GLM-5.2
  • Whether the Semgrep and Graphistry benchmark results hold up under broader independent evaluation
  • How Washington reconciles restricting AI model exports while continuing to approve the chip sales that help China build competitive alternatives

The AI cybersecurity race is no longer defined by a comfortable US lead. Both Chinese and US frontier systems can find the same categories of bugs. The question now is who patches those vulnerabilities first and who exploits them first, and the current US policy framework does not have a clear answer for either.

FAQs

Can China’s GLM-5.2 actually match Anthropic Mythos in cybersecurity?

Independent benchmarks by Semgrep and Graphistry show GLM-5.2 performing on par with leading Anthropic models on specific vulnerability detection tasks. On a capture-the-flag style benchmark, it matched Claude Opus 4.8. On IDOR detection, it scored 39% F1, compared with 32 to 37% for Claude Code. It still trails US models on general-purpose reasoning benchmarks.

What is 360 Security’s Tulongfeng?

Tulongfeng is an AI-powered vulnerability discovery tool built by Chinese cybersecurity firm 360 Security Technology. Rather than relying on a single large model, it uses a multi-agent architecture layered over proprietary security knowledge. The company claims it has found 3,432 software vulnerabilities so far.

How do US AI export controls affect cybersecurity?

The US banned Anthropic’s Mythos and Fable 5 from export on June 12, restricting access for US defenders and allies. One day later, China’s GLM-5.2 launched with matching cybersecurity capability as a free, open-weight download, bypassing export restrictions entirely.

Does Chinese law require AI-discovered vulnerabilities to be reported to the government?

Yes. Regulations enacted between 2017 and 2021 require all newly found software flaws to reach a government agency within 48 hours, before the affected vendor receives notification. The obligation covers any Chinese company or individual, including those using AI-powered discovery tools.
