Rival AI labs OpenAI and Anthropic have put each other’s models through safety evaluations in a rare show of collaboration. The goal: to identify blind spots in their own safety processes and set a new standard for cooperation on AI safety. OpenAI evaluated Anthropic’s Claude Opus 4 and Claude Sonnet 4 models, while Anthropic ran tests on OpenAI’s GPT-4o, GPT-4.1, o3, and o4-mini models.
The results were mixed. Anthropic’s report found that OpenAI’s specialized reasoning model, o3, was better aligned with safety goals than Anthropic’s own models. But OpenAI’s general-purpose models, GPT-4o and GPT-4.1, proved more vulnerable in simulated misuse tests, offering little resistance to requests to plan terrorist attacks, design bioweapons, and synthesize drugs.
OpenAI’s analysis highlighted different weaknesses in Anthropic’s Claude models. Claude was especially good at following complex instructions but struggled in hallucination tests: to avoid making false statements, the models refused to answer up to 70 percent of the time, sharply limiting their usefulness. Claude was also more susceptible to certain jailbreak attacks than OpenAI’s own models.
Both companies stress that these are artificial stress tests and don’t necessarily reflect how the models behave in real-world use. Anthropic admitted its own testing setup might have disadvantaged OpenAI’s models, especially in tasks involving external tools.
AI models already being used for cybercrime
While these labs probe for risks in controlled settings, Anthropic’s latest report shows that AI is already enabling cybercrime in the wild. The company describes several examples of its Claude model being misused.
In one case, a criminal used Claude Code as an autonomous agent for data theft and extortion – what Anthropic calls “vibe hacking.” The AI made both tactical and strategic decisions, such as choosing which data to steal and setting ransom demands. In another case, North Korean actors used Claude to fraudulently obtain remote jobs at US tech firms. A third example describes a criminal with little technical skill using Claude to build ransomware and sell it as a service.
Anthropic’s conclusion: agentic AI is lowering the bar for complex cybercrime, and criminals are already using these tools at every stage of their operations.
Anthropic builds deeper ties with US national security
Alongside its security disclosures, Anthropic announced the formation of a National Security and Public Sector Advisory Council. The new council will advise Anthropic on strategies to help the US government and allied democracies maintain a technological edge in an era of global competition.
The bipartisan council includes high-profile former officials like ex-senators Roy Blunt and Jon Tester, former CIA Deputy Director David S. Cohen, and former Acting Secretary of Defense Patrick M. Shanahan.
This move formalizes Anthropic’s growing partnership with the US public sector. The company already has a $200 million deal with the Department of Defense to develop specialized AI models for government use and works with national nuclear labs.
OpenAI and Anthropic put each other’s AI models through security stress tests as part of an unprecedented collaboration, while Anthropic reports on real-world AI misuse and deepens its relationship with US national security agencies. | Image: Anthropic/OpenAI