Anthropic Built an AI That Can Hack Entire Systems Autonomously — Then Locked It Away. That May Not Be Enough.

Anthropic’s Mythos Can Exploit Critical Vulnerabilities Autonomously. Withholding It Raises More Questions Than It Answers.

Anthropic has developed an AI model capable of autonomously identifying, exploiting, and chaining together software vulnerabilities to take control of entire digital systems, and then made the unusual decision not to release it. The model, called Mythos, marks a genuine capability threshold in AI-assisted cyberattacks. But within days of its restricted rollout, reports emerged that it had already been accessed outside controlled settings. The question is no longer just whether to ship a tool like this. It’s whether containment is even possible.

According to Anthropic, Mythos can discover previously unknown software vulnerabilities — so-called zero-days — exploit them, and chain multiple exploits together to compromise operating systems, web browsers, and core digital infrastructure. It does this autonomously: writing code, escalating privileges, and adapting its approach step by step, without a human directing each move.

The UK’s AI Security Institute independently evaluated the model and confirmed the findings. Mythos solved a significant proportion of expert-level cybersecurity challenges and demonstrated the ability to carry out multi-step attacks — not merely flagging a flaw, but planning and moving through a system the way a skilled human hacker would, only faster.

From Bug-Finder to System Compromiser

This didn’t emerge from nowhere. Just months ago, Anthropic’s earlier models were far better at finding vulnerabilities than exploiting them. In internal tests, one predecessor model could barely convert identified flaws into working attacks. Mythos appears to have crossed that line decisively: in one benchmark, it generated functioning exploits hundreds of times in scenarios where earlier models almost never succeeded.

In a simulated corporate network attack, Mythos completed the full chain of actions needed to take over the system — a task that would take a skilled human security team many hours.

The analogy circulating in security circles is apt: a burglar who can enter any building, locate every hidden weakness, unlock every door, and empty every safe — without being told where to look.

Restricted Access, Immediate Breach

Rather than releasing Mythos publicly, Anthropic launched Project Glasswing, granting limited access to approximately 40 organizations — predominantly American technology companies — with the stated goal of using the model defensively: finding vulnerabilities and patching them before malicious actors can exploit them. The UK government also received access for independent testing.

The response from financial institutions and regulators was not curiosity but concern: banks took notice, and governments convened meetings.

Then, almost immediately, reports surfaced that Mythos had been accessed by a small group of users through a private online forum — despite never being publicly released. It was not a mass leak. But it was enough to expose the fragility of the containment strategy.

Containment has become the AI industry’s default safety mechanism: don’t release the model, limit access, work with trusted partners, stay ahead of misuse. It sounds reasonable. It also assumes a level of control that may not hold.

The Asymmetry Problem

Mythos does not introduce a new category of risk. What it does is change the scale of an existing one. Serious vulnerability discovery has always required skill, time, and patience — a natural bottleneck that limits who can mount sophisticated cyberattacks. Mythos erodes that bottleneck.

It makes vulnerability discovery faster, cheaper, and more systematic. That matters because cybersecurity has always been structurally asymmetric: defenders must protect everything; attackers only need one way in. Lower the cost of finding that entry point, and the balance shifts — potentially dramatically.

There is a defensive upside. Mozilla tested Mythos against Firefox and found significantly more vulnerabilities than previous methods had detected, then fixed them. The same capability that can be weaponized can also harden systems. Anthropic has leaned into this dual-use framing, and it is not wrong — but it is incomplete.

The transition period matters. Between now and a hypothetical future where AI-accelerated defense dominates, there is a window in which offensive capabilities are improving faster than the governance systems designed to manage them.

A Structural Problem, Not a One-Off

There are already indications that similar systems are being built elsewhere. Chinese firms are reportedly developing their own “vulnerability discovery agents.” Researchers have noted that smaller, cheaper models can approximate some of Mythos’s capabilities. The underlying techniques, once demonstrated, tend to diffuse.

This means Mythos may not be an anomaly. It may be an early signal of where the field is heading — toward AI systems that interact directly with digital infrastructure, mapping its weaknesses and testing its limits, rather than merely assisting humans who do so.

That shift creates a new kind of concentration: a small number of actors — corporations, governments, or well-resourced bad actors — gaining the ability to systematically map vulnerabilities across widely used systems. Even if the original model stays locked down, the ideas behind it do not.

Regulation Is Lagging. That’s a Policy Failure.

Governments have begun to respond. In the United States, Anthropic’s relationship with federal regulators has visibly shifted. In India, the government is convening meetings with banks and financial institutions specifically focused on risks posed by models like Mythos. The implicit recognition: if an AI system can meaningfully affect cybersecurity across entire sectors, it is no longer just a product. It is infrastructure.

That framing demands answers to questions that currently have none. If private companies build systems capable of identifying and exploiting critical vulnerabilities, what are their legal obligations? Who governs vulnerability disclosure timelines? Who decides which organizations get access? What accountability exists when containment fails — as it already briefly has with Mythos?

There is also a geopolitical dimension that cannot be ignored. If multiple countries develop parallel versions of these systems, the same digital infrastructure could be simultaneously probed and defended by competing state-aligned actors — a destabilizing dynamic with no clear governance framework.

Anthropic’s restraint in not releasing Mythos is genuine and worth acknowledging. But restraint by a single company is not a regulatory regime. The brief appearance of Mythos outside controlled settings — however limited — is a preview of what happens when capability outpaces governance. The ground is shifting. The institutions meant to manage that shift are not keeping pace.
