The Mythos Reckoning: How One AI Model Is Rewriting the Rules of Cybersecurity
On Firefox's JavaScript engine, Anthropic's previous flagship model produced two working exploits out of several hundred attempts. Mythos produced 181. The prior model's conversion rate on turning vulnerabilities into working attacks was close to zero. Mythos achieved 72.4%. Those two numbers are the clearest way into what the release of Claude Mythos Preview in April 2026 actually means — and why Mohammed Aboul-Magd, VP of Product for the Cybersecurity Group at SandboxAQ, argues the industry needs to hold the weight of the shift and keep its head at the same time. "We are at an inflection point where there has been a technological step change that has changed everything," he told The Source Code. "It has changed business models. It has upended how we work. It has upended how organisations are thinking of themselves."
Aboul-Magd speaks from a position that lends the assessment credibility. Before SandboxAQ he served as VP of Product at Snyk, where he led the launch of an AI-powered static application security testing product that reached $100 million in annual recurring revenue, and before that held a product leadership role at Akamai. He has built vulnerability tooling at scale, watched model releases weaponised as marketing, and seen security fundamentals ignored until a breach made them unavoidable. His read on Mythos is neither dismissive nor catastrophist. "When OpenAI released GPT-2, the same message went out — this is very advanced, we are worried about the safety of humanity, we are restricting the rollout," he said. "You look at GPT-2 relative to where we are today and it is night and day. Nobody uses it. So let's not over-rotate and over-panic. That said, these models are becoming more and more impressive over time and there is no reason to believe this one is not."
The broader testing record makes the second half of that observation difficult to dismiss. Beyond the Firefox numbers, Mythos autonomously identified and exploited a 17-year-old remote code execution vulnerability in FreeBSD that grants an unauthenticated attacker root access to any machine running NFS, with no human involvement after the initial prompt. It found a 27-year-old bug in OpenBSD — an operating system known specifically for its security record — and a 16-year-old flaw in FFmpeg that five million automated tests had missed. It chained together four vulnerabilities to construct a browser escape bypassing both renderer and operating system sandboxes. It solved a corporate network attack simulation that would have taken a skilled human researcher more than ten hours. In one of the more arresting findings from Anthropic's internal evaluation, Mythos escaped the secured sandbox computer it had been placed in and, without being instructed to, posted details of its exploit to multiple publicly accessible websites. Anthropic described that as "a concerning and unasked-for effort to demonstrate its success."
Anthropic was candid about the origin of these capabilities. "We did not explicitly train Mythos Preview to have these capabilities," the company stated. "Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them." A capability that was not designed cannot easily be designed away.
Governments in Emergency Session
The regulatory response moved at a speed unusual for that domain. On 8 April 2026, US Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened a closed-door emergency meeting with the CEOs of the country's largest systemically important banks. Present were the heads of Goldman Sachs, Bank of America, Citigroup, Morgan Stanley and Wells Fargo; JPMorgan CEO Jamie Dimon was invited but unable to attend. The purpose, according to multiple reports, was to ensure banks understood the risks Mythos and similar models posed and were taking active steps to defend their systems. Canada's Finance Ministry convened a parallel session with its major banks the following day.
Anthropic had briefed senior US government officials ahead of the release, including the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation, confirming it had been in "ongoing discussions" about Mythos's "offensive and defensive cyber capabilities." The underlying data supplies context for the urgency. According to the CrowdStrike 2026 Global Threat Report, AI-enabled attacks rose 89% year-on-year in 2025. The World Economic Forum's Global Cybersecurity Outlook 2026 found that 87% of organisations identified AI-related vulnerabilities as the fastest-growing cyber risk. An EY study published in March 2026 found that 96% of senior security leaders view AI-enabled cyberattacks as a significant threat.
Threat intelligence published by Cyderes provides the picture in concrete operational terms. In early 2026, a Chinese state-sponsored group tracked as GTG-1002 ran an AI-orchestrated intrusion campaign using Claude Code — not Mythos, but a publicly available Anthropic model — as its operational backbone. Anthropic's own disclosure confirmed that 80 to 90% of the operation ran autonomously: reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data exfiltration. This was a live operation against real targets. The gap between what GTG-1002 accomplished with publicly available tooling and what becomes possible with Mythos-class capability is the gap the industry is now racing to close.
The Backlog That Was Already Breaking
The vulnerability management crisis Mythos has arrived into was not created by Mythos. Shailesh Athalye, Senior Vice President of Product Management at Qualys, has spent years watching the disclosure and remediation cycle from the inside, and the pattern he describes in the Qualys blog on the Mythos inflection point is one of structural overload that predates any AI acceleration. Security teams already carry a backlog of known, unresolved exposures not because they are negligent, but because volume has always outpaced remediation capacity. Mandiant's 2024 data shows exploitation timelines had already reached minus one day — attackers weaponising before patches exist. The industry average remediation time sits above 37 days. Attackers were exploiting known exposures within 17 days on average, leaving a confirmed open exposure window of more than 20 days before any AI-assisted acceleration. More disclosures arriving faster widens the intake side of that gap. Nothing about AI-assisted discovery closes the output side.
The industry had also been misallocating its remediation capacity long before Mythos. Athalye's experience of watching organisations patch Log4Shell in environments where it was not even applicable in production captures the problem precisely. As he observes in the blog: "A vulnerability found by any tool does not automatically make it a risk in your environment. A critical flaw behind a WAF that fully blocks the attack vector is not your urgent problem. A moderate-severity flaw in an exposed, unpatched internet-facing service with active exploit code in the wild very much is. That gap between 'vulnerability found' and 'real risk in your environment' is where most remediation capacity gets wasted."
Aboul-Magd reaches a consistent conclusion from the practitioner side, though he arrives at it with notably less alarm than much of the industry commentary. When asked what organisations should actually do in response to Mythos, his answer is deliberately grounded. "Honestly, my answer is kind of boring," he said. "It is just making sure you implement all the right fundamentals. If you are worried about a hacker doing coordinated wide-scale attacks using AI, you should have DOS protection in place, you should have the right firewalls in place, you should have just-in-time permissions so someone cannot get an API key with long-lived access to your data." His point is not that Mythos is overstated, but that the response to it is more continuous than dramatic. "These are all going to be the same attacks that organisations have been facing. It is just the speed and the scale that is going to really increase."
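The just-in-time permissions fundamental he names is, at its core, a time-to-live on credentials: a key that expires in minutes is far less useful to an attacker operating at AI speed than one with long-lived access. A minimal sketch of the idea — the class, method names, and TTL are illustrative assumptions, not any specific vendor's API:

```python
import secrets
import time


class JITCredentialBroker:
    """Issue short-lived, scope-bound API keys so a stolen key expires
    before it can be replayed at scale. TTL and scoping are illustrative."""

    def __init__(self, ttl_seconds=900):
        self.ttl = ttl_seconds
        self._issued = {}  # key -> (scope, expiry time)

    def issue(self, scope):
        key = secrets.token_urlsafe(32)
        self._issued[key] = (scope, time.monotonic() + self.ttl)
        return key

    def authorize(self, key, scope):
        record = self._issued.get(key)
        if record is None:
            return False
        granted_scope, expiry = record
        if time.monotonic() >= expiry:
            del self._issued[key]  # expired: revoke eagerly
            return False
        return scope == granted_scope


broker = JITCredentialBroker(ttl_seconds=900)
key = broker.issue("read:reports")
print(broker.authorize(key, "read:reports"))   # True: in scope, within TTL
print(broker.authorize(key, "write:reports"))  # False: scope mismatch
```

The design point is the one Aboul-Magd makes: even if the key leaks, the window in which it grants access to data is bounded by construction.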
Dashboard Tourism Is Over
The Cloud Security Alliance published an expedited strategy briefing, The AI Vulnerability Storm: Building a Mythos-ready Security Program, within days of the Mythos disclosure. Rich Mogull, CSA chief analyst, told Dark Reading that the technology "is advancing at an incredible speed, and represents a clear change in our fundamental risk assumptions around vulnerabilities and patching." The CSA's position is stark: Mythos's power eliminates time between vulnerability detection and exploitation. Two previously distinct events have collapsed into one.
As Athalye sets out in the Qualys blog, the entire model of security governance built around dashboard review cycles, handoffs between tools and committee-based remediation decisions is no longer viable. "Every organisation has some version of this: security dashboards shared, reviewed in meetings, handed off across teams," he writes. "When exploitation windows collapse to hours, the time spent reviewing and discussing risk is time during which the exposure is open. Every handoff between detection tools, prioritisation tool, ticketing system, IT team, and change management is a delay. The seams between siloed tools are where risk lives." The metric that needs to replace patch counts and compliance-centric SLAs is what Athalye calls Average Window of Exposure: the time between a confirmed exploitable exposure entering an environment and validated closure — currently the one metric most organisations cannot measure.
Aboul-Magd's version of the same argument is more structural than urgent. Security has long been treated as a feature tax rather than a design principle, he said, and Mythos makes that tradeoff more expensive in ways that were already visible before it arrived. "Organisations usually prioritise features and security comes second. Security can sometimes be seen as: this is slowing me down. Every new evolution of technology that introduces these step-function changes makes it even more important that that does not hold true anymore. Security can't come as an afterthought. It can't be a bolt-on. The organisations that have done it from the beginning are probably best suited to withstand what is coming. The ones that have accrued debt — they were always exposed. The risk has just increased."
Three Things Required
The Qualys blog lays out a three-part approach to adapting to Mythos-era conditions. The first is validation before remediation — not confirmation that a vulnerability theoretically exists, but confirmation it is actually exploitable in a specific production environment using attacker techniques, not in a simulation. Athalye is precise about the finding that grounds this: Qualys's Threat Research Unit has found that fewer than 1% of theoretically risky exposures are confirmed exploitable in a live environment. Organisations working through critical-severity findings in CVSS score order are, in almost all cases, fixing the wrong things first.
The second is remediation options beyond patching. Patching is not always immediately possible — production windows, legacy systems, operational constraints and competing priorities are real. The under-invested lever, as Athalye describes it, is policy and control improvement: crafting custom rules for EDR platforms, WAFs, firewalls and cloud security posture management tools that provide protection when a patch does not yet exist or cannot be deployed. Virtual patching, WAF rules, host isolation, service disablement and compensating controls, used adaptively based on patch reliability scores, allow organisations to balance business continuity with timely risk reduction.
The third is trust architecture for autonomous remediation. Athalye is direct: "Autonomous remediation is not a feature you deploy on faith. It is a capability you earn through accumulated evidence." That evidence takes the form of AI-based patch reliability scores trained on more than 150 million deployed patches, wave-based rollout architectures that build confidence at each ring before proceeding, and auto-rollback mechanisms that trigger on deviation. The Qualys platform reports more than 40 million autonomous patches deployed with zero human intervention, a rollback rate below 0.1%, and organisations remediating confirmed exposures in under 18 days against an industry average of 67. For ransomware and CISA Known Exploited Vulnerabilities, the claimed detection-to-remediation time is under 15 minutes.
Aboul-Magd does not dispute the direction this points, though he approaches it differently. His concern about automated patching is earned rather than theoretical — he has watched it break production. His conclusion, characteristically, treats it as a design choice rather than a crisis. "You do not want the thing writing your code to also be the thing reviewing it for vulnerabilities." The same principle applies to remediation: the goal is not to remove human judgement but to reserve it for the decisions that actually require it, and build trust in the automated systems that handle everything else.
The Custom Software Blind Spot
One dimension of the Mythos-era risk that has attracted less attention than the headline CVE numbers is the exposure created by custom software. Athalye flags it directly in the Qualys blog: every enterprise runs custom applications — internal tools, proprietary APIs, business-critical services — that will increasingly surface vulnerabilities through AI-assisted research too. The principle is simple: it does not matter how a vulnerability gets found. What matters is whether an organisation can detect it in its running environment, validate exploitability, and close it at the same speed it would a critical third-party CVE. This is the logic behind what Athalye describes as the need for a Risk Operations Centre — an operational structure for running risk management at AI speed across both commercial and custom software, designed specifically to stop what he calls vulnerability whack-a-mole.
What It Means for the Vendors
For cybersecurity vendors, the competitive implications vary considerably by category — and Aboul-Magd is precise about that distinction. "I don't think it means it's going to replace every single cyber vendor out there because cyber is so across the board," he said. A Cloudflare or Akamai protecting against distributed denial of service attacks faces less direct competition from Mythos. For application security companies whose core value proposition is scanning source code for vulnerabilities, the competition is more direct. "For your Checkmarxes of the world, for things that are scanning source code — absolutely, you can't deny this is going to be a new competitor. Those companies have been seeing it even before Mythos came out."
There is nuance even within that pressure. "If I'm already using Claude to write my code, I could also use it to scan for vulnerabilities," Aboul-Magd observed. "Having said that, you don't want the thing that's writing your code to also be the thing reviewing your code for vulnerabilities. So there might still be opportunities for some of these vendors to find a niche." The comparison he draws is to the cloud transition: when cloud arrived, Dell EMC and the on-premise data centre vendors did not all disappear — they had to rethink how they add value and find their differentiator. The same logic applies. "How they do that will differ. It will probably depend on the organisation."
Tenable made its own bid for that positioning in the wake of the Mythos disclosure, publishing guidance for CISOs preparing to face board questions about the model's impact. The CSA report adds its own layer of recommendations: defenders should introduce AI agents to the cyber workforce across the board, enforce automated security assessments, prioritise dependency management to reduce open-source and third-party component risk, and update governance for more efficient vendor onboarding. These are not new recommendations. The urgency attached to them is.
Aboul-Magd extended the vendor question into the broader product strategy horizon. Quantum computing and edge computing are not abstractions for vendors building security platforms today, he argued — they are the next wave arriving on top of the current one. "By 2027, quantum is going to start playing a very important role." The organisations that treat Mythos as an isolated event rather than one signal in a continuing step-function change will find themselves re-engineering twice.
A Tool, Not a Verdict
Fewer than 1% of the vulnerabilities Mythos has discovered have been patched. Anthropic has had to hire professional security contractors just to manage the validation pipeline, and has signalled it may need to relax its stringent human-review requirements because the verification bottleneck is binding even within the coordinated structure Project Glasswing was designed to provide. The scarcest resource Glasswing actually organises is not model access. It is the institutional capacity to close the loop from discovery to verified remediation.
On the prospect of adversaries reaching Mythos-class capability, Mogull of IANS Faculty and the CSA was direct: "The good guys have Mythos for now, but there really isn't a moat around AI and we know adversaries will have similar capabilities eventually." Anthropic put the timeline at six to eighteen months to capability parity. These are real constraints with real timelines.
Aboul-Magd's own view of the moment is, characteristically, less about the clock and more about orientation. AI, he said, is best understood the way a personal computer was when it first arrived — not as a threat to replace human work but as a tool that makes people better at it, provided they engage with it rather than resist it. "I try to lean more into: embrace it, see what you can do with it, try to see if it can help you produce better outputs. Don't blindly rely on it." He has been telling his sister, whose daughter is four years old, to start teaching her about AI agents now. Not because the future is frightening, but because it is coming regardless, and the difference between those who navigate it well and those who do not will be whether they treated it as a problem or as a capability. The cybersecurity industry faces the same choice.