A model Anthropic does not want to sell
SWE-bench Verified is a benchmark that measures a model's ability to resolve real bugs from public GitHub repositories. Claude Opus 4.6, the best model available today, scores 80.8%. Mythos Preview scores 93.9%.
This is not a marginal improvement: the share of unresolved bugs drops from 19.2% to 6.1%, roughly a threefold reduction. It is the difference between a senior engineer and an entire team.
Anthropic developed Mythos Preview but deliberately chose not to make it publicly available. Not due to technical limitations, but for security reasons: the model's capabilities in critical areas such as cybersecurity and software exploitation are so high that they require access controls far more rigorous than a public API can provide.
What Mythos Preview can do that Opus 4.6 cannot
The benchmark numbers are already impressive. But the most interesting part concerns capabilities in information security.
On Firefox 147 Exploitation, a test measuring the ability to exploit real vulnerabilities in a modern browser, Opus 4.6 scores 15.2%. Mythos Preview scores 84%: a more than fivefold difference, not a gap of a few percentage points.
Mythos Preview autonomously found a bug in OpenBSD that had been hidden for 27 years, a vulnerability in FFmpeg that had escaped five million automated tests without detection, and vulnerabilities in the Linux kernel. These are not purpose-built benchmarks: they are real systems, in production, used by billions of people every day.
The leap in coding and reasoning capabilities
On SWE-bench Pro — a harder variant with real software engineering tasks — Opus 4.6 stops at 53.4%. Mythos Preview reaches 77.8%.
In practical terms: Mythos Preview can take a complex codebase, understand the architecture, identify the problem, and propose a working solution with a success rate that surpasses many human development teams on medium-difficulty tasks.
Even on CyberGym Vulnerability Reproduction, which measures the ability to reproduce known vulnerabilities in controlled environments, the gap is clear: 83.1% for Mythos Preview versus 66.6% for Opus 4.6. For those building security tools or working in defensive security, this means access to analysis and detection capabilities that simply do not exist anywhere else today.
What this means for organizations adopting Claude today
The first reaction to news like this is often: then I'll just wait for Mythos. Does waiting make sense?
The answer is no, and it's worth understanding why.
Mythos Preview is not an evolution of Opus 4.6 that will be available soon. It is a research model with capabilities requiring specific access controls. Its public release, if it happens at all, will be conditional on securing the very capabilities that make it powerful.
Meanwhile, every week that passes without implementing Claude is a week of competitive advantage handed to competitors who are already moving. The Claude ecosystem — from models available today to development tools, from MCP to agents — is already extraordinarily capable.
The model you have access to today is already extraordinary
Claude Opus 4.6 resolves 80.8% of real bugs on SWE-bench Verified. A few months ago that number seemed like science fiction.
Claude Sonnet, the most widely used model for enterprise implementations, handles a 200,000-token context window, reasons over complex documents, produces production-grade code, and supports end-to-end business workflows. All with data governance appropriate for European enterprise contexts.
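To make the long-context point concrete, here is a minimal sketch of how a large document and a question might be packed into a single Anthropic Messages API request. The model id, prompt wording, and helper name are illustrative assumptions, not a prescribed integration; the payload shape follows the public Messages API.

```python
# Sketch only: builds a Messages API request body without sending it.
# The model id and prompt wording below are placeholders, not recommendations.
def build_long_context_request(document: str, question: str) -> dict:
    """Return a request body that puts a whole document in one user turn."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": "You are an analyst. Answer strictly from the document provided.",
        "messages": [
            {
                "role": "user",
                # Wrapping the document in explicit tags helps the model
                # separate source material from the actual question.
                "content": (
                    "<document>\n" + document + "\n</document>\n\n"
                    "Question: " + question
                ),
            }
        ],
    }

request = build_long_context_request(
    "…long contract text…",
    "What is the termination notice period?",
)
```

With a 200,000-token window, most contracts, reports, and even mid-sized codebases fit in one request like this, with no chunking pipeline required.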
Mythos tells us where we are headed. But what exists today is already more than sufficient to transform real processes, reduce real costs, and free up real time for people. You do not need to wait for the next leap to start seeing results.
How to get the most out of Claude in your company
The advantage of starting today is not having access to Mythos Preview. It is having six months or a year of hands-on experience with Claude when Mythos, or any successor, becomes accessible.
Understanding how to structure prompts, how to design agentic workflows, how to integrate Claude into existing systems, how to train teams for daily use: these skills are built over time and through practice. They cannot be improvised when the next model arrives.
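As one concrete example of what "structuring prompts and designing agentic workflows" looks like in practice, here is a minimal sketch of a system prompt plus a tool definition in the JSON-schema shape Claude's tool-use API expects. The tool name, schema fields, and triage scenario are invented for illustration.

```python
# Sketch: a structured system prompt and a tool definition for a hypothetical
# support-triage agent. Names and scenario are illustrative assumptions.
SYSTEM_PROMPT = """You are a support triage agent.
1. Classify the ticket (billing, technical, account).
2. If technical, call the lookup_customer tool before answering.
3. Reply in the customer's language, in under 120 words."""

# Tool definition in the JSON-schema format used by Claude's tool-use API.
LOOKUP_CUSTOMER_TOOL = {
    "name": "lookup_customer",
    "description": "Fetch a customer's plan and open tickets by email address.",
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Customer email address",
            },
        },
        "required": ["email"],
    },
}
```

The skill being built here is not the syntax: it is knowing which steps to make explicit in the prompt, which capabilities to expose as tools, and how tightly to constrain the output. That judgment only comes from iterating on real workflows.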
Maverick AI works with companies that want to build these capabilities in a structured way. From identifying high-impact use cases to production deployment, from team training to measuring ROI. If you want to understand where to start, let's talk.