Anthropic initially made users of the Claude Mythos Preview model sign confidentiality agreements to prevent findings from being shared, according to the Wall Street Journal, but the Journal says that all changed last week.
Up until now, the most important thing to keep in mind about Claude Mythos Preview, purportedly the scariest AI model in the world, has been its secrecy. To use it, you would have to be one of the VIPs allowed to participate in Project Glasswing—reportedly a very select group of about 50 companies and organizations.
If you are one of the Claude Mythos Preview testers participating in Project Glasswing, you’re meant to use the model to find security vulnerabilities, and the sense early on was that participants had a crushing responsibility on their hands to keep everything a secret—as if the fate of the world depended on secrecy.
But according to the Journal, Democratic Representative Josh Gottheimer wrote a letter to Anthropic complaining about this. “No entity should be contractually restricted from warning others, coordinating mitigations, or informing relevant and trusted stakeholders about urgent cyber risks,” Gottheimer wrote.
The Journal’s report, published Monday, makes it sound as if Anthropic has been struggling to find its footing on the question of what can be done with outputs from Mythos Preview. An anonymous Anthropic spokesperson told the Journal, “Confidentiality protections were something partners asked for at the outset and were built into agreements partners signed,” but added that Glasswing has “matured,” and the user agreements have evolved “to ensure key information can be shared broadly,” including beyond the bounds of Project Glasswing.
Another event that occurred a week ago was the announcement of a similar program, called Daybreak, from Anthropic’s chief competitor, OpenAI. Daybreak was much less secretive than Project Glasswing from the jump, allowing anyone to fill out a brief form and request to have their codebase scanned by OpenAI’s latest cybersecurity model. CEO Sam Altman posted on X that he’d like to work with “as many companies as possible now.”
It looks like companies have already started to speak publicly about what Mythos Preview has shown them. For instance, I couldn’t help but notice Grant Bourzikas, chief security officer at Cloudflare published a blog post Monday about what he and his company found while tinkering with Mythos Preview. It’s an informative post, describing Mythos Preview as similar to other bug-finding LLMs, but adding, “What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit.“
But there’s an intriguing coda at the end of the post. Bourzikas promises to share additional findings with customers soon, and says, “If your team is doing similar work and would like to compare notes, reach out to us,” and then he provides an email address.
So the shroud of secrecy around Claude Mythos Preview sounds like it’s being lifted ever so slightly. The folks at Anthropic are sure to feel like their model is losing some of its mystique, but an air of mystery around an LLM is not something that can last forever.
Read the full article here
