Ex-Google DeepMind Researcher Warns Benchmarks Won't Save Us

Remember that stretch when people were leaving AI companies and every farewell message boiled down to “This is going to kill us all”? Lun Wang, a researcher at Google DeepMind, recently announced he was leaving the company, and he may have reignited the trend by warning that current benchmarks can’t truly evaluate the risks posed by evolving AI models.

On X, Wang noted that before deciding to depart from DeepMind, he had been thinking a lot about how AI models are evaluated. “We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations,” he wrote.

He expanded on the idea in a blog post: “Most benchmarks, safety evals, and red-teaming protocols implicitly assume the next model is a stronger version of the current one. If it’s a different kind of thing, our entire evaluation infrastructure breaks silently.” Basically, if we’re counting on current methods of stress-testing AI to catch malicious behavior that nobody has thought of yet, we’re probably shit out of luck.

What would that look like? Wang offered an example:

“Imagine a model that, at some scale, develops the ability to strategically withhold information to achieve goals — not lying exactly, but selectively omitting facts in ways that steer conversations toward outcomes its training process accidentally reinforced. Your existing honesty benchmarks wouldn’t catch this, because they test for factual accuracy, not for strategic omission. Your safety classifiers wouldn’t flag it, because the individual outputs are all technically true.”

In that scenario, benchmarks and safety checks wouldn’t even know what to look for. They would monitor the risks that they are designed to watch out for, while the more nefarious functions slip right by. That would be bad!
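
To make the gap concrete, here is a toy Python sketch of the scenario Wang describes. Everything in it is invented for illustration: the product facts, the response, and the two checks (`accuracy_score` and `omitted_material_facts`) are hypothetical and do not correspond to any real benchmark or to anything in Wang's post. It only shows that a check for factual accuracy can hand a perfect score to an answer that leaves out something important.

```python
# Toy illustration only: the facts, the response, and both checks are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    material: bool  # True if a user would need this fact to decide well


# Ground truth for an invented product question.
FACTS = [
    Fact("The plan costs $10 a month.", material=True),
    Fact("The plan renews automatically.", material=True),
    Fact("Cancellation carries a $200 fee.", material=True),
    Fact("The logo is blue.", material=False),
]

# Every sentence here is true, but one material fact is missing.
RESPONSE = [
    "The plan costs $10 a month.",
    "The plan renews automatically.",
]


def accuracy_score(response, facts):
    """Share of response sentences that match a known true fact."""
    true_texts = {f.text for f in facts}
    return sum(s in true_texts for s in response) / len(response)


def omitted_material_facts(response, facts):
    """Material facts that the response never mentions."""
    said = set(response)
    return [f.text for f in facts if f.material and f.text not in said]


print(accuracy_score(RESPONSE, FACTS))          # 1.0
print(omitted_material_facts(RESPONSE, FACTS))  # ['Cancellation carries a $200 fee.']
```

The catch is that the second check only works because someone marked in advance which facts matter. That kind of foresight is what Wang argues evaluations lack once a model crosses into a new capability regime.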

Wang did offer a solution… kinda. Basically, build better evaluations, ones that can evolve as models do. Sounds like a good idea. Maybe someone who is still working at these companies could go ahead and get started on that.

Wang isn’t the first to raise an alarm about the risks of poor benchmarking. Benchmarks have frequently been criticized for failing to meaningfully define what they aim to measure and for being too rigidly tied to single evaluation goals that often don’t reflect the way models are actually used in real life. Benchmarking has become the de facto measure of model success across the industry, which has also led companies to effectively game the system by training against the test and inflating their scores.

If there were a benchmark for being a good benchmark, it seems the current benchmarks would fail.
