What cannot be verified

I recently watched an interview with Andrej Karpathy and an idea stuck with me:

"Traditional computers automate what you can specify in code. LLMs can automate what you can verify."

Karpathy explains that current models are trained with reinforcement learning in verifiable environments: the model tries an answer, something evaluates whether it's correct, the model learns. That's why they're so good at math, code, chess. And why they're "jagged" everywhere else. When the verifier is ambiguous or subjective, the model falls behind. He takes it a step further with something worth pausing on: that almost everything is eventually verifiable, that verifiability is a spectrum rather than a binary, and that even writing can be evaluated by a council of model judges.

The question that hits me is direct: is a design verifiable?

The layers that are

The answer is neither yes nor no. There are layers, and each one gets verified differently.

The most basic layer is technical heuristics: color contrast, touch target size, visual hierarchy, WCAG compliance… all of this can be verified with automated tooling. One layer up are usage metrics: conversion, time on task, drop-off rate, A/B tests with real users… here we've been delegating a good part of our judgment to the system for a while now.
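To make that bottom layer concrete, here is a minimal sketch of what a verifier for color contrast can look like. It follows the relative luminance and contrast ratio formulas from WCAG 2.x and the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are my own, chosen for illustration, and the sketch assumes six-digit hex colors with no transparency.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB' (assumed format)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG AA contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    for fg in ("#767676", "#777777"):
        ratio = contrast_ratio(fg, "#FFFFFF")
        print(fg, round(ratio, 2), "AA" if passes_aa(fg, "#FFFFFF") else "fails AA")
```

The check is a pure function from two colors to a yes or no. Two grays that look almost identical fall on opposite sides of the threshold, and the verifier never hesitates. That is what makes this layer so easy to automate.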

Further up are the principles laid down by Dieter Rams, Jakob Nielsen, John Maeda… along with consistency with the brand, with the design system… This is softer terrain, because two designers can interpret the same principle differently, but it's still reasonably verifiable. A trained model can review whether a design follows a set of precepts and argue the case.

Much of design work can be automated because it can be verified. And it will probably be automated better than the way we do it ourselves.

Where a designer learns

If you look closely at these layers, something jumps out. They are exactly the layers a junior designer wrestles with. The grid, the plugin of the moment, the well-built auto-layout, the component that respects the tokens, the heuristics learned three weeks ago, the metrics that prove the button works better in blue. It's the layer where you get measured, where you learn, where you show you can do your job.

And it's exactly the layer being automated first because it's the easiest to verify.

What sets a senior apart is not that they do these things better. It's that they care about other things. Whether the design says something. Whether it fits the history of the brand. Whether it will be read as intended or in an uncomfortable way. A background intuition, hard to articulate, that tells them whether something is right or not, before they can explain why.

It's what Michael Polanyi called tacit knowledge: we know more than we can properly express.

This isn't taught in a bootcamp. It isn't really taught anywhere. It accumulates. It gets metabolized. It's what I've called poso in other posts: what settles into a designer's body over the years, through reading, through failed projects, through having seen how decisions age. Karpathy's frontier, the one that separates the verifiable from the rest, has been inside the craft for decades. AI didn't invent it. It's making it more visible.

Here an uncomfortable twist appears. If the verifiable layer gets automated, where will seniors come from? The craft's career path was traditionally built by climbing from the verifiable up to what isn't. You spent years applying heuristics and operating inside systems, and along that path your judgment was formed. If the first half of the journey is done by a machine, what happens to the formation of judgment? Maybe it won't get tempered, because that practice was where the eye trained itself. Maybe, on the contrary, juniors freed from applying heuristics by hand can spend that time reading, looking at architecture, traveling, maturing the craft and forging their judgment.

Where the verifier breaks

There's another layer that doesn't fit this frame.

How do you verify that a design has layers of meaning? That it converses with the history of the craft without imitating it? That a particular use of negative space is a nod to Swiss design of the sixties? That a typographic choice honors Vignelli without becoming pastiche? How do you verify that a design moves you, that it sparks curiosity at first glance and revelation at the tenth? How do you verify that a user, without knowing why, will feel that this is well done?

And harder still: how do you verify how a specific stakeholder will read it? A CEO who comes from finance and instinctively distrusts what's clean because it looks empty to them. An investor who has seen too many products look alike. A founding partner whose taste was shaped through a thousand small decisions and who reacts viscerally before being able to explain why.

There is no verifier here. Or there are as many as there are interlocutors. Each one brings a story, a sensibility. Design gets verified in the eye of whoever is looking.

The council of judges

Karpathy would say this is also verifiable, just more expensive. That you could set up a council of model judges. One trained on Rams, another on Norman, another on Maeda. One simulating a finance-minded CEO, another an investor, another a user in their fifties. In a way it's what we already do when we show a design to several people and process their reactions. What AI would add is scale.

And yet, a doubt remains. A model trained on Dieter Rams' work is not Dieter Rams. It's a statistical projection of what he wrote and designed. Can that projection think what Rams would think?

The question that remains

If Karpathy is right, the verifiable part of design will get automated. Heuristics, metrics, consistency with systems, application of principles. What stays in the designer's hands are the layers of meaning, the cultural choices, the intuition about what will be felt before anyone can explain why. What I called poso in Gesamtkunstwerk.

Is the unverifiable part of design the core of the craft, what makes us irreplaceable? Or is it merely the remainder that hasn't been verified yet, waiting for a council of virtual judges to do that well enough too?

My intuition says it's the first. And perhaps that intuition, precisely, is one of those things that cannot be verified.

–

References:

Andrej Karpathy: From Vibe Coding to Agentic Engineering (YouTube)