What Did Spotify Actually Do?
In an interview posted on Anthropic's YouTube channel, Spotify engineering leader Niklas Gustavsson shares a few numbers. AI now writes 73% of pull requests (PRs, meaning proposed code changes) directly, PR frequency is up more than 75%, and Spotify ships roughly 4,500 production deployments a day while quality metrics stay flat. The internal tool behind all of this is called Honk.
Honk didn't start out this way. About five or six years ago, Spotify noticed its codebase was growing seven times faster than its engineering headcount, so they built a tool to automate repetitive maintenance like version upgrades and API migrations. Early on it relied on deterministic scripts, which quickly hit a ceiling because code has an enormous API surface. After many rounds of trial and error bolting on LLMs, it became what it is today. In its early iterations, a review, or "judge," step lifted PR success rates from roughly 20-30% to 80%, but once the models and the agent itself got good enough, that judge step was removed entirely.
Why Must Verification Come Before You Hand Work to AI?
When Spotify decided to auto-merge PRs without a human reviewer, the first investment it made was in test automation. Previously, the team that owned a piece of code reviewed every PR by hand, so tests could afford to be somewhat loose. Removing that human review meant tests had to become solid enough to do that job instead. Gustavsson describes verification as the single most important factor in any closed loop where an agent works without a human in it.
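To make the idea of a verification gate concrete, here is a minimal sketch in Python. It is not Spotify's or Honk's implementation: the `Change` structure, the two toy checks, and the `gate` function are all hypothetical names invented for illustration, and the checks stand in for a real automated test suite. The only point it shows is the rule described above: with no human reviewer in the loop, a change merges automatically only when every automated check passes, and anything else is escalated to a person.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    """A proposed change produced by an agent (hypothetical structure)."""
    title: str
    diff: str


# A check takes a change and returns (check name, passed?).
Check = Callable[[Change], Tuple[str, bool]]


def diff_is_not_empty(change: Change) -> Tuple[str, bool]:
    # Toy stand-in for a real test: reject changes that contain nothing.
    return ("diff_is_not_empty", bool(change.diff.strip()))


def diff_is_small(change: Change, max_lines: int = 200) -> Tuple[str, bool]:
    # Toy stand-in for a real test: reject very large changes.
    return ("diff_is_small", len(change.diff.splitlines()) <= max_lines)


def gate(change: Change, checks: List[Check]) -> Tuple[str, List[str]]:
    """Return ("auto_merge", []) only when every check passes.

    With no checks configured, nothing is standing in for the human
    reviewer, so the change is escalated instead of merged.
    """
    if not checks:
        return ("needs_human_review", ["no_checks_configured"])
    results = [check(change) for check in checks]
    failed = [name for name, passed in results if not passed]
    if failed:
        return ("needs_human_review", failed)
    return ("auto_merge", [])


if __name__ == "__main__":
    checks = [diff_is_not_empty, diff_is_small]
    print(gate(Change("Bump library version", "- v1\n+ v2"), checks))
    print(gate(Change("Empty change", ""), checks))
```

The design choice worth noting is the default: when verification is missing or fails, the sketch falls back to human review rather than merging, so faster output never outruns the ability to check it.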
The same principle applies to office work. If you want to hand off a draft report, a customer email, or a data summary to AI, you first need a standard -- automated or manual -- for checking whether the output is right. If output speeds up without a verification standard in place, errors spread just as fast.
Why Does Standardization Determine How Well AI Works for You?
Gustavsson says that when the same feature is implemented ten different ways across the codebase, even AI gets confused. Conversely, the more consistent the code, tools, and frameworks are, the clearer the patterns AI has to draw on, and the better its output gets. This kind of standardization was originally an investment made to make things easier for people; now, he notes, it has become a condition for how well AI performs.
The same is true in office organizations. When report formats, folder structures, and email tone differ from team to team, handing work to AI produces a different result every time. Teams whose formats and processes are already standardized see AI pay off almost immediately.
Where Does the Time AI Frees Up Actually Go?
Gustavsson's own shift is telling. He used to have AI write 70-80% of his code and then finish the rest by hand in an IDE; now that finishing step has disappeared entirely. He says the time that freed up naturally flowed into prototyping, talking to customers, and thinking about what to do next.
Spotify turned this shift into company-wide infrastructure. It built a system that lets even non-engineers describe an idea in natural language and get a working prototype back, and there's an internal "app store" where people share and try each other's prototypes. Validating an idea used to mean convincing an engineering team and waiting weeks; now it can be checked against real data within a day. Even one of Spotify's co-CEOs has a prototype of their own posted in that internal app store.
What Should Office Organizations Prepare for, From an AX Consulting View?
In AX consulting work, there's a question I always ask the organizations I meet: is your team's know-how written down somewhere, or does it live only in a few people's heads? As Spotify's case shows, how fast and how safely you can use AI ultimately comes down to how well established your verification standards and standardization already are.
One more thing Gustavsson said near the end is worth sitting with. He used to genuinely enjoy the act of writing code by hand, and he worried AI would take that enjoyment away. What he realized instead was that what he actually loved was solving problems, not the specific act of typing code. That distinction holds for office work too. If you can tell which part of your job you actually value -- the tool you use, or the problem-solving and judgment you get out of it -- it becomes much clearer what to hold onto and what to hand over in the age of AI.