Pair Programming in the Age of Agents
No tokens were spilled in the writing of this post, other than researching some of the references. This is entirely hand-crafted, artisanal writing.
I’ve been pair programming for over 20 years at this point. For about the last six months I’ve been working seriously with agentic coding tools like Claude Code, roughly since Superpowers first appeared — blimey, that feels like a long time ago!
I’ve noticed I’m soloing a lot more these days. It’s somewhat puzzling, somewhat troubling and somewhat exhilarating. Software development is definitely changing, and I’m not sure what things are going to look like just yet when it all shakes out.
This post explores that.
One of the things I like best about creating software is co-creating it with other people. In my experience, pairing (or its wild cousin, software teaming or ensembling) has never been all that popular outside of a fairly niche extreme programming community. But for those of us in the know, it’s a practice we hold very dear.
Lately though, I hear a lot of people talking about “pairing with Claude”. Can you really pair with an LLM? How does it compare to pairing with a human?
Let’s talk about some of the benefits of pair programming, and compare the experience of doing it with humans and robots.
Someone to take turns writing code
This is probably the most obvious activity of pair programming. Writing code, normally using test-driven development, swapping back and forth regularly.
As of Claude Opus 4.5, I find that the robot writes mostly decent code, and decent tests, provided I’ve given it the right context and there are good existing patterns to follow. I find I’m still mostly in charge of steering any significant refactoring or architectural moves, but I’ve created entire projects where I’ve barely even read the code.
These things are also capable of switching from bash, to Go, to Rust, to Elixir in the blink of an eye. I haven’t known many human programmers that could do that.
Certainly, I’ve had recent experiences where LLMs made large changes that looked superficially useful but had subtle, serious errors in them. Spotting these by inspecting the code yourself is hard, and can end up feeling like more work than it would have taken to write it yourself in the first place. This is exhausting and demoralizing and reminds me of that Deming quote:
our system of make-and-inspect, which if applied to making toast would be expressed: “You burn, I’ll scrape.”
I believe most of these weaknesses can be mitigated by adding gates (what Geoffrey Huntley calls “backpressure”) to your development flow alongside your automated tests — linters, mutation testing, etc. — to really flush out the mistakes a robot might make and provide it with fast feedback. These guide the robot towards the kind of code you want it to produce.
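As a sketch of what that backpressure might look like, here is a hypothetical Makefile gate. The target names and the specific tools (a Go test run, golangci-lint, go-mutesting) are illustrative assumptions, not a prescription — substitute whatever suits your stack. The point is that the robot gets one command it must pass before its work counts as done.

```make
# Hypothetical "backpressure" gate -- tools shown are illustrative.
# The agent (and CI) runs `make check`; any failing stage is fast feedback.
check: test lint mutate

test:
	go test ./...          # automated tests: does it behave?

lint:
	golangci-lint run      # linters: does it follow our conventions?

mutate:
	go-mutesting ./...     # mutation testing: do the tests actually bite?
```

Telling the agent to run this gate after every change turns your standards into something it can check itself against, rather than something you have to inspect for by hand.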
I’ve also noticed that robots don’t really practice TDD. They tend to write tests and code in one shot, and need deliberate prompting to refactor. Jesse Vincent’s excellent superpowers plugin includes a test-driven-development skill which I’ve had some more success with. I’m also curious about how an orchestration system like nWave or metaswarm, or even Claude’s own agent teams feature, might help simulate more of the back-and-forth that happens in a real human pair, but I’ve yet to try that out in earnest.
In my experience, you don’t really take turns anymore when you “pair” with an LLM. In the driver-navigator dynamic, you’re spending a lot more time reading the map. In this respect it’s much more like delegation than collaboration.
Someone to help me think out loud
There’s evidence that a social learning environment activates more of the “circuits” of our brains, so in the context of programming it brings more of our brain to bear on the problem. Most of us have experienced the phenomenon of rubber duck debugging, where the act of simply explaining the problem you’re stuck on helps you to unlock some insight that was hiding in your brain all along.
I think LLMs do a really decent job of this. The back-and-forth dialogue, the questions they pose, can really help you to examine blind spots and think things through.
Someone to bounce my ideas off
When it works well, pairing is a lot like improv: My idea builds on your idea, and your idea builds on my idea. Together we create something much better than either of us could have created by working alone.
I don’t think LLMs are great at this, at least not yet. You can manufacture push-back with the right prompt, you can have it stimulate creative ideas in you, but it still feels sort of hollow to me. The rapid feedback loop of conceive it/build it is exhilarating and can inspire a lot of creativity, but that creative energy is all coming from me.
Someone to review my code
A pair can also critique your solutions in real time. This is what extreme programming calls “continuous code review” and it creates a feeling of real confidence in the solutions you create together.
With the robot generally writing the code, does it now fall to the human to review it?
I’m seeing useful results from using one agent to review another agent’s work, but agents need the right context to be able to give meaningful code or architectural reviews. Making your conventions and context explicit through ADRs and markdown documents really helps the robots, and can help the humans too.
Someone who knows things I don’t know
Your human pair might surprise you with a novel perspective or piece of context, stretching or shrinking the space of possibilities.
Agents can certainly research and tell you about things you’d never have known about or thought of. They’re also amazing at reading through reference material, so if you’re taking the time to distill reference documents, you can really leverage them as an oracle who knows more about your project than you do.
Someone who knows where the skeletons are hidden
In my experience, there’s an illegible knowledge of the technical debt in any given codebase that lives in the heads and conversations of the team who works on it.
You know where you’ve made compromises, where there’s friction. You manage it, and when you get the right opportunity you take the time and pay it down.
Right now robots can only have this kind of knowledge if we write it down for them. I am a big proponent of ADRs and I am finding them increasingly useful these days, both because our codebases are changing so fast, and because we can share the trade-offs we’re making as humans with the robots.
So making your technical debt more legible might help, but for now I think we humans will need to keep track of this.
Someone to suggest it’s time to take a break
Coding with LLMs can be pretty seductive, even addictive. I notice that my agent tends to end almost every response with a question, pulling me back in.
One of the nice things a human pair will do is notice the vibe, and encourage you to take a break at the right moment. Robots are, let’s be honest, fucking terrible at this right now.
Someone who cares how my weekend was
My friend Zach calls it Friendship-Driven Development for a reason.
Just like playing sports with other people develops friendships, the shared experience of wrestling with intellectual problems together really helps you get to know someone. I care about my colleagues on a much deeper level than I would if we just soloed on tickets off our backlog all day, because pairing requires us to be more vulnerable and intimate with each other: sharing our opinions, our puzzles, our victories, our good and bad moods.
This just makes work a lot more enjoyable – the feeling that I care about and am cared about by my colleagues. I am not even going to entertain the idea that a robot could substitute for that.
So what is pairing anymore?
Ultimately, software is a model of the world. Models are always wrong, but they are definitely more wrong when they’re built by people who only communicate through Jira tickets (or markdown documents). We need to share the model for it to come alive.
The performance of a knowledge work team is a function of how quickly an idea spreads between its people.
When people on a team make decisions about the software together in real-time as a pair or ensemble, not only do those decisions happen quickly, they naturally become a shared artefact, something the whole team trusts and believes in.
I’m still trying to work out how we keep that same magic in the age of agentic coding and dark factories, or whether we still need to at all.
I believe that pairing is a skill. It requires the humility and courage to be able to look vulnerable in front of your pair, to admit when you don’t know or don’t understand. It requires caring about your pair and treating them and their ideas with respect.
Pairing is sharing our discomforts, frustrations and successes with each other as we try to solve problems. High fives. Sharing big decisions and half-formed ideas. Coming up with silly names for things, apt names for things, downright perfect names for things. Learning together. Going through something together.
I’m glad people are getting some of the benefits of real human-human pairing by working with LLMs, and I’m excited for the things people are able to build with this incredible technology.
I’ve come to the conclusion though that what people call “pairing” with an LLM is more the illusion of pairing than the real thing. Ultimately, it’s more of a siren than an active collaborator. It’s nothing but your thoughts, reflected back off of a bazillion vectors.
That’s not to say it’s necessarily a bad thing, but it’s not the same thing.
I still think we’re going to need to share our ideas and decisions with each other, and that the more frequently we do that the better. Maybe what’s changing is the scale of those decisions: instead of naming a method together, or deciding which test to write next, we’re trading off between architectural patterns, making product decisions about priorities, or designing the constraints we put around the LLMs to check their work.
As the speed of what we deliver increases, we have an increasing amount of cognitive load to bear just from keeping up with it all. Another reason we’ll need to keep the conversation alive.
Maybe the more we offload to machines, the more we'll need each other, after all.
Thanks to Ryan Spore for the siren metaphor, and to Ryan, Jon Fazzaro, Steven Diamante and Lada Kesseler for their feedback on earlier drafts of this post.