AI is writing more of your code than ever – and breaking production faster than your teams can keep up. Here’s what that actually means, and what to do about it.
There was a survey published this week that made us sit up. CloudBees asked more than 200 enterprise technology leaders about the code their teams are shipping, and 81% of them said AI-generated code had increased the number of production issues they’re dealing with. Not coding bugs caught in review – problems that surfaced after the code had already gone live.
Here’s the part that really stood out, though. In the same survey, 92% of those leaders said they were confident their code was production-ready before it shipped. So you’ve got the vast majority confident things are fine, and the vast majority simultaneously reporting more things going wrong. Both of those can’t be comfortably true at once, and the gap between them is where the real story is.
That same week, Gartner put out its own assessment of the market for AI coding agents, describing it as entering a new phase of expansion and competitive realignment. Their headline prediction: by 2027, more than 65% of engineering teams using agentic coding will treat the IDE itself as optional, handing control and validation over to automated platforms. The tools are getting more capable and more autonomous, fast. The question nobody’s answering quite as loudly is whether the way we check that code is keeping pace.
Short answer? It isn’t.
More code, less certainty
Let’s look at what the CloudBees numbers are actually telling us. Respondents said 61% of their code is now either generated by AI or written with AI assistance. 64% of those engineering organisations have AI widely or fully baked into their workflows. And just over half – 52% – report a genuine uptick in how much software they’re producing.
So far, so good. That’s the promise of AI-assisted development delivering. The trouble starts when you look at what comes out the other end. 69% cited security vulnerabilities introduced specifically by AI-generated code. 63% flagged compliance issues. And 70% said maintaining their test suites has become a bigger burden than writing the code itself.
Think about that last one for a second. The writing got easier, so the checking got harder. We’ve effectively moved the bottleneck rather than removing it – and we’ve moved it to a part of the process most teams were already stretched thin on.
One of the security specialists quoted in the reporting put it well: when failures show up after deployment, it’s a sign the validation process itself isn’t keeping up with what the AI is producing. The code passed every review gate that existed and still broke things. That’s not a story about bad code. It’s a story about review gates that were designed for a slower era.

Why your existing process is creaking
The way most teams review and release software was built around a fairly reasonable assumption: a human wrote this, a human understands it, and another human can sensibly check it. Pull requests, code review, a test suite, a deployment gate – the whole pipeline assumed the volume of change was roughly bounded by how fast people could type.
AI broke that assumption without anyone formally deciding to change the process. When 61% of your code is machine-assisted and more than half of teams report shipping more software, the same review pipeline is being asked to verify far more, far faster, with the same number of people. Something gives. Usually it’s the depth of the review.
The result is a verification gap: AI generates code faster than teams can validate it, and the difference quietly accumulates as risk that only becomes visible once it’s in production.
And here’s a detail that ties it all together. 93% of organisations in the survey said they have a formal process for reviewing and releasing AI-generated code. Sounds reassuring. But only 56% said that process is always enforced. For the rest, a process that’s enforced only some of the time isn’t really a safeguard – it’s a document. When the pressure’s on to ship, the optional step is the one that gets skipped, and the optional step is precisely the one catching the problems.
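To make the verification gap concrete, here is a toy model in Python. It is only a sketch: the function name and the weekly figures are invented for illustration and do not come from the survey. It assumes review capacity stays flat while generated changes grow, and counts everything above capacity as change that ships with shallower checks.

```python
def unverified_backlog(generated_per_week, review_capacity_per_week):
    """Return the running total of changes that exceeded review capacity."""
    backlog = 0
    history = []
    for generated in generated_per_week:
        # Anything above what reviewers can handle properly adds to the gap.
        backlog += max(0, generated - review_capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical figures: output climbs each week, reviewer capacity does not.
print(unverified_backlog([40, 50, 60, 70], review_capacity_per_week=45))
# prints [0, 5, 20, 45]
```

The point of the sketch is the shape rather than the numbers: even a modest weekly shortfall compounds, which is why the gap tends to stay invisible until it shows up in production.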
The bit that costs you twice
It’s tempting to think of this purely as a quality problem, but there’s a financial sting too, and it’s a sneaky one.
More code means more to test, more to scan, and more to run through your pipelines. 54% of respondents said their CI/CD infrastructure spend had risen significantly over the past year. 53% flagged rising testing, security, and deployment costs. That makes sense – if you’re producing more software, the machinery that checks and ships it has to work harder.
The worrying part is how few teams can actually see where that money’s going. Only 31% of AI-related spending could be linked to specific business results. In 36% of organisations, AI spend is either tracked without any measure of return, or not tracked at all. And only 45% described their costs as predictable from one quarter to the next.
So you’re potentially paying more to verify code you’re less certain about, while struggling to prove what any of it is delivering. That’s the kind of thing that looks fine right up until someone on the board asks a pointed question about ROI, and suddenly nobody has a clean answer.
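One way to start getting that clean answer is to tag every line of AI and CI/CD spend with the outcome it supports, then measure how much is left untagged. The Python sketch below assumes a simple in-house record format; `SpendRecord`, its fields, and the sample amounts are all hypothetical rather than taken from any particular billing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpendRecord:
    category: str           # hypothetical label, e.g. "ci_cd" or "ai_tooling"
    amount: float           # cost for the period
    outcome: Optional[str]  # linked business result, or None if unattributed


def attributed_share(records):
    """Fraction of total spend that is linked to a named outcome."""
    total = sum(r.amount for r in records)
    if total == 0:
        return 0.0
    linked = sum(r.amount for r in records if r.outcome)
    return linked / total


def untracked_by_category(records):
    """Total unattributed spend, grouped by category."""
    totals = {}
    for r in records:
        if not r.outcome:
            totals[r.category] = totals.get(r.category, 0.0) + r.amount
    return totals


# Illustrative data only.
records = [
    SpendRecord("ai_tooling", 3000.0, "checkout rewrite"),
    SpendRecord("ci_cd", 5000.0, None),
    SpendRecord("ai_tooling", 2000.0, None),
]
print(attributed_share(records))       # share of spend with a linked outcome
print(untracked_by_category(records))  # where the unexplained money sits
```

Even a report this crude shows which categories cannot yet be tied to a result, and that is usually where the ROI conversation needs to begin.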

Whose problem is this, anyway?
This one genuinely surprised us. When something built with AI assistance fails in production, who owns it? According to the survey, only 12% of organisations have a dedicated AI governance function. For nearly half – 46% – the buck stops with the CTO or VP of Engineering. For another third, blame lands on whichever engineering lead happened to be near the tool.
What that says to us is that AI has been adopted as a productivity tool but not yet absorbed as an engineering discipline. The capability arrived faster than the ownership did. And without clear ownership, the governance everyone agrees is important keeps being everyone’s job and therefore nobody’s.
What we’d actually recommend
None of this is an argument against AI-assisted development. The productivity gains are real, the adoption isn’t reversing, and we’re not in the business of telling people to put a genuinely useful tool back in the box. The point is narrower than that: the speed is here, and the discipline needs to catch up to it. A few things we’d suggest.
Move your quality checks upstream. If problems are only surfacing in production, your safeguards are sitting too far downstream. Automated testing, security scanning, and compliance checks need to run as close to the point of generation as possible – not bolted on at the end where they’re easy to skip under deadline pressure.
Make the process mandatory, not aspirational. Only 56% of organisations always enforce their review process; for everyone else, it amounts to no process on the days it matters most. If a step is genuinely important, it needs to be built into the pipeline so that skipping it is harder than following it.
Give AI governance an owner. Someone needs to be accountable for how AI-generated code is validated, secured, and released – not as a side responsibility, but as a defined role. Diffuse ownership is how things fall through the cracks.
Treat test maintenance as core engineering work. If verification is now the bottleneck, that’s where your skilled engineers should be focused – on building the automated checks and architectural guardrails that let the AI run safely – rather than treating testing as the unglamorous job that gets squeezed.
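As a rough illustration of the "mandatory, not aspirational" idea, the Python sketch below shows a release gate with no skip option. The check names and the shape of the `results` dictionary are assumptions made for the example; in practice they would come from whatever your CI system reports. A check that is missing is treated exactly like a check that failed.

```python
# Hypothetical names for the checks every release must pass.
REQUIRED_CHECKS = ("unit_tests", "security_scan", "compliance_check")


def release_blockers(results):
    """Return the required checks that are missing or did not pass."""
    return [name for name in REQUIRED_CHECKS if results.get(name) is not True]


def gate(results):
    """Return 0 if the release may proceed, 1 if it must be blocked."""
    blockers = release_blockers(results)
    if blockers:
        print("Release blocked: " + ", ".join(blockers))
        return 1
    print("All required checks passed.")
    return 0


# Illustrative run: the compliance check was never executed.
exit_code = gate({"unit_tests": True, "security_scan": True})
```

Run as a pipeline step whose non-zero exit code stops the deployment, a gate like this makes skipping a check something that requires a reviewed change to the pipeline itself, not a quiet decision under deadline pressure.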
The teams pulling ahead aren’t the ones generating the most code. They’re the ones who’ve built the discipline to trust what they ship. The model isn’t the hard part anymore. Knowing your code is safe to release is.

Q&A: AI-generated code and the verification gap
Is the answer just to slow down our use of AI coding tools?
We wouldn’t recommend it, and realistically most teams can’t. The productivity gains are real and your competitors aren’t slowing down either. The better move is to let the generation run fast while making the verification faster too – automated checks upstream, enforced gates, and clear ownership. The goal is to close the gap, not to throttle the thing that opened it.
Our review process looks solid on paper. Why would that not be enough?
Because a process is only as good as how often it’s actually followed. In the CloudBees survey, 93% had a formal process but only 56% always enforced it. The gap between “we have a process” and “we enforce it every time” is where most production failures live. If skipping the step is easier than following it, it’ll get skipped exactly when you can least afford it.
If the code passed review, how is it still failing in production?
Because the review was built for a slower rate of change. When most of your code is AI-assisted and your output has jumped, the same review pipeline is verifying far more in the same amount of time. Depth gets sacrificed for throughput, and the things that slip through are functional defects, security holes, and compliance issues that only show up once they’re live.
How do we get a handle on the rising costs?
Start by making them visible. A large share of organisations either don’t track AI-related spend against outcomes or can’t predict it quarter to quarter. You can’t control what you can’t see, so the first step is connecting AI and CI/CD spend to specific results. Once you can see where the money goes and what it returns, the decisions about where to optimise become a lot clearer.
Where should we start if we think we have this problem?
Look at where your failures are surfacing. If issues are routinely appearing in production rather than being caught earlier, that tells you your safeguards are sitting too far downstream. From there it’s a question of moving checks upstream, hardening your pipeline so the right steps can’t be skipped, and deciding who owns the governance. An honest look at your last few production incidents usually shows you exactly where the gaps are.
How Vertex Agility can help
The challenge we’ve described here – output racing ahead of the discipline needed to trust it – is a conversation we’re having with engineering leaders right now. The specifics differ from one organisation to the next. Some are watching production incidents climb and don’t yet know why. Others know exactly where the gap is but don’t have the capacity to close it without pausing delivery.
Our Software Consultancy and Platform Engineering work sits squarely on this problem. We help teams build the pipelines, automated testing, and architectural guardrails that let AI-assisted development run at speed without the production failures and runaway costs catching up with them. If your output has gone up but your confidence in what you’re shipping has gone down, the gap is almost always in the verification layer rather than the tools themselves.
Our AI Consultancy practice works alongside this on the governance side – the frameworks, ownership, and responsible-adoption structures that keep AI an asset rather than a liability as you scale. Because we combine senior oversight with the same automation and AI tooling our clients use, we’re able to compress timelines and reduce rework while keeping the architectural integrity of what you ship intact.
If you’d like an independent view of where your delivery process stands, we’d recommend starting with one of our free self-assessments. And if you’d rather talk it through directly, get in touch to see how we can help.