SonarQube Remediation Agent

Designing SonarSource's first autonomous AI agent for code quality

Overview

Engineering teams catch thousands of code issues. They fix far fewer. Issues accumulate in a backlog that grows faster than developers can manually work through, and the time cost of resolving them one by one makes it structurally impossible to close the gap. I designed the AI agent that resolves them automatically.

I was the sole designer on it, starting from scratch with . The central question was not whether the technology could work. It was whether developers would ever trust it enough to let it act on their code.

There was no established playbook for agentic systems in enterprise developer tools at the time. I looked at adjacent fields: incident response tools that surface step-by-step execution traces, financial platforms where trust is built through audit trails, and deployment pipelines where engineers have long been comfortable delegating to automation. None mapped cleanly. Because there was nothing to copy from, the patterns we built on this project became the foundation for all subsequent AI agent work at Sonar.

Challenge

Developers are trained to be skeptical of anything that touches their code automatically. Sonar's own research put a number on it: 96% of developers do not fully trust that AI-generated code is functionally correct, and than reviewing code a human wrote.

That number is the counterintuitive one. The assumption is that AI speeds everything up. But AI-generated code requires a different kind of reading: you are not scanning for syntax errors, you are verifying intent. Did the model understand the context correctly? Does this fix introduce a subtler problem somewhere else? That cognitive shift made the design challenge harder. We were not just surfacing a fix. We were asking developers to perform a new kind of review they had no prior training for.

The challenge was designing an experience where the agent could act independently while still giving engineers enough visibility and control to feel confident. Getting that balance wrong in either direction meant the product would fail: too opaque and no one would trust it, too manual and there was no point in having an agent at all.

Process

Mapping the unknowns

Before any wireframe, I ran an assumption mapping exercise with the team. We listed every belief we were building on (technical, behavioral, and trust-related) and ranked each by risk and verifiability. This became our research roadmap and kept us from designing on top of unvalidated foundations.

Research with engineers

I ran moderated sessions with senior engineers and DevSecOps leads. The central finding was that trust in AI wasn't binary. Developers weren't rejecting automated fixes categorically. They were rejecting AI they couldn't reason about. If they could read what the agent did and understand why, they were willing to accept it. That reframed the design problem: the agent didn't just need to produce good fixes, it needed to make its reasoning legible.

Designing the agentic workflow

The core architectural decision was two distinct flows: PR fixes for new issues caught in active code reviews, and Backlog fixes for remediating accumulated legacy debt at scale. Each required different entry points, different trust signals, and a different mental model. Across both, the challenge was representing non-deterministic AI behavior in a linear UI and designing opt-in controls that gave engineers authority without making the agent feel unreliable.

Building trust through transparency

Transparency only pays off if the developer reaches the review prepared to engage with it. That meant designing activation as a deliberate choice, not an automatic behavior. It also meant making the agent's reasoning legible at every step, so developers could evaluate what it did rather than just accept or reject an opaque output.

Solution

It took three versions to find the right shape. V1 was a status indicator: running, complete, failed. In testing, every engineer immediately asked what it had actually done and clicked around looking for a log. In V2, we put the full output directly into the PR comment. It was too noisy; developers couldn't find the signal. V3 was a summary card plus a separate fix PR that developers reviewed as a diff. That separation (summary for orientation, PR for scrutiny) was what made it work. There was no new UI to learn. The familiar artifact became the trust mechanism.

Early in the beta, the agent triggered automatically whenever a quality gate failed. Merge rates sat at 1-2%. The problem was not the quality of the fixes. Developers were receiving PRs they had not asked for and treating them as noise. After we introduced user settings and made activation a conscious choice, the merge rate on fix PRs climbed to 65%.

The lesson was that the review step only works when the developer arrives at it ready to review. Automatic triggering removed any sense of intention. Deliberate activation meant that when the fix PR appeared, the developer already knew it was coming and showed up to evaluate it, not dismiss it. The product is not just fix generation. The review is half the product.

For cherry-picking specific fixes, we initially explored building a selection UI inside the GitHub PR comment. GitHub's markdown sanitization made that impossible to do safely, and the timing between user selection and agent trigger created . So cherry-picking lives in SonarQube Cloud instead.

What looked like a constraint turned into a product argument. SQC already has filtering, bulk selection, and full issue detail, so developers who want granular control go there naturally. We are not asking them to learn a new tool; we are meeting them where the depth already exists. GitHub is the fast path. SQC is full control. Full control is our product.

Impact

1,000 hrs of technical debt cleared by one customer in 45 days

1% → 65% fix PR merge rate after switching to intentional activation

AI Tech Award: Best Innovation in AI for DevOps, 2026

One customer cleared over 1,000 hours of accumulated technical debt in 45 days. Merge rates on agent-generated fix PRs went from 1% to 65% after switching to intentional activation. The backlog problem that had felt structurally unsolvable became measurably solvable. The product was featured in Fast Company and won the 2026 AI Tech Award for Best Innovation in AI for DevOps.

Research sources

Sonar State of Code: Developer Survey Report (full report, PDF)

In the media

2026 AI Tech Award: Best Innovation in AI for DevOps

Fast Company: AI Code You Can Trust

Join the SonarQube Remediation Agent Beta