Agentic Coding with Claude: The Bad Parts, Part 1 – Lack of Team Buy-In

Hi, I’m Brian. You may remember me from such posts as Using Claude Code to Automatically Fix Security Issues Discovered by Snyk, and Teaching Claude Code About Test Coverage. I probably sound pretty bullish about the future of AI-augmented coding.

And boy howdy, am I ever. My team maintains a legacy product, which means often our daily tasks are not fun, or creative. AI tools are helping me automate away the drudgery so we can focus on the interesting, creative stuff again.

But this post isn’t about that. It’s the start of a discussion about the parts of using AI for software development that influencers seem not to want to talk about.

This is the first in a series of posts about problems I’ve encountered while using AI tools for software development, and strategies we’ve used to mitigate the issues.

The Problem: Lack of Team Buy-In

If your team has a profound disagreement about if/how to use AI, things could get messy.

And if you put 2 or more developers in a room, there’s a good chance they don’t agree on this topic. Not all developers feel the same way about using AI in software development. You’ve got a spectrum of opinions, some of which I’m about to poke fun at a bit.

  • The “If AI Were Jonestown, I’d Be Dead and Diabetic” Types. These folks drank the Kool-Aid and asked for seconds. Bro, AI is changing everything bro. Bro in 6 months software developers won’t have jobs, bro. Bro the CEO just has to describe what he wants to Replit and hope it doesn’t delete the prod database, bro. Also why didn’t you buy my NFT’s two years ago, bro? BRO???!
  • The Pragmatists. We love software development. We like having jobs in software development. We like tools that make our jobs and our work better. Maybe we should learn to use this thing but figure out when it is, and isn’t, the right tool for the job.
  • The AI Amish. AI is the devil! AI will destroy the environment, tank the economy for all but the 1% of the 1%, atrophy the skills of human developers, erode privacy, and bring shame on you, your family, and your cow! But also it’s completely useless and produces awful slop.

Let’s face it. When you start using AI tools, the code you submit has a scent of AI on it. That doesn’t mean it’s bad code. It means that folks familiar with your codebase sense something slightly off about it. They know it isn’t the artisanal, hand-crafted slop you usually submit to PR. It’s AI slop!

If someone isn’t aligned on if, when, or how to use these tools, they may look for every reason to criticize and block your code.

Possible Solutions

I live professionally in the quality consulting space and they often say “there are no people problems, only process problems.”

<rant> (But that’s definitely a little bullshit, isn’t it? There are totally people problems. I mean every company has a department specifically for people problems, at least for now. So don’t tell me there’s no such thing as people problems when it’s clearly a marketable skill to be able to ruin someone professionally without shifting your facial expression.) </rant>

Sometimes we can solve the alignment problem by building consensus and turning skeptics into cheerleaders by showing the way, and through crucial conversations. Sometimes we need to use the power of policy to draw some boundaries around what proper use of the tool is, and what level of disagreement will, or won’t, be tolerated. Sometimes we solve people problems by choosing to work with different people.

I can’t say I’ve solved this completely, but here are some strategies I’ve found somewhat effective.

Demonstrate Value

Obviously we’re going to start with Step 1 in the D.E.N.N.I.S. System: demonstrate value. Use generative AI to go faster while being better. After all: if you can’t do this, then what’s the point? Most of all, don’t submit slop for review just to demonstrate velocity. You’re just fueling their narrative.

(And yes, I’m guilty as charged on not taking my own advice a few times.)

Be Clear About PR Feedback Expectations, Be Brutal in Enforcement

If your team does peer review (PR), have a PR Policy that defines what appropriate and inappropriate feedback and change requests look like. When folks abuse the policy to block AI generated code they don’t personally like, fix the policy. When they block AI code they don’t personally like in defiance of the policy, rub it in their face.

And be honest when AI code passes your policy and is still slop. Improve your prompts to improve your code. And improve the policy where it falls flat.

Build the Team You Want

If AI is that important to adopt, then build the team you need to make adoption successful. Don’t suffer folks who will throw sand in the gears of the effort. Brutal, but honest.

Using Claude Code to Automatically Fix Security Issues Discovered by Snyk

My team has been working on a security remediation project for the last 8 months. At the beginning of the project we asked ourselves the question: could generative AI help solve this problem faster? And six months ago, the answer was probably not. Our experimentation with AI at that point suggested that none of the tools at the time could do a great job at working on the languages we use, in the shape much of our legacy code is in.

Today, that’s changed completely. In particular, I find Claude Code incredibly adept at working with legacy code, especially when I don’t ask too much of it at one time.

I’ve been successfully pointing it at a file and asking it to remediate security issues such as SQL injection and command injection. I’ve written custom slash commands to make the process consistent and repeatable. But it’s still a manual process.
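For context, a Claude Code custom slash command is just a markdown file under .claude/commands/ whose contents become the prompt, with $ARGUMENTS replaced by whatever follows the command. A simplified, illustrative sketch (not my actual command, and a hypothetical file name) might look like:

```markdown
<!-- .claude/commands/fix-injection.md (hypothetical) -->
Review the file $ARGUMENTS for SQL injection and command injection
vulnerabilities. Fix only the affected lines; do not refactor the
surrounding code. After each fix, run the project’s linters and tests
and resolve any failures before reporting back.
```

Invoked as, e.g., /fix-injection path/to/file, it keeps each remediation prompt identical from run to run.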

Today’s challenge: scale it up by letting Claude Code talk to Snyk, our secure code auditing tool, and let it spin up asynchronous subtasks which fix specific issues, and submit pull requests.

Talking to Snyk

I am having a heck of a time getting Claude Code to talk to Snyk in spite of the fact that:

  • Snyk has a good CLI
  • Snyk has preview MCP support

My issues seem to stem from the fact that we have an organizational account, not an enterprise account, and API (and therefore, MCP) access is limited unless we want our bill to explode. After working with Snyk for 8 months, this is my biggest gripe about it: their pricing structure and feature set for small teams… kinda sucks.

The Workaround: Generate Results Offline

Fortunately our account type and bank account balance don’t need to be a roadblock. I worked around the limitation by generating the results offline rather than having Claude get them via API in real-time. (This also happens to speed up future steps and reduce the number of prompts we need to get data iteratively in future steps.)

snyk code test --json-file-output=snyk-output.json

Now we have a JSON file that lists all the issues detected by Snyk.
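The raw output is verbose, so it helps to know what’s inside before handing it to Claude. Here’s a minimal sketch of pulling the essentials out of the file, assuming the SARIF-style layout that snyk code test emits (the exact field names may vary by CLI version, so treat this as illustrative):

```python
import json

def load_issues(path):
    """Extract rule/severity/file/line records from Snyk Code JSON output."""
    with open(path) as f:
        report = json.load(f)
    issues = []
    # snyk code test writes SARIF: findings live under runs[].results[]
    for run in report.get("runs", []):
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            issues.append({
                "rule": result.get("ruleId"),
                "severity": result.get("level"),
                "file": loc["artifactLocation"]["uri"],
                "line": loc["region"]["startLine"],
            })
    return issues
```

Something like load_issues("snyk-output.json") then gives you a flat list of findings you can count, filter, or feed into the next step.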

Getting Claude Code to Read the File and Remediate Issues

What I want Claude Code to do is, well… exactly what I’d do were I to do the work myself. Most of our security remediation work is done with a scalpel, not a chainsaw. We fix the specific lines of code that have specific issues, and don’t make big changes unless they’re required. The process looks like this:

  • Pick the next vulnerability from the file.
  • Create a branch.
  • Fix the vulnerability.
  • Run all quality checks against the change (linters, tests).
  • Fix any quality check issues.
  • Stage, commit, push.
  • Submit a PR.
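The git/GitHub plumbing in that loop is mechanical, which is exactly why it delegates well. As a dry-run sketch, assuming the GitHub CLI (gh) and a hypothetical make lint test quality gate (in practice Claude Code runs these commands itself):

```python
def plan_remediation(issue_id: str, summary: str) -> list[list[str]]:
    """Return the sequence of shell commands for remediating one issue.

    Commands are returned as argv lists so a caller can run them with
    subprocess.run, or just inspect them as a dry run. The branch naming
    scheme and make targets are illustrative, not from the actual setup.
    """
    branch = f"fix/{issue_id}"
    return [
        ["git", "checkout", "-b", branch],       # create a branch
        # (the actual code fix happens here)
        ["make", "lint", "test"],                # run all quality checks
        ["git", "add", "-A"],                    # stage
        ["git", "commit", "-m", f"Fix {issue_id}: {summary}"],
        ["git", "push", "-u", "origin", branch], # push
        ["gh", "pr", "create", "--fill"],        # submit a PR
    ]
```

Laying it out this way also makes the prompt easier to write: each step maps to one instruction.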

My prompt pretty much says that:

The file snyk-output.json is the JSON output of running "snyk code test," so it describes security issues Snyk found in our codebase. Go learn about this file format so you understand what the issues are. Then, for each issue start a sub task which creates a new branch, fixes the issue, runs all quality and testing commands available and, once they all pass, stages, commits, pushes the work, and then opens a PR which explains what was fixed and how to test. Stop at ten remediated vulnerabilities.

In other words: do what I would do, and stop at ten iterations. I built the limit into the prompt so that if something goes wrong, or the automation can be improved, I have time to iterate before we go nuts with it.

The Results

Claude did a great job. I had to approve changes frequently, but I got the PRs I expected, in more or less the shape that I want. My next steps will be:

  • Refine the prompt to make Claude include the Snyk issue link in the PR.
  • Refine the prompt to use Jira’s CLI or MCP to create a matching issue.
  • Refine the prompt to make Claude include the Jira issue ID in the PR.
  • Refine the prompt to include a test plan in the PR description and Jira issue.

Once these things are complete, and I test a few more PRs, I’ll run with --dangerously-skip-permissions and go full auto mode.

What I Got Out of Agile + DevOps East 2023

I’ve never gotten the chance to attend a tech conference in person. Maybe that changes next year. This year, I got the chance to virtually attend Agile + DevOps East.

What I wanted out of the conference isn’t exactly what I got. As a baby engineering manager and aspiring agilist I hoped to gain a better understanding of agile software development and DevOps, and take some lessons home to implement. What I got was a little confidence that I am already on the right path, and a whole lot of prognostication about AI.

But it was useful.

I’ll start with key take-aways that I pulled from the conference, and then move into a summary of each of the talks I attended.

My Key Take-Aways from Agile + DevOps East 2023

At a high level, this is what I took away from Agile + DevOps East 2023:

  • The software industry is losing its religion when it comes to agile.
  • We still care about the foundations of agile: the things the manifesto sought to correct. But people are tired of the systems and the rigidity. Agile is in its reformation.
  • The industry has long since moved on to DevOps and continuous integration, which to me looks a lot like “agility in practice.”
  • Software teams should be self-contained: all the expertise required to deliver working software should be on the team.
  • The AI is coming! But coming to make us more efficient, not replace us.

AI-Powered Agile and DevOps

This talk by Tariq King used, of all things, the evolution of Super Mario Bros. to demonstrate how and why software development processes have evolved to improve quality, efficiency, and flexibility as the industry matured. Though sometimes the metaphor was a little stretched, it made the point and definitely spoke to my inner 80’s kid who’s still playing Mario games with his kids!

We’re currently in the Agile + DevOps era. While we don’t know exactly what shape the next era will take, we can be pretty certain AI is going to shape what’s next.

But current AI is tuned to make us efficient, not accurate. We need to use the tools ethically and intelligently to shape positive outcomes.

Key Take-Aways

  • Software models change. We’re at the Agile + DevOps age right now. AI will influence what is next.
  • AI will speed up all the processes in our development lifecycle including product management, requirements analysis, development, testing, and release.
  • But productivity cannot be measured in quantity alone. Productivity combines quantity, efficiency, and quality.
  • We need trustworthy AI tools. Otherwise, we risk using AI to deliver garbage faster.
  • To deliver quality, AI needs to be testable, controllable, observable, and explainable. These are attributes the current iteration of AI lacks.
  • Don’t build on assumptions. Form a hypothesis, test, and verify.

A Minor Point of Disagreement

One of the slides in Tariq’s presentation offered an example of how AI can help make development processes more efficient. A business analyst used AI to convert chat transcripts to user stories.

In my experience that example misunderstands where the valuable work is being done. The real work was facilitating a conversation and asking the right questions to introspect the problem. All the AI did was crunch words into a different format. Which is, in fairness, valuable. It eliminated grunt work. It did not eliminate or even speed up the real work of the analyst.

In my experience, AI has not been great at processing raw transcripts of conversations. Conversations have a pace and cadence that can be hard to parse. People speak in fragments that are clear in the moment but read as disjointed in a raw transcript. There is unspoken information and context in the negative space of the conversation that tools cannot capture.

How AI is Shaping High Performance DevOps Teams

Vitaly Gordon’s talk was mostly about measurement. He makes the point that engineering is often the least managed function in an organization (based on context, I think he really meant least measured). In DevOps, we should be measuring the health and productivity of our team and product (DORA metrics are one example).

In the future, AI can help us measure and improve these metrics.

Key Take-Aways

  • Engineering is often one of the least managed and measured functions of a business.
  • To reduce lead time we should reduce wait time. In other words, measure and identify blockers like slow PR approval, and figure out how to eliminate the blockers.
  • Use automated testing to reduce Change Failure Rate.
  • Use AI to generate more test coverage.

DevSecOps in a Bottle: The Care and Feeding of Pocket Pipelines

Jennifer Hwe’s talk focused on how her team improved security, maintainability, and delivery by bringing DevSecOps practices into an organization with a lot of complexity. Her team was charged with implementing DevSecOps, CI/CD, and containerization on a legacy product that required a focus on heightened security practices, and had to serve multiple teams that were previously working in silos, on separate networks with their own ops and security processes.

Key Take-Aways

  • Innovation was being held back by lack of DevSecOps automation. They couldn’t deliver new features quickly because manual processes held them back.
  • When you plan to implement DevOps or any Dev*Ops variant, you’ll likely cut across various parts of the organization with different cultures, and different opinions on how things should be done. Be prepared to identify and address both technical and cultural challenges.
  • Take a phased approach.
  • Change is slow. In an organization as large as Northrop Grumman, their transition was measured in years.

Lead Without Blame

This talk by Tricia Broderick felt like the philosophical sibling of Sarah Drasner’s Engineering Management for the Rest of Us. It’s all about the fact that organizations often hire up from the developer pool into management, but do not prepare former individual contributors for their new role. This talk felt like the missing manual.

Key Take-Aways

  • As a technical manager don’t write code because you’re good at it, or because it’s your happy place. That’s not your job anymore.
  • “Sitting together” doesn’t make you a team. That just makes you a group. Healthy collaboration makes you a team.
  • Individuals can win while whole teams and projects fail. That’s still a failure.
  • Transition yourself out of the “hub” of operations. You’re not that important. You’ll bottleneck productivity and team growth if you stay there too long.
  • Don’t focus so much on individual accountability.
  • Focus on building team members who are responsible, motivated learners.
  • Conflict good. Drama bad.
  • Further Reading: Lead Without Blame

The Potential of AI and Automated Testing, Conquer Test Script Challenges with AI

This talk by Jason Manning, Nyran Moodie, and Orane Findley was more of a high-level, open discussion about how AI has and will continue to change software testing. They discussed some of the pitfalls we need to be aware of as we build more reliance on AI to build tests and perform automated testing.

Key Take-Aways

  • AI can help you get data-driven metrics about your product (but didn’t really dive into “how”)
  • It may be possible for AI to scan web pages and generate tests for you (again, “how”)
  • We need to consider risks to privacy and security as we plug AI into our products, our tests, and our intellectual property
  • Consider how to use AI without sharing sensitive data or IP
  • At this point, a human needs to be involved in order to ensure the results of AI-driven processes are accurate and secure.

We Got Our Monolith to Move at Light Speed

This talk by Corry Stanley and Marianna Chasnik hit a bit close to home for me. It was all about how they moved a legacy monolith at Discover Financial from a “few releases a year” to a two-week release cycle. Sounds a lot like the journey I’ve been on. Discover succeeded by bringing Ops skills into the product team, using modern tools, infrastructure, and techniques to drive productivity, release faster, and reduce defects.

Key Take-Aways

  • The product team needs DevOps skills built-in
  • Train your whole team in DevOps
  • DORA metrics are lagging indicators of health
  • Treat Pre-production (staging and test environment) failures as production failures. Act accordingly, and act fast.
  • Avoid broken baselines. Use tools and processes like standardized branching models, automated deploys, automated quality tools, automated testing, and branch protection rules to shift quality and validation as early in the process as possible.

The Art of Getting Less to be Faster, Smoother, and Better – Embracing the Agile Principle of Simplicity

Robert Clawson’s talk was near and dear to me as the head of a project that suffers from a legacy of organic, unnecessary complexity. Robert advocated for achieving simplicity and productivity by maximizing the work not done.

Key Take-Aways

  • People and time are finite.
  • Our incentive structures rarely reward subtraction, even though subtraction can be an incredibly intellectual, creative, and valuable endeavor.
  • Features “not worked on” are valuable. It means you saved your resources, or chose to use them to do something with more value.
  • Sometimes removing something is the most valuable thing you can do. Clawson’s example was the K-brick, which optimized cost and materials without sacrificing structural integrity.
  • Look for opportunities for reuse. What do you have? How can you reuse or further capitalize on it without adding complexity?
  • Further reading: Subtract, The Untapped Science of Less

AI and the Future of Coding

Christopher Harrison from GitHub gave a refreshingly down-to-Earth talk about the current and future state of AI.

Generative AI is an enhancement to software development that can make us faster, but AI cannot write full applications, write perfect code, or replace developers.

Inexperienced developers risk shipping bad code by over-relying on AI and not understanding the results it generates.

Experienced developers driving AI can use it to work faster and reduce the pain and time spent on unpleasant tasks.

Key Take-Aways

  • Automated code review is coming. But don’t forget about other automated tools like GitHub Actions to automatically check security, code quality, etc.
  • AI can help with unpleasant tasks like writing unit tests.
  • AI can help with uncommon syntax, like figuring out regular expressions.
  • AI can help you rapid prototype and experiment.
  • But AI can’t help you write good code if you don’t already know how to write good code.