AI Agents for Code Refactoring: Ambitious Architects Who Need Your Wisdom by Ed Lyons
Image by Ed Lyons via Midjourney
AI coding tools are revolutionizing software development, speeding up execution and improving the use of established patterns. Employing agents to refactor new and legacy code is incredibly powerful, but there are important considerations, risks, and opportunities that are not present in human refactoring efforts.
I learned this by doing two weeks of refactoring experiments on a real-world codebase with Claude Code.
I came away with the following conclusions, which I will review in detail.
There is a difference between an agent talking about refactoring, and doing it
It is better to work on the refactoring plan instead of patching the refactored code
Use more costly agents for planning, and lesser ones for implementation
Beware of confirmation bias about your refactoring ideas
Know the tangible benefits you seek to achieve
Take advantage of temporary and partial refactoring opportunities
Think of potential team impact and merging problems
Keep architectural guidelines in your context window
Take time to review refactored code as a learning experience
The difference between agent talk and agent refactoring
When you ask the agent for refactoring ideas, it sounds like a software architecture expert, even though it is widely believed these tools are not good at architecture. My own agent coding has led me to the same conclusion. So how can this be?
I have learned that an agent describing a refactoring opportunity is drawing on the many architecture documents the LLM absorbed in training. Yet the code changes for those recommendations seem to be done by a lesser architect.
In one experiment, I had two different Anthropic models (Sonnet 4 and the superior Opus 4) perform a significant refactoring task. For the same instructions, Opus put out a much smarter and more detailed set of messages about what it was doing than Sonnet did. I expected it would create better code. Yet the coding changes from each were identical.
This mismatch between agent talking and agent coding is crucial to recognize. Since you cannot trust what the agent promises, you have to force it to be far more specific about how it will do the work.
Create a refactoring plan
Sometimes, an agent refactoring works great the first time. Other times, it needs repairs. And occasionally it fails completely, just as a human refactoring can: the code becomes a mess, and there are undesirable side effects.
One way to dramatically improve the results is to avoid doing the refactoring right away. Instead, ask your agent for a step-by-step plan that you can review. Claude Code in particular is very good at creating plans. Then, if you like it, ask it to execute that plan.
Creating a plan has more than one benefit. You can see the changes that will be made, notice an undesirable side effect before it happens, and adjust the plan. Also, while looking at plans for major changes, you may see small changes worth doing on their own. I sometimes saw this, and would ask the agent to do just those changes first. This reduced the amount of code being changed in the larger refactoring.
Like any large-scale changes with agents, having it make git commits for each step helps you keep control, and lets you go back in time to take a different path. You should also make it stop and wait after each step so you can look things over, instead of giving it permission to do everything on its own.
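No special command is needed for this workflow; a plain request along these lines is enough (the task and wording here are invented, not a Claude Code feature):

```
Propose a step-by-step plan to extract the billing logic into its own
module, but do not change any code yet. Once I approve the plan, execute
it one step at a time, make a git commit after each step, and stop and
wait for my review before starting the next step.
```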
Overall, I have found that asking for plans significantly improves refactoring results.
Model strength and implementation
I wondered how important it was to ask for a detailed plan. So, in two different branches, I asked Sonnet 4 and the much more expensive Opus 4 to look at a large, complex refactoring idea, and asked each for its plan to do it. As expected, Opus had a much more detailed plan and found small things that needed to change that I might have missed. Sonnet was good, but missed a few things Opus did not. I had each one execute its own plan and checked the results. Opus was significantly better.
But what if each one tried to execute the other's plan?
I had Opus execute the lesser plan from Sonnet. It created better code than Sonnet had with that plan, adding a few details that were not in the plan, though the code was not quite as good as when Opus executed its own plan. Sonnet, however, took the superior Opus plan and executed it nearly identically to Opus; there was only one insignificant difference in the code.
So plans matter a lot!
And for those who are cost-sensitive - and agents are only getting more expensive as time goes by - I would recommend having a higher-end model come up with a detailed plan, and then having a less expensive one do the work. It felt just like any other kind of engineering or general contracting work: if you specify enough detail, you don't need senior people to get it done properly.
Beware of confirmation bias about your ideas
The advice that an agent can offer, and all of the console messages you get about progress in the plan, make it easy to believe that you are listening to a human architect. You need to remind yourself that an agent does not have opinions about your project, and has no architectural convictions.
One time, I followed an agent recommendation to normalize my data structure. It seemed like a good thing to do. When the agent was done, it printed out the benefits of normalization. I thought, "Great! I made the right decision."
Later in the day, I decided that it was not a good idea and had the agent change it back. Then the agent printed a message explaining the benefits of denormalization. What? It was then I realized that when you approve an architectural change, the agent tells you the benefits of that kind of change. It is not going to weigh tradeoffs unless you ask it to.
This leads to the risk of confirmation bias in your refactoring - the tendency, well known in psychology, to accept an idea you already agree with uncritically, when you would have been more skeptical of an idea you did not like.
So if your agent gives you a recommendation you like, think it over critically. And if you put in your own idea and the agent tells you what the benefits are, you should not assume it was the right choice based on that confirmation alone.
Know the tangible benefits you seek to achieve
As agent refactoring drastically lowers the effort of doing the work, it is tempting to do all kinds of things that were not realistic before. I have certainly avoided refactorings in my career because they would take too long or cause too much disruption. But with agents, the cost barrier is quite low, and a long list of refactoring ideas from the agent contains many temptations. Yet you must weigh the tangible benefits, not just potential compliance with standard practices.
For instance, one of the recommendations I got from Claude was to change the way my data store was accessed. The agent told me that a significant code change would reduce an O(n) search to O(1). That sounded good. I made the change. Yet I later undid it. It changed the code too much, and when I looked closely, there were very few items in the part of the data in question, so scanning all of them was an insignificant delay.
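To make that concrete, here is a minimal sketch of the kind of change the agent proposed; the names and data are invented, and my real data store was more involved:

```python
# Before: O(n) - every lookup scans the whole list.
def find_user(users: list[dict], user_id: str) -> dict | None:
    for user in users:
        if user["id"] == user_id:
            return user
    return None

# After: O(1) - build an index once, then look records up by key.
def build_index(users: list[dict]) -> dict[str, dict]:
    return {user["id"]: user for user in users}

users = [{"id": "u1", "name": "Ann"}, {"id": "u2", "name": "Bo"}]
users_by_id = build_index(users)
assert find_user(users, "u2") == users_by_id.get("u2")
```

With only a handful of records, the scan and the dict lookup are indistinguishable in practice, which is exactly why the change was not worth its disruption.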
It was one of a few times I did a refactoring and decided afterwards that my specific application did not get much benefit. I resolved that I had to know ahead of time what specific benefits I wanted before doing the work. For example, I added performance timings in order to judge whether a refactoring produced a significant speed benefit.
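The timing itself can be crude. A small harness along these lines (the names are invented; `find_user` and `users_by_id` come from the sketch above) was enough to tell me whether a change mattered:

```python
import time

def timed(label: str, fn, *args, repeats: int = 100_000) -> None:
    # Run fn repeatedly and report total wall-clock time.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    print(f"{label}: {time.perf_counter() - start:.4f}s for {repeats} calls")

# Compare before and after the refactoring, e.g.:
# timed("linear scan", find_user, users, "u2")
# timed("dict lookup", users_by_id.get, "u2")
```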
Take advantage of temporary and partial refactoring opportunities
Because agent refactoring makes changes far faster, it can be used for things that previously would not have been considered worthwhile. For example, you can do temporary refactorings and then undo them later.
As an example, I came across a situation where I was creating many instances of one large type of object, yet most instances had significant differences. My agent had written very efficient code that sliced up parts of the objects for customization.
Yet I was having trouble comprehending the state of each type of object, and while developing the system further, I found I was missing required properties. The agent was good at making changes in this area, but not perfect, so I still needed to understand that code.
So I decided to have my agent create a separate function for each object, duplicating the common code in each one. This added a lot of lines of code, but it helped me see what was missing and made customizations easier. I know that when I am done, I can just ask the agent to refactor back to the original, harder-to-read functions. I would not have done this kind of refactoring by hand.
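In rough Python terms, with invented objects standing in for mine, the temporary shape looked something like this:

```python
# Before: one dense builder that customizes a shared base per kind.
def make_widget(kind: str) -> dict:
    widget = {"kind": kind, "color": "gray", "handlers": []}
    if kind == "chart":
        widget["handlers"] = ["on_zoom"]
    elif kind == "table":
        widget["color"] = "white"
    return widget

# After (temporary): one explicit function per object. The common fields
# are deliberately duplicated so every required property is visible.
def make_chart() -> dict:
    return {"kind": "chart", "color": "gray", "handlers": ["on_zoom"]}

def make_table() -> dict:
    return {"kind": "table", "color": "white", "handlers": []}
```

The duplication is the point: each function shows the complete state of its object, and once the system settles down, the agent can fold them back into the shared builder.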
Low-cost refactoring can now be used in legacy code situations where, previously, no one wanted to invest in improving old code. But with an agent, tactical refactoring of older code is a viable option in pursuit of larger architectural change. As someone who has improved old applications by hand that were meant to be phased out, I welcome this new ability.
You can also consider partial refactorings. When you ask the agent for detailed plans for different high-level ideas, you can look across them and spot steps that are valuable on their own. In my experiments, there were a few times when I implemented pieces from different plans that had strong independent benefits, even when the larger refactoring plans had uncertain value.
Think of potential team impact and merging problems
Just like human refactorings, you have to consider what impact your changes will have on the project, and what other team members will think. But never before has so much change been possible between standup meetings! I once spent a few hours implementing a lot of bold refactoring ideas, then realized the resulting pull request would be so massive and disruptive that it would be very hard to review. It would certainly have made it too difficult for anyone else to merge their work, no matter how valuable the agent made the changes sound.
For large potential changes, it is best to come up with refactoring plans and share them with the team and discuss their value. Then you can plan how they would get merged.
Keep architectural guidelines in your context window
Coding agent experts take time to put instructions and guidelines into places where the tool can see them with every prompt. (In Claude Code, this is the CLAUDE.md file.) This matters for refactoring, too. If your architecture is set up in non-standard ways for good reasons, explain why in your context window and constrain the agent's choices. You can also ask it to run certain tools and tests as part of its analysis. Having the team share a context window file was already a good idea, and it can also prevent someone's unwise refactoring idea from becoming a pull request.
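As an illustration, the refactoring-related portion of such a file might read something like this; the specifics here are invented:

```markdown
## Architecture notes for refactoring
- All data access goes through `store/repository.js`. We deliberately
  avoid an ORM here for bulk-load performance; do not propose adding one.
- Run the test suite and the linter after every refactoring step.
- Do not refactor anything under `legacy/` without asking first.
```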
Take time to review refactored code as a learning experience
On a large refactoring affecting many files, I do not merely spot-check and run tests. I make time - sometimes in the evenings after my children go to bed - for a focused session just looking at what the agent has done.
I find these focused reviews help me get a feel for how Claude refactors, and I believe they improve what I ask the agent for. This isn't a science, and I can't explain exactly what I have learned, but it has improved my state of mind while using agents and reviewing their output. Sometimes I will refactor what the agent did a little, making changes here and there. But most of all, it is an excuse to take a closer look at a lot of code I did not write myself.
We need to get a feel for these agents beyond just learning how to use them, in the same way that artisans get a feel for their tools. Refactoring with agents is not the same as what we have done for a very long time. We need to train our minds to use this new power wisely, and in the direction of excellent software. If you try it, you probably won't get it right every time. I certainly didn't. But you can refactor your techniques later.