The State of Coding Agent Models: August 2025
by Dakota Kim
In the past year, coding agent models have changed a great deal, and they are becoming a powerful part of how software gets built. If you work at a company that writes or maintains code, you may soon be working with one of these agents yourself.
This post will walk you through three ideas:
How agentic tooling shapes the way you interact with these models.
How to think about the model that serves as the “coding brain.”
How to look at benchmarks and leaderboards so you can make a fair comparison.
1. Agentic Tooling as the Interface
A year ago, most people talked to a coding model in a single turn. You asked for code, it gave you code, you copied and pasted it, and that was the end of the exchange.
Today, more and more tools use what is called an agentic approach. This means the model can take several steps in a row and execute actions in service of a goal, just like you would if you were solving a problem.
In these tools, the agent can:
Plan the work by breaking a big task into smaller ones.
Write code and run it to see if it works or not.
Fix mistakes it finds along the way, optionally looping on errors.
Connect to tools you already use, such as version control, your editor, or your testing setup, often via MCP (the Model Context Protocol).
When this works well, the activation energy to ship drops, and you get to spend more of your time on the parts of the work that need your judgement and creativity.
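To make this loop concrete, here is a minimal sketch in Python. It is not any particular product's implementation: call_model, apply_patch, and the pytest invocation are placeholder assumptions standing in for whatever your agent framework actually provides.

```python
# Minimal sketch of an agentic coding loop: plan, write, run, fix.
# call_model and apply_patch are hypothetical helpers; the test command
# is an assumption you would swap for your own project's test runner.
import subprocess

MAX_ATTEMPTS = 5

def call_model(prompt: str) -> str:
    """Placeholder: send the prompt to whichever 'coding brain' you chose."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Placeholder: write the model's proposed change into the working tree."""
    raise NotImplementedError

def run_tests() -> subprocess.CompletedProcess:
    """Run the project's test suite and capture the output."""
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

def solve(task: str) -> bool:
    # Plan: break the big task into smaller steps.
    plan = call_model(f"Break this task into small steps:\n{task}")
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        # Write: propose a change, informed by any earlier test output.
        patch = call_model(
            f"Task: {task}\nPlan: {plan}\nTest output so far: {feedback}\nPropose a patch."
        )
        apply_patch(patch)
        # Run: check whether the change actually works.
        result = run_tests()
        if result.returncode == 0:
            return True
        # Fix: loop on the error, feeding the failure back to the model.
        feedback = result.stdout + result.stderr
    return False
```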
2. Choosing the “Coding Brain”
Every coding agent has a model inside it that makes decisions and writes code. This is what I am calling the “brain.”
Broadly, there are two kinds of brains to choose from:
Open-source models. Examples include the recently launched gpt-oss variants and fine-tuned versions of Llama 4. These are built in public and can be changed to suit your needs. They work well for teams who want control and the option to make their own improvements.
Proprietary models. These are offered by companies like Google, OpenAI, and Anthropic. They often work very well without much setup. They can handle large amounts of code and a variety of tasks. They are simple to start using, and many of these offerings allow a fair degree of fine-tuning and evaluation even through the user interface.
The right choice depends on the kind of work you and your team do, whether you want to own the hardware and the MLOps, and how much control you want over the model itself.
As of August 2025, these models sit consistently at or near the top of the latest benchmarks and leaderboards:
Proprietary
GPT-5
Claude Opus 4
Gemini 2.5 Pro (Code)
Open-Source
gpt-oss-120b / gpt-oss-20b
Qwen3-Coder
Llama 4
Kimi K2 Instruct
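One practical consequence of this split: many vendors and open-source serving stacks (vLLM and Ollama, for example) expose an OpenAI-compatible chat API, so swapping brains can be as small as changing a model name and a base URL. The sketch below assumes such an endpoint; the URLs and model identifiers are illustrative, not a recommended setup.

```python
# Sketch: the same client code can drive a proprietary or an open-source brain
# when both sit behind an OpenAI-compatible endpoint. The base URL and model
# names are placeholders; substitute whatever your vendor or local server exposes.
from openai import OpenAI

# Hosted, proprietary brain.
hosted = OpenAI(api_key="YOUR_API_KEY")

# Self-hosted open-source brain, e.g. behind a local vLLM or Ollama server.
local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

def ask(client: OpenAI, model: str, question: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Same prompt, two different brains.
print(ask(hosted, "gpt-5", "Write a Python function that reverses a string."))
print(ask(local, "gpt-oss-20b", "Write a Python function that reverses a string."))
```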
3. Benchmarks, Leaderboards, and Choosing Fairly
It can be hard to know which model will work best for you. One place to start is by looking at benchmarks and leaderboards.
Benchmarks are tests where the model is given a set of problems to solve. These problems might be things like fixing a bug, adding a feature, or writing a function from scratch.
Some of the most useful benchmarks today are SWE-bench, LiveBench, and the leaderboards from Aider and LM Arena.
Different benchmarks use different rules. They may use different data, different measures of success, and different amounts of help from humans.
Because of this, it is best not to rely on only one.
New models ship at a regular cadence, so it is worth revisiting these benchmarks and leaderboards (along with any recent studies from trusted sources) periodically for updated information.
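If you do pull numbers from several leaderboards, one simple way to combine them is to rank the candidates on each benchmark and average the ranks, rather than averaging raw scores that live on different scales. The sketch below uses made-up placeholder names and numbers purely to show the mechanics.

```python
# Sketch: compare models across several leaderboards by average rank instead of
# averaging raw scores, since each benchmark uses its own scale and rules.
# All model names and numbers here are placeholders, not real results.
scores = {
    "benchmark_1": {"model_a": 0.62, "model_b": 0.58, "model_c": 0.49},
    "benchmark_2": {"model_a": 71.0, "model_b": 74.5, "model_c": 69.0},
    "benchmark_3": {"model_a": 0.80, "model_b": 0.77, "model_c": 0.81},
}

def average_rank(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    ranks: dict[str, list[int]] = {}
    for benchmark in scores.values():
        # Rank 1 = best score on this benchmark (higher is better here).
        ordered = sorted(benchmark, key=benchmark.get, reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(position)
    return {model: sum(r) / len(r) for model, r in ranks.items()}

for model, rank in sorted(average_rank(scores).items(), key=lambda kv: kv[1]):
    print(f"{model}: average rank {rank:.2f}")
```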
4. Where the Model Runs
Another choice is where the model runs.
If you use a vendor’s cloud service, you can get started quickly and do not have to manage the hardware yourself.
If you run a model on your own systems, you have more control, and you may save money over time. This works well with open-source models, but you will need the skills and equipment to support them.
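A rough way to sanity-check the "save money over time" claim is a break-even calculation: how many months of API spend would cover the hardware plus its running costs? Every figure in the sketch below is a placeholder assumption to be replaced with your own quotes and usage.

```python
# Sketch: back-of-the-envelope break-even between a vendor API and self-hosting.
# All figures are placeholder assumptions; plug in your own numbers.
monthly_api_cost = 4_000.0      # assumed current spend on a hosted API, per month
hardware_cost = 60_000.0        # assumed one-time cost of GPUs/servers
monthly_hosting_cost = 1_500.0  # assumed power, colocation, and maintenance per month

monthly_savings = monthly_api_cost - monthly_hosting_cost
if monthly_savings <= 0:
    print("Self-hosting never breaks even under these assumptions.")
else:
    breakeven_months = hardware_cost / monthly_savings
    print(f"Self-hosting pays for itself after roughly {breakeven_months:.1f} months.")
```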
5. Conclusion
Coding agent models have moved from the edges of software development into the center of it. If you take the time to learn how they work, read benchmarks carefully, digest the model cards, and choose the right brain, you can make an informed decision that helps your team work smarter.
Check more than one source, try the models that stand out, and see how they fit into your way of working!