

2026-04-03
by Uri Walevski
Agents are genuinely amazing. I build them for a living and I'm still regularly surprised by what they can pull off. The ability to give a system a goal and a set of tools, then have it figure out the path on its own, is a real shift in how we build software.
But as agents get more popular, I'm starting to see them show up in places where they're not the right fit. For example, using an agent to spin up a full test environment for every feature branch. That's a CI pipeline. The steps are identical every time, no judgment involved. An agent would work, but it'd be slower, more expensive, and less reliable than a script that does the same thing.
This isn't a criticism. It's genuinely hard to know where the line is. So I wanted to share some patterns and anti-patterns I've been noticing, mostly from building and watching others build.
I think about three questions before deciding.
Is the procedure known? If you can write down the exact steps before the task starts, a script is probably the better tool. Agents shine when they need to figure out the steps as they go, reacting to what they find and adapting. Tasks like deploying a service, syncing a database, or converting file formats have known procedures. Writing them as code makes them faster, cheaper, and deterministic.
How many items? Agents are great for a single complex case. One bug to investigate, one document to analyze in depth. When you have 10,000 items to process, the overhead per item adds up fast. Each item gets its own LLM call, its own reasoning, its own latency. A script handles the batch in seconds.
Are the items independent? If item 47 has nothing to do with item 46, processing them in the same agent context can actually hurt. Details from one item leak into reasoning about another. The agent conflates two customers, mixes up two error messages. Independent items are better processed independently, which is trivial with code and awkward with agents.
When all three point toward an agent (the procedure is unknown, the cases are few, and the items are interrelated), that's the sweet spot. When any of them points away, code, with maybe an LLM call in the pipeline, is probably the simpler path.
An LLM call takes seconds. A script takes milliseconds. If your task is "run these 5 SQL queries and format the result," an agent will think about it, call a tool, wait, think again, format the output. A function does it in one shot.
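To make that concrete, here's the "run these queries and format the result" task as a plain function. This is a minimal sketch: the schema, the query names, and the queries themselves are made up for illustration, and I'm using an in-memory SQLite database so it runs standalone.

```python
import sqlite3

# Hypothetical report queries; swap in your own.
QUERIES = {
    "total_orders": "SELECT COUNT(*) FROM orders",
    "total_revenue": "SELECT SUM(amount) FROM orders",
}

def run_report(conn: sqlite3.Connection) -> str:
    # Run each query and format the single-value result as "name: value".
    lines = []
    for name, sql in QUERIES.items():
        (value,) = conn.execute(sql).fetchone()
        lines.append(f"{name}: {value}")
    return "\n".join(lines)

# Demo with an in-memory database standing in for your real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (15.5,)])
print(run_report(conn))
```

No reasoning step anywhere: the queries are fixed, the formatting is fixed, and the whole thing finishes in milliseconds.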
It's worth being honest about this. You're paying per token for the agent to reason about something that might not need reasoning. For a user waiting on the other end, the difference between 200ms and 8 seconds matters a lot.
A script either works or it doesn't. When it fails, you get a stack trace. You fix it, it works again.
An agent might do the right thing 95% of the time and something subtly wrong the other 5%. It might format the output differently today than yesterday. It might skip a step it usually doesn't skip. These failures are hard to detect because the agent doesn't crash, it just quietly does something unexpected.
For tasks where the path is known and correctness matters, this is worth thinking about carefully.
Spinning up test environments. Docker Compose and a CI trigger. The steps are identical every time. An agent "figuring out" how to provision this on each run means you get slightly different environments each time, which kind of defeats the purpose.
Processing a batch of invoices. Each invoice is independent, same validation rules, same destination, same format. This is a map operation over a list. A script handles 10,000 invoices in seconds.
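The map-operation shape looks something like this. The field names and the validation rule here are hypothetical stand-ins; the point is the structure, not the specifics.

```python
def validate(invoice: dict) -> bool:
    # Placeholder rule: a real validator would check your actual schema.
    return invoice.get("amount", 0) > 0 and "customer" in invoice

def process_batch(invoices: list[dict]) -> tuple[list[dict], list[dict]]:
    # Same rules applied independently to every item: a map over a list.
    valid = [inv for inv in invoices if validate(inv)]
    rejected = [inv for inv in invoices if not validate(inv)]
    return valid, rejected

valid, rejected = process_batch([
    {"customer": "acme", "amount": 120.0},
    {"amount": -5.0},  # fails validation
])
```

Because each invoice is independent, there's no shared context to manage and nothing for one item to leak into another.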
Syncing data between systems. Pull from Salesforce, transform, push to your warehouse. The schema is known, the mapping is defined. This is an ETL job.
Sending scheduled reports. Pull this week's metrics, format as a table, post to Slack. A cron job with a template does this deterministically, every time. An agent might decide to "improve" the format or summarize differently than last week.
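A sketch of that deterministic report, assuming made-up metric names and leaving out the actual Slack call (which would be one HTTP request at the end):

```python
TEMPLATE = "Weekly metrics\n{rows}"

def format_report(metrics: dict[str, int]) -> str:
    # Sorted keys so the same metrics always produce the same output.
    rows = "\n".join(f"{name}: {value}" for name, value in sorted(metrics.items()))
    return TEMPLATE.format(rows=rows)

report = format_report({"signups": 42, "churned": 3})
```

Run it next week with next week's numbers and the format is identical, which is exactly what you want from a report nobody should have to double-check.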
This is where a lot of the confusion comes from. People use "agent" and "LLM pipeline" interchangeably, but they're pretty different things.
An LLM in a pipeline is a function. Text goes in, text comes out. Classify this email, extract a product name, summarize a document. The LLM does one bounded task, the pipeline moves on. No autonomy, no tool calling, no multi-step reasoning.
An agent is a loop. It receives input, decides what to do, takes an action, observes the result, decides the next action. It has tools, memory, and the ability to take multiple steps toward a goal that wasn't fully specified in advance. The agent chooses what to do. That's what makes it powerful and also what makes it unpredictable.
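That loop, reduced to a skeleton: here `decide` stands in for the LLM choosing the next action, replaced by a toy rule so the sketch runs without a model, and the tool set is a single fake search.

```python
def run_agent(goal: str, tools: dict, decide, max_steps: int = 10):
    history = []
    for _ in range(max_steps):
        action, args = decide(goal, history)    # the LLM picks the next step
        if action == "finish":
            return args                         # the agent decides it's done
        result = tools[action](*args)           # take the action
        history.append((action, args, result))  # observe, then loop
    raise RuntimeError("step budget exhausted")

def toy_decide(goal, history):
    # Stand-in for the model: search once, then finish with what was found.
    if not history:
        return "search", (goal,)
    return "finish", history[-1][2]

tools = {"search": lambda q: f"results for {q}"}
answer = run_agent("pricing page", tools, toy_decide)
```

Everything that makes agents powerful and unpredictable lives in `decide`: the control flow isn't written down anywhere, it emerges from the model's choices at each step.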
Many tasks that people build agents for are actually LLM pipeline tasks. Classifying 10,000 support tickets doesn't need an agent. A prompt, a for loop, and an API call will do. The LLM classifies each ticket, your code routes it. Adding an agent layer doesn't improve the classification, it just adds complexity.
The test is simple. If the LLM's output goes directly into the next step with no decision-making about what the next step should be, you have a pipeline with an LLM in it. That's often the right architecture.
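Here's what that pipeline shape looks like for the ticket example. The `classify` function is a stub standing in for a single LLM API call, and the routing table is hypothetical; the thing to notice is that plain code, not the model, decides what happens next.

```python
ROUTES = {"billing": "finance-queue", "bug": "eng-queue", "other": "support-queue"}

def classify(ticket: str) -> str:
    # In a real pipeline this is one LLM call with a classification prompt.
    # Stubbed with a keyword rule here so the example runs offline.
    if "invoice" in ticket:
        return "billing"
    if "crash" in ticket:
        return "bug"
    return "other"

def route_tickets(tickets: list[str]) -> dict[str, str]:
    # A prompt, a for loop, and an API call. No agent layer.
    return {t: ROUTES[classify(t)] for t in tickets}

routed = route_tickets(["my invoice is wrong", "app crash on login"])
```

The LLM's output feeds directly into a lookup table. There's no decision about what the next step should be, so there's no loop to run.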
The pattern that makes agents valuable is dynamic composition of known tools. The agent has access to a set of capabilities (search the web, read a file, call an API, write code, run a test) but the order depends on what it finds along the way.
A researcher agent searches for a topic, reads the results, decides the information is insufficient, reformulates the query, finds a better source, synthesizes. The tools are known. The sequence isn't.
A coding agent reads a bug report, looks at the relevant code, forms a hypothesis, writes a fix, runs tests, sees a failure, revises. Each step is well-defined. But which step comes next depends on the result of the previous one.
That's the defining characteristic: the continuation changes based on intermediate results. You can't write a fixed pipeline because you don't know at step 3 what step 4 will be.
Creative work fits this naturally. Writing, design, strategy. These are iterative processes where each draft informs the next revision.
Agents are also great when the process involves a human. A workflow where someone needs to review or approve something is awkward to encode as a script. An agent can do the prep work, present options, wait for input, and continue based on what the human decided.
The best architecture is usually a mix. The agent handles the parts that need judgment. Deterministic steps stay as regular code.
Your coding agent investigates a bug, reasons about the fix, writes the code. But the CI pipeline that runs tests? That's infrastructure. The agent triggers it, reads the results, but doesn't reinvent it on every run.
Your sales agent qualifies leads and writes personalized follow-ups. But the CRM update that logs the interaction? That's an API call. The agent decides what to log, the code handles how.
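The what/how split can be sketched like this. `crm_update` and its fields are invented for illustration; a real version would validate against your CRM's schema and make the actual API call.

```python
def crm_update(lead_id: str, note: str) -> dict:
    # The deterministic "how": validation, shaping, and (in a real
    # version) the API call live in ordinary code.
    if not note.strip():
        raise ValueError("empty note")
    return {"lead_id": lead_id, "note": note[:500], "source": "agent"}

# The agent's only job is the "what": deciding the content of the note.
record = crm_update("lead-123", "Qualified: budget confirmed, wants demo.")
```

The agent never touches the API shape, and the code never has to exercise judgment. Each side fails in the way it's supposed to: the function raises, the agent reconsiders.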
Agents are for thinking. Code is for doing. Keep them in their lanes and you get the best of both.