Stop Putting Agents in the CI

May 2026

CI is great for short, one-shot agent tasks but falls apart for stateful, multi-step workflows — runners vanish, retries destroy context, and cross-system coordination becomes duct tape. For real agentic work, use a cloud automation platform like Cursor that treats the agent as the orchestrator and handles the runtime for you.

Last week, I had a chat with an engineer that wanted help with an automation he had put in place for maintaining dependencies around some repo he owned.

He had a problem most teams have: a steady stream of open GitHub code-scanning alerts that nobody has time to triage by hand. So he built an automation that does it nightly. Roughly:

flowchart LR
    cron([cron @ 07:00]) --> gha[GHA workflow]
    gha -->|gh api: open alerts| claude[claude-code-action]
    claude --> prs([one PR per fix])

It works! But…

His original design was actually event-driven. Zapier was going to catch a Jira ticket transitioning, fire a workflow_dispatch on the GHA, pass the ticket ID through, and the agent would write a changelog entry tied back to that ticket. One automation, two systems' context, real cross-tool linkage. Pretty simple stuff right?

Unfortunately, that's not what shipped since Zapier doesn't have a stable answer for secret management. So the trigger flipped from "Jira event via Zapier" to boring old cron. With cron there's no Jira context flowing through the run, the agent can't correlate its work to a ticket anymore, and the original event-driven loop became a glorified ✨agentic✨ cron job.

What shipped is one cron, one tool call, no Jira link, no event loop. The version that fit through CI's narrow door. Yay…

None of this is a critique of his engineering. He did the right thing under the constraints he had — the automation is real and useful, and it's better than no automation by a wide margin. The constraints themselves are what I want to talk about. Every team I know that's added "agents to CI" has a version of this story: ambitious design, substrate wall, smaller version out the door.

CI is the wrong substrate for agentic work, and I'll explain why.

What CI does well and does not

First of all, the goods

The thing I actually miss when running cloud agents is git.

Agent configuration, if we take a look at Cursor Cloud Automation, lives on Cursor's side. There's no API to read it, no commit log of who changed what, no PR review when someone tweaks a prompt or rewires a trigger. If a bot starts behaving differently on Tuesday, the only option is to ask around and hope someone remembers.

CI flips this. The agent's behaviour is a file in your repo. Changes go through review, git blame tells you when a regression started, and a revert is a one-line PR. GitHub Agentic Workflows leans on this hard: every agent lives in .github/, gets versioned with the rest of the codebase, and behaves like any other CI artifact.

CI was built for short, deterministic jobs. Long-running, stateful, "go think for eight minutes and open a PR" work is not the shape of a typical GitHub Actions runner. You can force it, but the runner timeout, the limited disk, and the assumption that every job is a pure function of its inputs all start pushing back.

Bugbot is the closest thing to a counter-example. It runs in a "CI-shaped" job, ships under tight time and resource budgets, and produces useful output most of the time.

Runners are ephemeral

A GitHub Actions runner is a fresh VM that vanishes when the job ends.

For a once-a-night batch job, this is fine — the workspace warm-up cost is paid once per day. Done!

The moment the agent needs a second pass (read the alert, apply the patch, run the test suite to validate the fix), you're re-cloning the repo and re-installing dependencies every run.

Fail-fast semantics nuke in-flight work

CI is built around this simple loop: "kill the runner, retry the job". Agent sessions can't be retried mindlessly. They commit code. They open PRs. They hold context that took twenty minutes to build. They cost real model tokens.

The day you explain to a developer why their 40-minute refactor evaporated because a self-hosted runner hit its 6-hour cap mid-session is the day you'll wish you'd built somewhere else 😉

There's no coherent place for cross-system orchestration

The moment the trigger comes from somewhere else — Jira, Linear, Slack, an external alert — is usually the day you reach for Zapier or n8n, or start vibe-coding your own multi-step coordination lambda that will definitely not survive your next design review 😸

The agent should hold the tools and do the coordination itself, in one session, with one set of credentials. That's what tool-calling and MCP are for. Zapier was the right shape in 2018; in 2026 you want the agent to BE the orchestrator, and the substrate's only job is to give it a place to run with the tools wired in.

What the right substrate looks like

The right substrate exists for most agentic work, and it's already productized (or will be in the very near future).

Cloud Automation products give you a managed runtime where you declare "on this trigger, start a session with this agent, with these tools, against this repo, using those credentials" — and somebody else handles the runner, the secret store, the event plumbing and all that platform stuff. Cursor's Cloud Automations is the one I'd reach for today for its accessibility and ease of use, but the category matters more than the vendor.

Anthropic published Managed Agents earlier this month with a near-identical diagnosis: decouple the brain 🧠 (the model and its harness) from the hands ✋ (the sandboxes and tools) and the session ⏳ (a durable event log), so any of them can fail or be replaced without taking the others down.

What I'd do if I were starting today

If your only agentic workflow is a simple thing that can be 1-shotted through a single prompt, GitHub Actions are fine. Don't over-engineer.

The trap is what comes next. The next thing your team asks of an agent will be longer, more stateful, more tied to events outside GitHub. Don't bolt it onto CI. Pick a Cloud Automation platform, ship it there, and treat your agent definitions as the portable artifact — the runtime layer is going to become commoditized quickly, the prompts and tools are where the long-term leverage lives.

My bet for the next few months: CI is only meant to be building binaries and running tests, and "agents in CI" will become a distant memory. The teams that move early will spend their year on the agents themselves instead of on the glue.

If you're already past the substrate question and onto what the agent should actually do, that's where the previous post picks up: giving agents the right tool for the right job.