·8 min read·ai

Using Hatchet MCP to Let AI Agents Operate Background Workflows

A practical guide to using hatchet-mcp so Claude and other MCP clients can inspect workflow runs, read logs, check workers, and safely trigger, cancel, or replay Hatchet jobs.

TL;DR

  • Hatchet is great at running background workflows. The missing piece was giving AI agents a clean way to see what is happening inside those workflows.
  • hatchet-mcp turns Hatchet into an MCP-accessible operations surface. Agents can list workflows, inspect runs, read task logs, check workers, and look at queue metrics.
  • The write tools are intentionally explicit. Trigger, cancel, and replay are available, but they are labeled as live-state mutations so the agent knows it is touching production.
  • Install is one npx command plus a Hatchet token. The token encodes the server URL and tenant, so the normal setup is small.

Most AI coding agents are surprisingly bad at one very normal software job: figuring out what happened after the code left the request-response path.

They can read a repository. They can run tests. They can call an API. But the moment the work moves into a queue, a cron, a worker, or a long-running workflow, the agent is usually blind unless you manually paste logs back into the chat.

That is tolerable for toy apps. It breaks down fast in real systems.

If a scrape fails, a data sync stalls, or a customer import gets stuck halfway through, the question is not "can the agent write TypeScript?" The question is whether it can inspect the actual operational surface:

  • Which workflow ran?
  • Which task failed?
  • What did the worker log?
  • Is the queue backed up?
  • Did this fail once, or is it failing every time?
  • Should we replay, cancel, or trigger a fresh run?

That is why I built hatchet-mcp.

The Short Version

Hatchet is a workflow orchestration system. You use it for background jobs, durable workflows, retries, scheduled work, and the unglamorous machinery that makes production software keep moving.

MCP is the Model Context Protocol. It gives AI agents a standard way to use external tools and data sources.

hatchet-mcp connects the two.

It gives Claude Code, Claude Desktop, and other MCP-compatible clients a set of tools for observing and operating Hatchet:

Tool                What it lets the agent do
whoami              Confirm the token works and show the resolved tenant and server URL.
list_workflows      See the workflow definitions available in the Hatchet tenant.
list_runs           Find recent workflow runs, with optional filters and lookback windows.
get_run             Inspect a specific run, including status, tasks, and errors.
get_run_logs        Pull logs for a task by external id.
list_workers        Check worker status.
get_queue_metrics   Inspect queue and task health.
trigger_workflow    Start a workflow run with a JSON input payload.
cancel_runs         Cancel one or more runs or tasks.
replay_runs         Replay one or more runs or tasks.

The goal is simple: an agent should be able to debug a background workflow with the same level of direct evidence a human operator would use.

Installation

Add this to your MCP config:

{
  "mcpServers": {
    "hatchet": {
      "command": "npx",
      "args": ["-y", "hatchet-mcp"],
      "env": {
        "HATCHET_CLIENT_TOKEN": "<your-hatchet-api-token>"
      }
    }
  }
}

You get the token from the Hatchet dashboard under API tokens.

In the normal Hatchet Cloud flow, that one token is enough. It is a JWT that encodes the server URL and tenant, so hatchet-mcp can resolve where to send requests without making you copy a pile of extra settings.
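
If you want to see what your token carries, you can decode its payload locally. This is a minimal sketch, not hatchet-mcp's own code: it base64url-decodes the middle segment of a JWT and does not verify the signature. The claim names `sub` (tenant) and `server_url` are my assumptions about the token layout, so inspect the full decoded object to confirm what your token actually contains.

```typescript
// Inspection only: this decodes a JWT payload WITHOUT verifying the signature.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64UrlToBytes(input: string): number[] {
  // base64url swaps "+" and "/" for "-" and "_" and usually drops "=" padding.
  const s = input.replace(/-/g, "+").replace(/_/g, "/").replace(/=+$/, "");
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < s.length; i++) {
    const value = B64_ALPHABET.indexOf(s.charAt(i));
    if (value < 0) throw new Error("invalid base64url character: " + s.charAt(i));
    buffer = ((buffer << 6) | value) & 0xffff; // keep only the bits still needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return bytes;
}

function decodeJwtPayload(token: string): Record<string, unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  // Percent-encode every byte so decodeURIComponent can read the result as UTF-8.
  const percentEncoded = base64UrlToBytes(parts[1])
    .map((b) => "%" + ("0" + b.toString(16)).slice(-2))
    .join("");
  return JSON.parse(decodeURIComponent(percentEncoded)) as Record<string, unknown>;
}

// Dummy token: the payload segment is {"sub":"t1"}; header and signature are placeholders.
const claims = decodeJwtPayload("x.eyJzdWIiOiJ0MSJ9.y");
const tenantId = claims["sub"];         // assumed claim name for the tenant
const serverUrl = claims["server_url"]; // assumed claim name; absent in this dummy token
```

Treat the decoded values as a debugging aid only. The whoami tool remains the authoritative check, because it reports what the server actually resolved.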

hatchet-mcp reads three environment variables. Only the token is required; the other two matter mainly when you self-host Hatchet:

Variable               Required   Use it when
HATCHET_CLIENT_TOKEN   Yes        You want the MCP server to authenticate with Hatchet.
HATCHET_API_BASE       No         You need to point at a self-hosted Hatchet instance.
HATCHET_TENANT_ID      No         You need to override the tenant decoded from the token.
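
The optional variables go into the same env block as the token. Here is a rough sketch that builds the config object for a self-hosted setup. `buildHatchetMcpConfig` is a hypothetical helper I made up for illustration, not part of hatchet-mcp, and the URL is a placeholder.

```typescript
interface HatchetMcpOptions {
  token: string;     // HATCHET_CLIENT_TOKEN (required)
  apiBase?: string;  // HATCHET_API_BASE (optional, self-hosted instances)
  tenantId?: string; // HATCHET_TENANT_ID (optional override)
}

// Hypothetical helper: returns the same structure as the JSON config shown earlier.
function buildHatchetMcpConfig(opts: HatchetMcpOptions) {
  const env: Record<string, string> = { HATCHET_CLIENT_TOKEN: opts.token };
  // Only emit the optional variables when they are actually set.
  if (opts.apiBase) env["HATCHET_API_BASE"] = opts.apiBase;
  if (opts.tenantId) env["HATCHET_TENANT_ID"] = opts.tenantId;
  return {
    mcpServers: {
      hatchet: { command: "npx", args: ["-y", "hatchet-mcp"], env },
    },
  };
}

const selfHostedConfig = buildHatchetMcpConfig({
  token: "<your-hatchet-api-token>",
  apiBase: "https://hatchet.internal.example.com", // placeholder URL
});
const selfHostedJson = JSON.stringify(selfHostedConfig, null, 2);
```

Paste the resulting JSON into your MCP config in place of the token-only version.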

The First Check: Can the Agent See Hatchet?

After adding the MCP server, the first useful command is not "replay this failed job."

It is:

Use the Hatchet MCP server to run whoami and confirm which tenant and API base you can access.

This is boring in the right way. It verifies the token, confirms the agent is talking to the Hatchet instance you expected, and catches config mistakes before you ask the agent to touch live jobs.

Then I usually ask for a quick operational read:

List recent Hatchet workflow runs from the last 24 hours and summarize any failures by workflow name.

That gives the agent a starting map. From there, it can drill into the run that matters.

A Real Debugging Flow

Here is the workflow I had in mind when building it.

Something in production looks wrong. Maybe a customer import has not completed. Maybe a scrape should have populated new rows and did not. Maybe a scheduled workflow started failing after a deploy.

Instead of manually opening the Hatchet dashboard, finding the run, expanding the failed task, copying logs, and pasting them into an agent, you can ask:

Find failed workflow runs related to sync_customer_accounts in the last 12 hours. Inspect the most recent failed run, read the task logs, and tell me the likely root cause.

A useful agent can now follow the evidence path:

  1. Call list_runs with a lookback window.
  2. Find the failed run.
  3. Call get_run to inspect the task breakdown.
  4. Call get_run_logs for the failed task.
  5. Explain whether the issue looks like bad input, a dependency failure, a code regression, or an infrastructure problem.
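
The evidence path above can be sketched as a function over a generic tool-calling interface. Only the tool names come from hatchet-mcp. Everything else is an assumption for illustration: `CallTool` stands in for whatever your MCP client exposes, and the argument names (`workflow`, `since_hours`, `run_id`, `task_id`), the `FAILED` status string, and the result shapes are hypothetical. Check the input schemas your client reports before relying on any of them.

```typescript
// Hypothetical stand-in for an MCP client's "call this tool" method.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Assumed result shapes, for illustration only.
interface RunSummary { id: string; status: string; createdAt: string }
interface TaskDetail { id: string; name: string; status: string }
interface Evidence { runId: string; task: string | null; logs: string[] }

// Step 2: pick the most recent failed run.
// Assumes ISO 8601 UTC timestamps, which compare correctly as plain strings.
function latestFailedRun(runs: RunSummary[]): RunSummary | undefined {
  let latest: RunSummary | undefined;
  for (const run of runs) {
    if (run.status !== "FAILED") continue;
    if (!latest || run.createdAt > latest.createdAt) latest = run;
  }
  return latest;
}

async function investigateLatestFailure(
  callTool: CallTool,
  workflow: string,
  lookbackHours: number,
): Promise<Evidence | null> {
  // Step 1: list recent runs inside the lookback window.
  const runs = (await callTool("list_runs", {
    workflow,
    since_hours: lookbackHours,
  })) as RunSummary[];
  const failed = latestFailedRun(runs);
  if (!failed) return null;

  // Step 3: inspect the task breakdown of that run.
  const run = (await callTool("get_run", { run_id: failed.id })) as { tasks: TaskDetail[] };
  const failedTask = run.tasks.find((t) => t.status === "FAILED");
  if (!failedTask) return { runId: failed.id, task: null, logs: [] };

  // Step 4: pull the logs for the failed task.
  const logs = (await callTool("get_run_logs", { task_id: failedTask.id })) as string[];

  // Step 5, the diagnosis, is left to the agent or human reading this evidence.
  return { runId: failed.id, task: failedTask.name, logs };
}
```

In practice the agent does this sequencing itself. The sketch is mainly useful for seeing that every step is a read, so nothing in the investigation touches live state.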

That last step matters. The MCP server does not magically make the agent wise. It gives the agent access to the operational facts it needs to be useful.

The Safety Model

I split the tools into two groups.

The read tools are non-destructive:

  • whoami
  • list_workflows
  • list_runs
  • get_run
  • get_run_logs
  • list_workers
  • get_queue_metrics

These are the tools I want agents to use freely when investigating.

The write tools mutate live state:

  • trigger_workflow
  • cancel_runs
  • replay_runs

Those tool descriptions are deliberately prefixed with MUTATES LIVE STATE.

That wording is not decorative. It is there because AI agents need sharp boundaries around operations that affect production. Reading logs should feel cheap. Replaying a job should feel like an action that needs intent.
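
A client or wrapper can use the same marker to sort tools automatically. Below is a small sketch, assuming the tool list looks like the entries an MCP tools/list call returns (a name plus an optional description). The helper names are mine, and the sample descriptions are illustrative rather than copied from hatchet-mcp.

```typescript
// Minimal slice of a tool entry from an MCP tools/list response.
interface ToolInfo { name: string; description?: string }

const MUTATION_MARKER = "MUTATES LIVE STATE";

// True when the description starts with the marker (ignoring leading whitespace).
function isMutating(tool: ToolInfo): boolean {
  return (tool.description ?? "").trim().startsWith(MUTATION_MARKER);
}

function partitionTools(tools: ToolInfo[]): { read: string[]; write: string[] } {
  const read: string[] = [];
  const write: string[] = [];
  for (const tool of tools) {
    (isMutating(tool) ? write : read).push(tool.name);
  }
  return { read, write };
}

// Illustrative descriptions; the real wording after the marker may differ.
const sampleTools: ToolInfo[] = [
  { name: "get_run", description: "Inspect a specific run." },
  { name: "cancel_runs", description: "MUTATES LIVE STATE. Cancel one or more runs or tasks." },
];
const split = partitionTools(sampleTools);
// split.read  → ["get_run"]
// split.write → ["cancel_runs"]
```

Matching on the description keeps the classification in step with the server: a new write tool that carries the marker is picked up without updating a hardcoded list.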

My preferred pattern is:

  1. Let the agent investigate with read tools.
  2. Ask it to propose the smallest safe action.
  3. Review the target run ids or workflow name.
  4. Then let it trigger, cancel, or replay.

That keeps the agent useful without turning it into an unsupervised production operator.
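
If you want that pattern enforced in code rather than by habit, a thin gate in front of the write tools is enough. This is a sketch of one way to do it, not a feature of hatchet-mcp: `ProposedAction` and `checkAction` are hypothetical names, and the gate would live in whatever wrapper sits between your agent and the MCP client.

```typescript
// The three write tools named above; every other tool is treated as a read.
const WRITE_TOOLS = new Set(["trigger_workflow", "cancel_runs", "replay_runs"]);

// Hypothetical shape for an action the agent proposes in step 2.
interface ProposedAction {
  tool: string;
  targets: string[]; // run ids, or a workflow name for trigger_workflow
}

// Returns the reason a call should be blocked, or null if it may proceed.
function checkAction(
  action: ProposedAction,
  approvedTargets: ReadonlySet<string>,
): string | null {
  if (!WRITE_TOOLS.has(action.tool)) return null; // read tools pass freely
  if (action.targets.length === 0) return "write action has no explicit targets";
  const unapproved = action.targets.filter((t) => !approvedTargets.has(t));
  return unapproved.length > 0 ? "not approved: " + unapproved.join(", ") : null;
}

// Targets a human reviewed in step 3 (placeholder id).
const approved = new Set(["run-123"]);
const readResult = checkAction({ tool: "get_run", targets: ["run-999"] }, approved);
const okResult = checkAction({ tool: "replay_runs", targets: ["run-123"] }, approved);
const blocked = checkAction({ tool: "cancel_runs", targets: ["run-999"] }, approved);
// readResult → null, okResult → null, blocked → "not approved: run-999"
```

The gate is deliberately dumb. It does not judge whether a replay is a good idea, only whether a human signed off on these exact targets.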

What This Is Good For

hatchet-mcp is most useful when the workflow system is already part of your product's nervous system.

Good use cases:

  • Debugging failed background jobs.
  • Checking whether workers are healthy.
  • Inspecting queue pressure before and after a deploy.
  • Finding the run that created a bad downstream state.
  • Replaying failed tasks after a dependency recovers.
  • Triggering internal workflows from an agent-assisted maintenance flow.

It is especially helpful for teams that already use AI coding agents heavily. Once the agent can see the workflow layer, it can connect code, runtime behavior, and logs in one investigation.

What I Would Not Use It For

I would not use it as a replacement for monitoring, alerting, or a proper incident process.

You still want dashboards. You still want alerts. You still want humans making judgment calls around production data and customer impact.

The point of hatchet-mcp is not to replace those systems. It is to make the agent a better participant in them.

Without Hatchet MCP, the agent is waiting for you to spoon-feed it the facts.

With Hatchet MCP, it can go look.

Try It

The repo is here: github.com/ElliotPadfield/hatchet-mcp

It is also published to npm as hatchet-mcp, so you can run it directly with npx.

If you already use Hatchet and an MCP-compatible agent, the fastest test is simple:

  1. Add the MCP config.
  2. Run whoami.
  3. Ask for failed runs from the last 24 hours.
  4. Pick one run and ask the agent to explain what happened from the task details and logs.

That is the moment the value becomes obvious. The agent stops being a code autocomplete box and starts behaving more like an operator with access to the system it is supposed to help with.
