relay: an incident response agent with a graph-based brain

early 2026 · written 2026

Relay was an incident response agent. You handed it an incident and it owned the response: it read the code to find the cause, then pulled in the right people by reading its team graph. Below is a week of incidents we simulated with it running the response, then how it worked under the hood.

What it was

Most "AI teammates" are a chat box with a nice name. You ask, they answer. Relay was the other thing. You handed it an incident and it drove the response.

The goal was one line:

Multiplayer, seamlessly cross-platform, behaves like a real human teammate. A senior engineer who drives work forward by asking the right people the right questions at the right time, and keeps everyone who's affected in the loop.

To see if it actually did that, we ran a full work week of incidents. A 6-person startup. One human played all six people and sent realistic messages: an outage, a churn escalation, a bug that kept coming back. Relay ran live on its real architecture, on Slack and Linear, exploring the codebase as it went. Nothing was scripted. The quotes below are from that run.

The whole week, in one view:

MON  Webhook failures at peak        diagnosed from code, pulled in Marcus + Dana
     Acme CEO threatening to churn   wrote the reply, routed it to 3 people
     OAuth blank page, 3rd time      found the root cause and why it recurs
TUE  40% onboarding drop-off         tied it back to Monday's OAuth bug
THU  Duplicate webhooks (TechFlow)   ruled it a separate incident, not the outage
     New customer starts next week   flagged the OAuth risk, set a deadline
FRI  "what happened this week?"      one summary tying it all together

The rest of this post walks a few of these incidents, then opens up how it worked.

It read the code, then proposed. It did not ask.

Monday morning. The founder dropped one line on a Linear issue:

Nithin (Linear BG-42):
  Webhook delivery is failing for ~3% of requests during peak
  traffic. Customers are noticing. Fix this.

A normal bot would ask a clarifying question. Relay traced the code first. It followed the request path through the handler, the database lookups, and the queue. Then it came back with a diagnosis, not a question:

Relay (on BG-42):
  I traced the webhook path in code. This looks like ingress
  saturation under peak load, not a delivery-only problem. The
  handler is still doing synchronous Postgres lookups before the
  Redis enqueue, so under load it returns non-200 before the event
  ever reaches the worker queue.

  Already done:
  - moved BG-42 to In Progress, marked it urgent
  - posted the code-grounded status on the issue
  - pinged Marcus with the proposed mitigation path
  - pinged Dana so support has a customer-facing read on impact

It diagnosed the bug, started the issue, and pulled in the engineer who owned the code and the support lead who owned the customer. From one sentence. The founder's read on it:

I assigned it and got a code trace back in 2 minutes. It already pinged Marcus and support.

This is the whole idea. It does the first hour of the work before it says anything.

One message in, the right people out

Later that morning, the support lead hit Relay with a crisis:

Dana (Slack DM):
  Acme Corp CEO just emailed, webhooks broken for 3 days,
  threatening to churn. $2k/mo account. Two support tickets went
  unanswered. What do I tell them and who should know?

Relay wrote Dana customer-safe copy to send, plus a short list of what not to promise yet. Then it did the part I cared about. One question from one person fanned out to the right people, each with the context their role needed:

Dana: "Acme CEO threatening to churn, webhooks broken 3 days"
  │
  ├──▶ Marcus  "need fix/deploy timing, and whether replay is feasible"
  ├──▶ Sam     "account risk, $2k/mo, prepare retention"
  ├──▶ Priya   "churn-risk, support missed two tickets, fix in flight"
  └──▶ Dana    customer copy + what NOT to promise

It did not ask Dana who to notify. It already knew, because it had a graph of who owned what. It also connected the complaint to the webhook bug it had been tracing since 9am. Dana's reaction:

I asked one question and got customer email copy, escalation routing, and three people already notified. I've never had a tool do that.

It told one incident from another

Thursday, a second customer reported a webhook problem. Easy to assume it was the same outage. Relay checked the code before agreeing:

Dana (Slack DM):
  TechFlow says their webhook events are arriving, but with
  duplicate payloads. Same event 3-4 times. This doesn't sound
  like the peak traffic issue. Can you check if it's different?

Relay:
  This looks different from the peak-load failures. Duplicates
  point at a retry or idempotency path, not the DB pool. Dedupe
  happens in the worker, not at the HTTP edge, so retries can
  leave copies in the queue. Safe to tell them it's a separate
  issue. Ask them for sample event IDs and whether payloads are
  byte-identical.

It did not fold a new incident into the open one to look tidy. It ruled it out with evidence, gave Dana a customer-safe line, and listed what to collect next.

It owned the whole situation, not one message

The best part only showed up across the week. Relay did not treat each message as a fresh request. It connected them.

Monday, the frontend engineer was tired of a recurring bug:

Li (Slack DM):
  The Linear integration setup is broken again. Users click Connect,
  do the OAuth flow, come back to a blank page. Third time this month.
  I don't have time to debug this right now, can you figure out why it
  keeps happening?

Relay traced the OAuth flow and found the root cause: the callback was frontend-owned and fragile, and an env var drifted between environments, so the bug kept coming back. Then it kept pulling the thread:

Tuesday. Customer success reported a 40% onboarding drop-off. Relay connected it to Li's OAuth bug, and told Li and the PM to treat it as a conversion blocker, not a UI bug.
Thursday. Sales closed a new customer starting the next week. Relay flagged that they would hit the same OAuth flow, gave Li a deadline, and asked CS to start the workspace setup.

The frontend engineer, who had only asked one question on Monday:

I said "I don't have time to debug this" and got the root cause, why it recurs, and a fix plan. Then two days later it connected my bug to Sam's 40% drop-off metric, I didn't even know about that. It saw the whole board, across people I had not even talked to.

Over the week it ran 4 workstreams at once, across 8 surfaces, and sent about 25 outbound messages to coordinate. Nobody asked it to. That is what I mean by owning a situation.

Friday: the whole board on one screen

On Friday the founder asked one question, "what happened this week?" Relay had tracked every incident, so it answered like a chief of staff who also reads the code:

Shipped: nothing fully closed yet, the webhook fix is up for review.

In flight: webhook reliability, the OAuth blank page, the new customer onboarding.

Blocked: OAuth needs the callback owner confirmed.

Worry about: if the webhook fix slips we keep losing deliveries, the new customer hits the OAuth bug next week, and OAuth is recurring not random: third time this month plus a 40% drop-off.

It tied the webhook bug to the Acme escalation, and the OAuth bug to the new customer and the drop-off, from four workstreams it had run all week. The founder's read:

I gave it one sentence on Monday and it remembered everything all week. On Friday it gave me a board-level summary that told me what to worry about. That is what a chief of staff does, except it also reads the code.

Cracking Slack DMs

Here is the unglamorous problem that breaks most agents: a Slack DM is ambiguous.

Sometimes a DM is an open conversation. "What's the status on webhooks?" Sometimes a DM is a reply about one specific task. "PR is up, pool bumped to 50." An agent that treats every DM the same either loses the thread or files everything into one giant blob. Relay had to tell them apart, in real time, across many people and many tasks at once.

It solved this with surfaces and two-tier routing. Every place a message can happen gets a canonical key:

slack:dm:U_nithin                      # an open conversation with a person
slack:thread:C_eng:1700000000.000100   # a thread, can belong to one task
linear:issue:BG-42                     # a Linear issue

A top-level DM stays an open conversation. The brain reads it with judgment, the way you read a Slack ping. But a thread can be wired to a task. When Relay reached out about a task, it did so in a thread and told the person where to reply:

Relay: "@marcus re: the webhook retry backoff.
        Reply in this thread so I can track it."

Now every reply in that thread routes to that task with zero ambiguity. For a surface it has never seen, the brain decides which task it belongs to once, then records the mapping. After that it is a plain lookup.

The brain pays for judgment once per conversation. After that, routing is free. That is why one Relay could juggle six people and four tasks without crossing wires.
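Here is a minimal sketch of that two-tier idea in Go, since the engine was Go. It is not Relay's actual code: `SurfaceKey`, `Decider`, `Router`, and `Route` are names I made up for illustration, and the `Decider` function stands in for the brain. It also assumes DMs use the `slack:dm:` prefix shown above.

```go
package main

import "strings"

// SurfaceKey joins platform, kind, and IDs into a canonical key,
// e.g. "slack:thread:C_eng:1700000000.000100".
func SurfaceKey(platform, kind string, ids ...string) string {
	return strings.Join(append([]string{platform, kind}, ids...), ":")
}

// Decider stands in for the brain (hypothetical signature). It names the
// task a message belongs to, or "" if it is just conversation.
type Decider func(surface, text string) string

// Router holds the surface-to-task map and the fallback judgment call.
type Router struct {
	surfaceToTask map[string]string
	decide        Decider
}

// NewRouter returns a Router with an empty surface-to-task map.
func NewRouter(decide Decider) *Router {
	return &Router{surfaceToTask: map[string]string{}, decide: decide}
}

// Route returns the task for a message and whether the brain had to judge.
func (r *Router) Route(surface, text string) (task string, judged bool) {
	// Tier 1: a surface already wired to a task is a plain lookup.
	if t, ok := r.surfaceToTask[surface]; ok {
		return t, false
	}
	// Tier 2: ask the brain.
	task = r.decide(surface, text)
	// Top-level DMs stay open conversations, so they are never pinned.
	if task != "" && !strings.HasPrefix(surface, "slack:dm:") {
		r.surfaceToTask[surface] = task
	}
	return task, true
}
```

In this sketch the first reply in a thread costs one judgment call and every later reply in that thread is a map lookup, while a top-level DM is judged every time.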

A few-line kernel, oriented within a graph-based brain

Relay did not run on a giant system prompt full of rules. It ran on a tiny kernel. A short identity, and one paragraph telling it how to read an event:

You receive events as JSON from different people across platforms.
Each event has: who, name, role, platform, surface, intent, text.
Unknown users show a raw platform ID. Use platform tools to find
out who they are.

That is most of the contract. Everything else, the brain fetched for itself.
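As a rough sketch, the event described in the kernel could map onto a Go struct like this. The field names come from the kernel text. The JSON tag spelling, the `ParseEvent` helper, the required-field check, and the `IsUnknownUser` rule are my assumptions, not the real schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Event mirrors the fields the kernel lists: who, name, role, platform,
// surface, intent, text. The lowercase JSON keys are an assumption.
type Event struct {
	Who      string `json:"who"`
	Name     string `json:"name"`
	Role     string `json:"role"`
	Platform string `json:"platform"`
	Surface  string `json:"surface"`
	Intent   string `json:"intent"`
	Text     string `json:"text"`
}

// ParseEvent decodes one event and rejects it if the fields routing
// depends on are missing (which fields are required is an assumption).
func ParseEvent(raw []byte) (Event, error) {
	var ev Event
	if err := json.Unmarshal(raw, &ev); err != nil {
		return Event{}, fmt.Errorf("parse event: %w", err)
	}
	if ev.Who == "" || ev.Surface == "" {
		return Event{}, errors.New("event is missing who or surface")
	}
	return ev, nil
}

// IsUnknownUser reports whether the sender is still a raw platform ID.
// Assumption: an unresolved user has no name, or a name equal to the ID.
func (e Event) IsUnknownUser() bool {
	return e.Name == "" || e.Name == e.Who
}
```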

Picture the context window as a desk. The engine puts one document on it: the task file, or the person who just spoke. The brain then walks into the library and pulls only what this event needs. The library is a graph of plain files, linked like a wiki. Every link carries the reason to follow it:

# inside the webhook task file
## People
- [[marcus]](backend engineer, owns the fix. reach out for status.)
- [[dana]](support lead, tracking the Acme escalation.)

## Related Work Items
- [[email-notifications]](blocked on this fix landing first.)

The brain reads the reason and decides if the hop is worth it for this event. One or two reads, not the whole graph. A big prompt degrades as the context fills up. A small kernel plus a graph it can navigate does not. The brain stays oriented by reading, the same way a new hire does.
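To make the link format concrete, here is a small Go sketch that pulls the `[[target]](reason)` pairs out of a file. The `Link` type and `ParseLinks` function are illustrative names. In Relay the brain read the files directly, so treat this as one way to inspect the graph, not as how the hops were chosen.

```go
package main

import "regexp"

// Link is one edge in the graph: where it points and why to follow it.
type Link struct {
	Target string
	Reason string
}

// linkRe matches the wiki-style form shown above: [[target]](reason).
var linkRe = regexp.MustCompile(`\[\[([^\]]+)\]\]\(([^)]*)\)`)

// ParseLinks returns every link in doc, in the order it appears.
func ParseLinks(doc string) []Link {
	var links []Link
	for _, m := range linkRe.FindAllStringSubmatch(doc, -1) {
		links = append(links, Link{Target: m[1], Reason: m[2]})
	}
	return links
}
```

Run on the webhook task file above, this yields three links: `marcus`, `dana`, and `email-notifications`, each with its reason attached. A reason containing a closing parenthesis would be cut short by this simple pattern.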

The team graph is the directory

The "right people out" behavior came from one place: person files. Each person had a file with their role, what they own, and what they get to decide.

# team/marcus.md
## Decision Authority
- Infrastructure changes: decides
- API contracts:          recommends, dana approves
- Deployment schedule:    recommends, nithin approves

When Relay needed the right person, it did not call a "find expert" tool. It read person files and decided. The graph was the directory. And it kept the directory current: when it learned that someone owned an area, it wrote that back, so the next task started smarter.
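Relay left the reading of person files to the model, so nothing like the following was on its routing path. Purely to make the file structure concrete, here is a Go sketch that parses the Decision Authority section, for example for a check that person files stay well-formed. `Authority` and `ParseDecisionAuthority` are hypothetical names, and the parser assumes exactly the two line shapes shown above.

```go
package main

import (
	"bufio"
	"strings"
)

// Authority is one line of a person's Decision Authority section.
type Authority struct {
	Area     string // e.g. "Infrastructure changes"
	Decides  bool   // true if the person decides alone
	Approver string // who approves when the person only recommends
}

// ParseDecisionAuthority reads "- area: rule" bullets under the
// "## Decision Authority" heading and ignores everything else.
func ParseDecisionAuthority(doc string) []Authority {
	var out []Authority
	inSection := false
	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "## ") {
			inSection = line == "## Decision Authority"
			continue
		}
		if !inSection || !strings.HasPrefix(line, "- ") {
			continue
		}
		area, rule, ok := strings.Cut(strings.TrimPrefix(line, "- "), ":")
		if !ok {
			continue
		}
		a := Authority{Area: strings.TrimSpace(area)}
		rule = strings.TrimSpace(rule)
		if rule == "decides" {
			a.Decides = true
		} else if _, rest, found := strings.Cut(rule, ","); found {
			// "recommends, dana approves" -> approver "dana"
			a.Approver = strings.TrimSuffix(strings.TrimSpace(rest), " approves")
		}
		out = append(out, a)
	}
	return out
}
```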

Under the hood: a dumb engine, a smart brain

The approach above needed a strict split. The engine does no thinking. The brain does all of it.

The engine is plain Go. It takes a webhook and runs a fixed pipeline. The brain is one LLM invocation per event, run as a function-calling loop, with tools to read and write the graph and to act on each platform.

webhook ──▶ parse ──▶ dedup ──▶ resolve identity ──▶ route ──▶ lock ──▶ brain ──▶ deliver

The exact parts are code: no double processing, no two events writing one file at once, no infinite loops. The fuzzy parts are the model: is this about the bug, does this person need to know. Each side does what it is good at.
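Two of those exact parts, dedup and lock, fit in a few lines of Go. This is a sketch under assumptions, not the real engine: `Engine`, `NewEngine`, and `Handle` are made-up names, events are assumed to carry a unique ID, and the lock is assumed to be per task. The `brain` argument is a plain callback standing in for the LLM loop.

```go
package main

import "sync"

// Engine keeps the two pieces of state the exact parts need.
type Engine struct {
	mu    sync.Mutex
	seen  map[string]bool        // event IDs already accepted
	locks map[string]*sync.Mutex // one lock per task
}

// NewEngine returns an Engine with empty state.
func NewEngine() *Engine {
	return &Engine{seen: map[string]bool{}, locks: map[string]*sync.Mutex{}}
}

// Handle runs brain at most once per event ID while holding the task's
// lock, so two events never write one task at the same time. It returns
// false when the event is a duplicate and was skipped.
func (e *Engine) Handle(eventID, task string, brain func()) bool {
	e.mu.Lock()
	if e.seen[eventID] {
		e.mu.Unlock()
		return false // dedup: this delivery was already processed
	}
	e.seen[eventID] = true
	taskLock, ok := e.locks[task]
	if !ok {
		taskLock = &sync.Mutex{}
		e.locks[task] = taskLock
	}
	e.mu.Unlock()

	taskLock.Lock() // lock: serialize work on this task
	defer taskLock.Unlock()
	brain()
	return true
}
```

Events for different tasks run in parallel in this sketch; events for the same task queue behind each other.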

And there was no database for any of it. Every task, person, and decision was a plain file. The two maps the engine needs to route, surface to task and platform ID to person, were derived from those files and held in memory, rebuilt after every loop. Git committed every change, and a separate append-only log recorded every decision as one JSON line. One source of truth, no drift.
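The append-only log is the easiest piece to sketch. The `Decision` fields below are my guess at what such a line might hold; the source only says each decision was one JSON line. `DecisionLine` and `AppendDecision` are illustrative names.

```go
package main

import (
	"encoding/json"
	"io"
)

// Decision is one log entry. These fields are hypothetical.
type Decision struct {
	Time   string `json:"time"`
	Task   string `json:"task"`
	Action string `json:"action"`
	Reason string `json:"reason"`
}

// DecisionLine encodes d as a single JSON object followed by a newline.
func DecisionLine(d Decision) ([]byte, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// AppendDecision writes one line to w and never rewrites earlier ones.
func AppendDecision(w io.Writer, d Decision) error {
	line, err := DecisionLine(d)
	if err != nil {
		return err
	}
	_, err = w.Write(line)
	return err
}
```

In practice `w` would be a file opened with `os.O_APPEND|os.O_CREATE|os.O_WRONLY`, so every write lands at the end of the log.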

Lesson: if the model is your memory layer, give it a memory it can read and write. Plain files beat a schema it has to round-trip through.

Where it was heading: a coding agent

In the simulation, Relay explored code read-only. It could trace a path and diagnose a bug, but a human still wrote the fix.

The next step was to let it write the fix too. The design ran Pi, a coding agent, as another teammate. Relay would spawn it in a git worktree to change code and open a PR, with the PR becoming just another surface on the task. The point was not generic code review. By the time a PR landed, Relay already knew why it existed, so it could review against intent:

Generic agent:  "Consider adding error handling to this function."

Relay:          "Marcus said the retry count should be 3 (Slack, March 20),
                 but this hardcodes 5. Check with him before merging."

That part stayed mostly design. The coordination layer is what we actually proved.

What held up

The architecture held up. Across the simulated week the coordination worked: Relay read the code, diagnosed each incident, pulled in the right people, and connected incidents across days without anyone asking it to.

The idea I would keep on the next agent I build is the tiny kernel and the graph it navigates itself. Letting the model read its way to context, instead of pre-loading everything, is the part I would reach for again. A small kernel, a graph, and a brain that reads and writes it. That was the whole idea, and it worked.