relay: an incident response agent with a graph-based brain
Relay was an incident response agent. You handed it an incident and it owned the response: it read the code to find the cause, then pulled in the right people by reading its team graph. Below is a simulated week of incidents with Relay running the response, followed by a look at how it worked under the hood.
What it was
Most "AI teammates" are a chat box with a nice name. You ask, they answer. Relay was the other thing. You handed it an incident and it drove the response.
The goal was one line:
Multiplayer, seamlessly cross-platform, behaves like a real human teammate. A senior engineer who drives work forward by asking the right people the right questions at the right time, and keeps everyone who's affected in the loop.
To see if it actually did that, we ran a full work week of incidents. A 6-person startup. One human played all six people and sent realistic messages: an outage, a churn escalation, a bug that kept coming back. Relay ran live on its real architecture, on Slack and Linear, exploring the codebase as it went. Nothing was scripted. The quotes below are from that run.
The whole week, in one view:
The rest of this post walks through a few of these incidents, then opens up how it worked.
It read the code, then proposed. It did not ask.
Monday morning. The founder dropped one line on a Linear issue:
Nithin (Linear BG-42):
Webhook delivery is failing for ~3% of requests during peak
traffic. Customers are noticing. Fix this.
A normal bot would ask a clarifying question. Relay traced the code first. It followed the request path through the handler, the database lookups, and the queue. Then it came back with a diagnosis, not a question:
Relay (on BG-42):
I traced the webhook path in code. This looks like ingress
saturation under peak load, not a delivery-only problem. The
handler is still doing synchronous Postgres lookups before the
Redis enqueue, so under load it returns non-200 before the event
ever reaches the worker queue.
Already done:
- moved BG-42 to In Progress, marked it urgent
- posted the code-grounded status on the issue
- pinged Marcus with the proposed mitigation path
- pinged Dana so support has a customer-facing read on impact
It diagnosed the bug, started the issue, and pulled in the engineer who owned the code and the support lead who owned the customer. From one sentence. The founder's read on it:
I assigned it and got a code trace back in 2 minutes. It already pinged Marcus and support.
This is the whole idea. It does the first hour of the work before it says anything.
One message in, the right people out
Later that morning, the support lead hit Relay with a crisis:
Dana (Slack DM):
Acme Corp CEO just emailed, webhooks broken for 3 days,
threatening to churn. $2k/mo account. Two support tickets went
unanswered. What do I tell them and who should know?
Relay wrote Dana customer-safe copy to send, plus a short list of what not to promise yet. Then it did the part I cared about. One question from one person fanned out to the right people, each with the context their role needed.
It did not ask Dana who to notify. It already knew, because it had a graph of who owned what. It also connected the complaint to the webhook bug it had been tracing since 9am. Dana's reaction:
I asked one question and got customer email copy, escalation routing, and three people already notified. I've never had a tool do that.
It told one incident from another
Thursday, a second customer reported a webhook problem. Easy to assume it was the same outage. Relay checked the code before agreeing:
Dana (Slack DM):
TechFlow says their webhook events are arriving, but with
duplicate payloads. Same event 3-4 times. This doesn't sound
like the peak traffic issue. Can you check if it's different?
Relay:
This looks different from the peak-load failures. Duplicates
point at a retry or idempotency path, not the DB pool. Dedupe
happens in the worker, not at the HTTP edge, so retries can
leave copies in the queue. Safe to tell them it's a separate
issue. Ask them for sample event IDs and whether payloads are
byte-identical.
It did not fold a new incident into the open one to look tidy. It ruled it out with evidence, gave Dana a customer-safe line, and listed what to collect next.
It owned the whole situation, not one message
The best part only showed up across the week. Relay did not treat each message as a fresh request. It connected them.
Monday, the frontend engineer was tired of a recurring bug:
Li (Slack DM):
The Linear integration setup is broken again. Users click Connect,
do the OAuth flow, come back to a blank page. Third time this month.
I don't have time to debug this right now, can you figure out why it
keeps happening?
Relay traced the OAuth flow and found the root cause: the callback was frontend-owned and fragile, and an env var drifted between environments, so the bug kept coming back. Then it kept pulling the thread:
- Tuesday. Customer success reported a 40% onboarding drop-off. Relay connected it to Li's OAuth bug, and told Li and the PM to treat it as a conversion blocker, not a UI bug.
- Thursday. Sales closed a new customer starting the next week. Relay flagged that they would hit the same OAuth flow, gave Li a deadline, and asked CS to start the workspace setup.
The frontend engineer, who had only asked one question on Monday:
I said "I don't have time to debug this" and got the root cause, why it recurs, and a fix plan. Then two days later it connected my bug to Sam's 40% drop-off metric, I didn't even know about that. It saw the whole board, across people I had not even talked to.
Over the week it ran 4 workstreams at once, across 8 surfaces, and sent about 25 outbound messages to coordinate. Nobody asked it to. That is what I mean by owning a situation.
Friday: the whole board on one screen
On Friday the founder asked one question, "what happened this week?" Relay had tracked every incident, so it answered like a chief of staff who also reads the code:
Shipped: nothing fully closed yet, the webhook fix is up for review.
In flight: webhook reliability, the OAuth blank page, the new customer onboarding.
Blocked: OAuth needs the callback owner confirmed.
Worry about: if the webhook fix slips we keep losing deliveries, the new customer hits the OAuth bug next week, and OAuth is recurring, not random: third time this month plus a 40% drop-off.
It tied the webhook bug to the Acme escalation, and the OAuth bug to the new customer and the drop-off, from four workstreams it had run all week. The founder's read:
I gave it one sentence on Monday and it remembered everything all week. On Friday it gave me a board-level summary that told me what to worry about. That is what a chief of staff does, except it also reads the code.
Cracking Slack DMs
Here is the unglamorous problem that breaks most agents: a Slack DM is ambiguous.
Sometimes a DM is an open conversation. "What's the status on webhooks?" Sometimes a DM is a reply about one specific task. "PR is up, pool bumped to 50." An agent that treats every DM the same either loses the thread or files everything into one giant blob. Relay had to tell them apart, in real time, across many people and many tasks at once.
It solved this with surfaces and two-tier routing. Every place a message can happen gets a canonical key:
slack:dm:U_nithin # an open conversation with a person
slack:thread:C_eng:1700000000.000100 # a thread, can belong to one task
linear:issue:BG-42 # a Linear issue
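As a minimal sketch in Go (the language the engine was written in), the three key shapes above could come from tiny constructor helpers. Only the key formats come from the examples; the helper names and the Kind method are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Surface is a canonical "platform:kind:id" key for any place a message
// can happen. The helpers below are hypothetical illustrations.
type Surface string

// SlackDM keys an open, top-level conversation with one person.
func SlackDM(userID string) Surface {
	return Surface("slack:dm:" + userID)
}

// SlackThread keys one thread by channel ID and parent message timestamp.
func SlackThread(channelID, threadTS string) Surface {
	return Surface("slack:thread:" + channelID + ":" + threadTS)
}

// LinearIssue keys a Linear issue by its identifier.
func LinearIssue(id string) Surface {
	return Surface("linear:issue:" + id)
}

// Kind returns the middle segment: "dm", "thread", or "issue".
func (s Surface) Kind() string {
	parts := strings.SplitN(string(s), ":", 3)
	if len(parts) < 2 {
		return ""
	}
	return parts[1]
}

func main() {
	fmt.Println(SlackDM("U_nithin"))
	fmt.Println(SlackThread("C_eng", "1700000000.000100"))
	fmt.Println(LinearIssue("BG-42"), LinearIssue("BG-42").Kind())
}
```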
A top-level DM stays an open conversation. The brain reads it with judgment, the way you read a Slack ping. But a thread can be wired to a task. When Relay reached out about a task, it did so in a thread and told the person where to reply:
Relay: "@marcus re: the webhook retry backoff.
Reply in this thread so I can track it."
Now every reply in that thread routes to that task with zero ambiguity. For a surface it has never seen, the brain decides which task it belongs to once, then records the mapping. After that it is a plain lookup.
The brain pays for judgment once per conversation. After that, routing is free. That is why one Relay could juggle six people and four tasks without crossing wires.
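Here is a minimal sketch of that two-tier routing, assuming a hypothetical Router type and a Decider function that stands in for the brain's one-time LLM call. Known surfaces are a map lookup, top-level DMs are never bound, and an unseen surface consults the brain once and caches the answer.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Decider stands in for the brain's judgment on a surface it has never
// seen. Hypothetical signature; "" means "no task".
type Decider func(surface, text string) string

// Router sketches two-tier routing: lookup first, judgment only when needed.
type Router struct {
	mu        sync.Mutex
	bySurface map[string]string // surface key -> task ID
	decide    Decider
	calls     int // how many times the brain was consulted
}

func NewRouter(d Decider) *Router {
	return &Router{bySurface: map[string]string{}, decide: d}
}

// Bind wires a surface to a task, e.g. when Relay opens a thread about it.
func (r *Router) Bind(surface, task string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bySurface[surface] = task
}

// Route returns the task ID for a message, or "" for an open conversation
// that the brain should read with judgment.
func (r *Router) Route(surface, text string) string {
	r.mu.Lock()
	defer r.mu.Unlock() // held across decide only to keep the sketch short
	// Tier 1: a known surface is a plain lookup.
	if task, ok := r.bySurface[surface]; ok {
		return task
	}
	// Top-level DMs stay open conversations and are never bound.
	if strings.HasPrefix(surface, "slack:dm:") {
		return ""
	}
	// Tier 2: an unseen surface. Pay for judgment once, then record the
	// answer (even "") so later replies skip the brain.
	r.calls++
	task := r.decide(surface, text)
	r.bySurface[surface] = task
	return task
}

func main() {
	r := NewRouter(func(surface, text string) string { return "oauth-blank-page" })
	r.Bind("slack:thread:C_eng:1700000000.000100", "webhook-reliability")
	fmt.Println(r.Route("slack:thread:C_eng:1700000000.000100", "PR is up, pool bumped to 50."))
	fmt.Printf("%q\n", r.Route("slack:dm:U_nithin", "What's the status on webhooks?"))
}
```

A real engine would not hold the lock across an LLM call; the point here is only the shape: the brain runs at most once per surface, and every later reply is a map read.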
A few-line kernel, oriented within a graph-based brain
Relay did not run on a giant system prompt full of rules. It ran on a tiny kernel. A short identity, and one paragraph telling it how to read an event:
You receive events as JSON from different people across platforms.
Each event has: who, name, role, platform, surface, intent, text.
Unknown users show a raw platform ID. Use platform tools to find
out who they are.
That is most of the contract. Everything else, the brain fetched for itself.
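A sketch of that event as a Go struct, assuming the JSON keys match the field names the kernel lists (the post does not show the wire format, and the "intent" value below is made up). Unknown is a hypothetical helper for the case where only a raw platform ID arrives and the brain has to look the sender up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Event mirrors the fields named in the kernel. JSON key names are assumed.
type Event struct {
	Who      string `json:"who"`      // raw platform ID, e.g. "U_dana"
	Name     string `json:"name"`     // empty when the sender is unknown
	Role     string `json:"role"`
	Platform string `json:"platform"` // "slack", "linear", ...
	Surface  string `json:"surface"`  // canonical surface key
	Intent   string `json:"intent"`
	Text     string `json:"text"`
}

// Unknown reports whether only a raw platform ID is available, in which
// case the brain should use platform tools to find out who this is.
func (e Event) Unknown() bool {
	return strings.TrimSpace(e.Name) == ""
}

func parseEvent(raw []byte) (Event, error) {
	var e Event
	err := json.Unmarshal(raw, &e)
	return e, err
}

func main() {
	raw := []byte(`{"who":"U_dana","name":"Dana","role":"support lead",
	  "platform":"slack","surface":"slack:dm:U_dana","intent":"escalation",
	  "text":"Acme Corp CEO just emailed..."}`)
	e, err := parseEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Name, e.Surface, e.Unknown())
}
```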
Picture the context window as a desk. The engine puts one document on it: the task file, or the person who just spoke. The brain then walks into the library and pulls only what this event needs. The library is a graph of plain files, linked like a wiki. Every link carries the reason to follow it:
# inside the webhook task file
## People
- [[marcus]](backend engineer, owns the fix. reach out for status.)
- [[dana]](support lead, tracking the Acme escalation.)
## Related Work Items
- [[email-notifications]](blocked on this fix landing first.)
The brain reads the reason and decides if the hop is worth it for this event. One or two reads, not the whole graph. A big prompt degrades as the context fills up. A small kernel plus a graph it can navigate does not. The brain stays oriented by reading, the same way a new hire does.
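One way to hand the brain the link reasons without loading any neighbors is to parse a file's [[target]](reason) links into a short list. This is a sketch assuming exactly that link syntax; ParseLinks and Link are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Link is one edge in the graph: a target file plus the reason to follow it.
type Link struct {
	Section string // the "## ..." heading the link sits under
	Target  string // file name without extension, e.g. "marcus"
	Reason  string // why the brain might make the hop
}

// linkRe matches the [[target]](reason) form used in the task file above.
var linkRe = regexp.MustCompile(`\[\[([^\]]+)\]\]\(([^)]*)\)`)

// ParseLinks lists a file's outgoing links so the brain can read only the
// reasons and choose one or two hops instead of loading the whole graph.
func ParseLinks(doc string) []Link {
	var links []Link
	section := ""
	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "## ") {
			section = strings.TrimPrefix(line, "## ")
			continue
		}
		for _, m := range linkRe.FindAllStringSubmatch(line, -1) {
			links = append(links, Link{Section: section, Target: m[1], Reason: m[2]})
		}
	}
	return links
}

func main() {
	doc := `## People
- [[marcus]](backend engineer, owns the fix. reach out for status.)
- [[dana]](support lead, tracking the Acme escalation.)
## Related Work Items
- [[email-notifications]](blocked on this fix landing first.)`
	for _, l := range ParseLinks(doc) {
		fmt.Printf("%s -> %s: %s\n", l.Section, l.Target, l.Reason)
	}
}
```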
The team graph is the directory
The "right people out" behavior came from one place: person files. Each person had a file with their role, what they own, and what they get to decide.
# team/marcus.md
## Decision Authority
- Infrastructure changes: decides
- API contracts: recommends, dana approves
- Deployment schedule: recommends, nithin approves
When Relay needed the right person, it did not call a "find expert" tool. It read person files and decided. The graph was the directory. And it kept the directory current: when it learned that someone owned an area, it wrote that back, so the next task started smarter.
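To show what that section encodes, here is a sketch that reads it, assuming only the two phrasings in team/marcus.md ("decides" and "recommends, X approves"). ParseAuthority and SignOff are hypothetical helpers; in Relay the brain read the person file itself rather than calling code like this.

```go
package main

import (
	"fmt"
	"strings"
)

// Authority is one line of a person file's "## Decision Authority" section.
type Authority struct {
	Level    string // "decides" or "recommends"
	Approver string // who approves when Level is "recommends"
}

// ParseAuthority reads "- Area: decides" and "- Area: recommends, X approves"
// lines; any other phrasing is skipped. Areas are keyed in lower case.
func ParseAuthority(doc string) map[string]Authority {
	out := map[string]Authority{}
	inSection := false
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "## ") {
			inSection = line == "## Decision Authority"
			continue
		}
		if !inSection || !strings.HasPrefix(line, "- ") {
			continue
		}
		area, rule, ok := strings.Cut(strings.TrimPrefix(line, "- "), ":")
		if !ok {
			continue
		}
		rule = strings.TrimSpace(rule)
		var a Authority
		if rule == "decides" {
			a.Level = "decides"
		} else if lvl, rest, ok := strings.Cut(rule, ","); ok && strings.TrimSpace(lvl) == "recommends" {
			a.Level = "recommends"
			a.Approver = strings.TrimSuffix(strings.TrimSpace(rest), " approves")
		} else {
			continue
		}
		out[strings.ToLower(strings.TrimSpace(area))] = a
	}
	return out
}

// SignOff returns who has the final call on an area of this person's work.
func SignOff(person string, auth map[string]Authority, area string) (string, bool) {
	a, ok := auth[strings.ToLower(area)]
	if !ok {
		return "", false
	}
	if a.Level == "decides" {
		return person, true
	}
	return a.Approver, true
}

func main() {
	marcus := `# team/marcus.md
## Decision Authority
- Infrastructure changes: decides
- API contracts: recommends, dana approves
- Deployment schedule: recommends, nithin approves`
	auth := ParseAuthority(marcus)
	who, _ := SignOff("marcus", auth, "API contracts")
	fmt.Println(who) // dana
}
```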
Under the hood: a dumb engine, a smart brain
The approach above needed a strict split. The engine does no thinking. The brain does all of it.
The engine is plain Go. It takes a webhook and runs a fixed pipeline. The brain is one LLM call per event, in a function-calling loop, with tools to read and write the graph and to act on each platform.
The exact parts are code: no double processing, no two events writing one file at once, no infinite loops. The fuzzy parts are the model: is this about the bug, does this person need to know. Each side does what it is good at.
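To make the exact parts concrete, here is a sketch of three guarantees an engine like this could enforce in plain Go: skipping a redelivered event ID, locking every file an event writes (in sorted order, so two events cannot deadlock), and capping the brain's tool-calling loop. All names and the step cap are assumptions, and reading "no infinite loops" as a step cap is one interpretation, not the post's stated mechanism.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Engine holds the exact, non-LLM guarantees. All names are hypothetical.
type Engine struct {
	mu        sync.Mutex
	seen      map[string]bool        // event IDs already processed
	fileLocks map[string]*sync.Mutex // one lock per graph file
	MaxSteps  int                    // cap on tool-calling iterations
}

// ErrLoopLimit stops a brain that keeps calling tools without finishing.
var ErrLoopLimit = errors.New("brain exceeded step limit")

func NewEngine(maxSteps int) *Engine {
	return &Engine{seen: map[string]bool{}, fileLocks: map[string]*sync.Mutex{}, MaxSteps: maxSteps}
}

// FirstTime returns false for a redelivered webhook, so it is processed once.
func (e *Engine) FirstTime(eventID string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.seen[eventID] {
		return false
	}
	e.seen[eventID] = true
	return true
}

// WithFiles runs fn while holding a lock on every file it will write.
// Sorting gives all events the same lock order, which rules out deadlock.
func (e *Engine) WithFiles(paths []string, fn func()) {
	sorted := append([]string(nil), paths...)
	sort.Strings(sorted)
	var locks []*sync.Mutex
	e.mu.Lock()
	for i, p := range sorted {
		if i > 0 && p == sorted[i-1] {
			continue // same file listed twice: lock it once
		}
		l, ok := e.fileLocks[p]
		if !ok {
			l = &sync.Mutex{}
			e.fileLocks[p] = l
		}
		locks = append(locks, l)
	}
	e.mu.Unlock()
	for _, l := range locks {
		l.Lock()
	}
	defer func() {
		for _, l := range locks {
			l.Unlock()
		}
	}()
	fn()
}

// RunLoop drives one brain turn. step stands in for one model call plus its
// tool calls and reports whether the brain is done.
func (e *Engine) RunLoop(step func(i int) (done bool)) error {
	for i := 0; i < e.MaxSteps; i++ {
		if step(i) {
			return nil
		}
	}
	return ErrLoopLimit
}

func main() {
	eng := NewEngine(8)
	fmt.Println(eng.FirstTime("evt-1"), eng.FirstTime("evt-1")) // true false
	eng.WithFiles([]string{"tasks/webhook.md", "team/marcus.md"}, func() {
		fmt.Println("writing both files")
	})
	err := eng.RunLoop(func(i int) bool { return false })
	fmt.Println(errors.Is(err, ErrLoopLimit)) // true
}
```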
And there was no database for any of it. Every task, person, and decision was a plain file. The two maps the engine needs for routing (surface to task, and platform ID to person) were derived from those files, held in memory, and rebuilt after every loop. Git committed every change, and a separate append-only log recorded every decision as one JSON line. One source of truth, no drift.
Lesson: if the model is your memory layer, give it a memory it can read and write. Plain files beat a schema it has to round-trip through.
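A sketch of the two file-backed pieces, under loud assumptions: the decision log appends one JSON object per line to a file opened in append-only mode, and the surface-to-task map is rebuilt from task files that (hypothetically) list their bound surfaces as "- surface: <key>" lines. The field names and that line format are illustrative, not Relay's.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path"
	"path/filepath"
	"strings"
	"time"
)

// Decision is one line in the append-only log. Field names are hypothetical.
type Decision struct {
	At     time.Time `json:"at"`
	Event  string    `json:"event"`
	Task   string    `json:"task"`
	Action string    `json:"action"`
	Why    string    `json:"why"`
}

// EncodeDecision renders a decision as exactly one line. json.Marshal
// escapes newlines inside strings, so the only raw newline is the last byte.
func EncodeDecision(d Decision) ([]byte, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// AppendDecision opens the log with O_APPEND, so earlier lines are never
// rewritten.
func AppendDecision(logPath string, d Decision) error {
	line, err := EncodeDecision(d)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(logPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(line); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// RebuildSurfaceIndex derives surface -> task from task file contents
// (keyed by slash-separated path). The "- surface: " format is assumed.
func RebuildSurfaceIndex(files map[string]string) map[string]string {
	const prefix = "- surface: "
	idx := map[string]string{}
	for p, body := range files {
		task := strings.TrimSuffix(path.Base(p), ".md")
		for _, line := range strings.Split(body, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, prefix) {
				idx[strings.TrimPrefix(line, prefix)] = task
			}
		}
	}
	return idx
}

func main() {
	files := map[string]string{
		"tasks/webhook-reliability.md": "## Surfaces\n- surface: linear:issue:BG-42\n- surface: slack:thread:C_eng:1700000000.000100\n",
	}
	fmt.Println(RebuildSurfaceIndex(files)["linear:issue:BG-42"])
	err := AppendDecision(filepath.Join(os.TempDir(), "decisions.jsonl"), Decision{
		At: time.Now().UTC(), Event: "evt-1", Task: "webhook-reliability",
		Action: "pinged marcus", Why: "owns the webhook handler",
	})
	if err != nil {
		panic(err)
	}
}
```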
Where it was heading: a coding agent
In the simulation, Relay explored code read-only. It could trace a path and diagnose a bug, but a human still wrote the fix.
The next step was to let it write the fix too. The design ran Pi, a coding agent, as another teammate. Relay would spawn it in a git worktree to change code and open a PR, with the PR becoming just another surface on the task. The point was not generic code review. By the time a PR landed, Relay already knew why it existed, so it could review against intent:
Generic agent: "Consider adding error handling to this function."
Relay: "Marcus said the retry count should be 3 (Slack, March 20),
but this hardcodes 5. Check with him before merging."
That part stayed mostly design. The coordination layer is what we actually proved.
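Since this stayed design, the following is only a sketch of the first step: building the git worktree add command that would give the coding agent its own checkout on a new branch, plus the matching cleanup. The branch naming and helper names are hypothetical, and the sketch deliberately does not guess at how Pi itself is invoked.

```go
package main

import (
	"fmt"
	"os/exec"
)

// WorktreeCmd builds (but does not run) the git command that creates an
// isolated checkout for one task on a fresh branch cut from base.
// The "relay/<task>" branch naming is a hypothetical convention.
func WorktreeCmd(repo, dir, task, base string) *exec.Cmd {
	cmd := exec.Command("git", "worktree", "add", "-b", "relay/"+task, dir, base)
	cmd.Dir = repo // run inside the main repository
	return cmd
}

// RemoveWorktreeCmd cleans up once the PR is merged or abandoned.
func RemoveWorktreeCmd(repo, dir string) *exec.Cmd {
	cmd := exec.Command("git", "worktree", "remove", dir)
	cmd.Dir = repo
	return cmd
}

func main() {
	add := WorktreeCmd("/src/app", "/tmp/wt/bg-42", "bg-42", "main")
	fmt.Println(add.Args)
	fmt.Println(RemoveWorktreeCmd("/src/app", "/tmp/wt/bg-42").Args)
}
```

Keeping each change in its own worktree means the agent never touches the checkout a human is working in, and the resulting PR can be bound to the task like any other surface.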
What held up
The architecture held up. Across the simulated week the coordination worked: Relay read the code, diagnosed each incident, pulled in the right people, and connected incidents across days without anyone asking it to.
The idea I would carry into the next agent I build is the tiny kernel and the graph it navigates itself. Letting the model read its way to context, instead of pre-loading everything, is the part I would reach for again. A small kernel, a graph, and a brain that reads and writes it. That was the whole idea, and it worked.