005 2026-04-11 concluded just-another-coding-agent

Fixing a Windows deadlock in a terminal coding agent

agents windows asyncio rpc debugging

Setup: Windows JACA session, large `git show`, Python backend streaming updates to a Go terminal UI
Found: the session was not stuck because `git` was slow; it was stuck because the backend was pushing too many large updates through a small Windows pipe
Result: the fix that stayed was sending fewer partial updates and separating UI writes from the main runtime loop

The situation

I hit this in a real Windows session while asking the agent to review recent code changes. It launched a large `git show`, printed a little output, and then just sat there.

The odd part was the split state. The UI still showed the tool as running, but pressing Esc said there was no active run left to interrupt. That meant the backend and the UI had already drifted apart.

What I tried first

The obvious guesses came first. Maybe git itself was hanging. Maybe the shell update path was too slow. Maybe an async helper deeper in the stack was unwinding badly.

Those paths did expose a few real bugs, but the clue that mattered was simpler: the hang reproduced on native Windows but not under WSL, and some output always appeared before the silence. That pushed the investigation away from `git` and toward the pipe between the backend and the TUI.

What was really happening

The backend and the terminal UI talk through a pipe. On Windows that pipe has a much smaller buffer than it does under WSL, which is why the same session only hung on native Windows.

While the shell command was producing a big diff, the backend kept sending growing “still running” updates to the UI. Eventually the pipe filled. The backend was writing those updates from the same async loop that ran the rest of the session, so once the write blocked, the loop stopped moving too.
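The stall is easy to reproduce in miniature. The sketch below is not the JACA code: `blocking_write`, `heartbeat`, and `longest_stall` are hypothetical names, and `time.sleep` stands in for a synchronous write to a full pipe. It measures the longest gap between heartbeat ticks when the blocking call runs inside the event loop versus in a worker thread.

```python
import asyncio
import time


def blocking_write(seconds: float) -> None:
    # Stand-in for a synchronous write to a full pipe: it does not return for a while.
    time.sleep(seconds)


async def heartbeat(ticks: list) -> None:
    # Represents everything else the session loop is supposed to keep doing.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def longest_stall(write_in_loop: bool) -> float:
    """Return the longest gap (in seconds) between two heartbeat ticks."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times first

    if write_in_loop:
        blocking_write(0.3)  # blocks the whole event loop
    else:
        await asyncio.to_thread(blocking_write, 0.3)  # blocks only a worker thread

    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("write in loop:  ", asyncio.run(longest_stall(write_in_loop=True)))
    print("write in thread:", asyncio.run(longest_stall(write_in_loop=False)))
```

With the write inside the loop, the heartbeat goes quiet for the full duration of the call. A write that never returns turns that pause into a permanent hang.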

That is why the failure looked so strange. The problem was not the shell command itself. The problem was that output delivery had been placed on the wrong boundary: inside the runtime loop, where one stalled write could stall everything else.

The fix that stayed

Two changes fixed the real issue.

First, the backend now sends fewer partial updates while a noisy shell command is running. Second, UI writes are handled separately from the main runtime loop, so a slow reader can slow the writer without freezing the agent.
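One way to send fewer partial updates is to coalesce them on a timer. This is a minimal sketch under assumptions, not the actual fix: the class name `UpdateThrottle`, the `min_interval` parameter, and the injectable `clock` are all hypothetical. Only the newest partial output is kept, and it is forwarded at most once per interval.

```python
import time
from typing import Callable, Optional


class UpdateThrottle:
    """Coalesce partial updates so at most one is sent per min_interval seconds."""

    def __init__(
        self,
        send: Callable[[str], None],
        min_interval: float = 0.1,  # assumed value, tune for the UI
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Optional[float] = None
        self._pending: Optional[str] = None

    def offer(self, partial_output: str) -> None:
        # Keep only the newest snapshot; older unsent snapshots are superseded.
        self._pending = partial_output
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._flush(now)

    def finish(self) -> None:
        # Always deliver the last snapshot so the UI ends in the right state.
        if self._pending is not None:
            self._flush(self._clock())

    def _flush(self, now: float) -> None:
        self._send(self._pending)
        self._pending = None
        self._last_sent = now
```

A caller would invoke `offer` for every chunk of shell output and `finish` once the command exits. Intermediate snapshots that arrive inside the interval never reach the pipe.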

I kept a few smaller Windows and cleanup fixes too, but those were supporting work. The durable decision was the boundary: the agent should keep doing work even when the UI side falls behind.
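That boundary can be sketched as a dedicated writer thread fed by a bounded queue. Everything here is illustrative: `UiWriter`, `max_pending`, and the drop-oldest policy are assumptions, not a description of what JACA does. The point is that `submit` never blocks the caller, so only the writer thread waits on a slow reader.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel that tells the writer thread to exit


class UiWriter:
    """Run blocking UI writes on their own thread, away from the runtime loop."""

    def __init__(self, write: Callable[[bytes], None], max_pending: int = 64) -> None:
        self._write = write  # blocking callable, e.g. a pipe write plus flush
        self._queue: queue.Queue = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, message: bytes) -> None:
        # Never blocks: when the queue is full, discard the oldest pending message.
        while True:
            try:
                self._queue.put_nowait(message)
                return
            except queue.Full:
                try:
                    self._queue.get_nowait()
                except queue.Empty:
                    pass  # the writer drained it first; retry the put

    def close(self, timeout: float = 5.0) -> None:
        # May wait for queue space; intended for shutdown, not the hot path.
        self._queue.put(_STOP)
        self._thread.join(timeout)

    def _run(self) -> None:
        while True:
            message = self._queue.get()
            if message is _STOP:
                return
            self._write(message)  # only this thread stalls if the reader is slow
```

Dropping the oldest message is only reasonable for superseded partial updates. Final results would need a path that is never dropped, which this sketch leaves out.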

Takeaway

What looked like an agent bug was mostly a transport bug.

In systems like this, output handling is not decoration. Put it on the wrong boundary and it becomes part of your failure model.