Replacing Python subprocesses with a Go worker
- Setup: parallel read-only tools in JACA, Terminal Bench 2.0 task containers, and a subprocess-per-call escape hatch that was correct but painfully slow
- Found: the real bottleneck was subprocess-per-call, not thread count; both Go and Rust persistent helpers fixed it, and Go was the better repo fit while Python kept semantic ownership
- Result: JACA now uses a long-lived Go worker after a Go-vs-Rust spike showed persistent helpers were hundreds to more than a thousand times faster than subprocess-per-call; Terminal Bench 2.0 ships a prebuilt helper into task containers, and the first broad A/B/C Terminal Bench 2.0 window after the change showed fewer timeouts and errors
Starting situation
JACA constantly leans on read, ls, find, and grep. After I hit hangs in the threaded path, I took the safest escape hatch I had: spawn a fresh Python subprocess for every read-only call.
It worked. It was also the wrong long-term shape. Cheap file inspection had started paying process startup cost every single time.
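The escape hatch looked roughly like this. A minimal sketch, assuming a hypothetical `jaca_tools.cli` entry point (the real module name is not in this post); the shape is what matters:

```python
import json
import subprocess
import sys

def read_only_call(op: str, path: str) -> dict:
    # One fresh interpreter per call: correct, isolated, and slow,
    # because every cheap read pays full Python startup cost.
    proc = subprocess.run(
        [sys.executable, "-m", "jaca_tools.cli", op, path],  # hypothetical entry point
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```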
The experiment
I locked the boundary before comparing implementations.
Go and Rust both spoke the same JSON Lines protocol, used the same Python caller, and ran the same request corpus. Python kept semantic ownership throughout: tool schema, validation, result shaping, activity metadata, session meaning, and RPC meaning all stayed in the backend.
That made the comparison useful. I was testing execution engines, not rewriting the architecture twice.
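Concretely, the caller side of that locked boundary can be sketched like this. The binary name and the request shape (an `op` plus parameters) are assumptions; the point is one long-lived process and one JSON object per line in each direction:

```python
import json
import subprocess

class HelperClient:
    """Persistent helper: start once, then one JSON Lines round trip per call."""

    def __init__(self, binary: str = "./jaca-helper"):  # hypothetical binary name
        self.proc = subprocess.Popen(
            [binary],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, op: str, **params) -> dict:
        # One request per line out, one response per line back.
        self.proc.stdin.write(json.dumps({"op": op, **params}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())
```

Both prototypes sat behind the same caller; swapping Go for Rust meant changing only the binary path.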
What the data said
This was an end-to-end helper benchmark, not a raw language shootout.
That distinction matters. The benchmark included the Python caller, JSON Lines transport, request decoding, filesystem work, and response encoding. In that path, the big result was not “Go beat Rust.” The big result was “persistent helper beat subprocess-per-call by a mile.”
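As a rough sketch of what "end to end" means here, the timing loop covers the whole round trip through the caller and transport, not the helper in isolation (function names carried over from the sketches above):

```python
import time

def bench(call, n=1000):
    # Warm up once, then time n full round trips:
    # Python caller -> JSON Lines -> helper -> response decode.
    call()
    start = time.perf_counter()
    for _ in range(n):
        call()
    return (time.perf_counter() - start) / n

# e.g. bench(lambda: client.call("read", path="README.md"))
# vs.  bench(lambda: read_only_call("read", "README.md"))
```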
- Go was about 1300x faster than the old subprocess path on warm reads.
- Rust was about 600x faster than the old subprocess path on warm reads.
- On this measured warm-read path, Go was about 2x faster than Rust.
- Rust produced a binary about 5x smaller.
- Go built about 1.5x faster.
That was enough to settle the hard part. The old subprocess seam was the real bottleneck.
Why Go shipped
Rust was fast enough. I did not reject it because “Rust is slow.” The spike says the opposite: Rust cleared the real bar easily and won binary size.
Go shipped because it fit the repo better. It already belonged to the build world, it had the better read latency in this spike, and it kept packaging and CI simpler. I also kept the helper separate from the Go TUI binary. That boundary mattered. Python still owns meaning; the helper just does the fast boring work.
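In practice that split looks something like this. A sketch, not the real tool code: Python validates, shapes, and attaches meaning; the helper only touches the filesystem:

```python
def read_tool(client: HelperClient, path: str) -> dict:
    # Semantic ownership stays in Python: schema checks, policy,
    # and result shaping happen here, not in the Go helper.
    if not path:
        return {"tool": "read", "error": "path is required"}
    raw = client.call("read", path=path)
    # Shape the helper's raw payload into the tool result the agent sees.
    return {"tool": "read", "path": path, "content": raw.get("content", "")}
```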
Why Terminal Bench 2.0 mattered
A local speedup is not interesting if the benchmark harness never sees it.
So the Terminal Bench 2.0 path changed too. The helper is built once and shipped into task containers as a prebuilt binary. That is what made the fast path real outside local dev.
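A sketch of the build-once, ship-everywhere step, with assumed paths and package names; the real wiring lives in the harness:

```python
import os
import shutil
import subprocess
from pathlib import Path

def stage_helper(repo: Path, context: Path) -> Path:
    # Build one static Go binary on the host...
    out = repo / "dist" / "jaca-helper"  # hypothetical output path
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["go", "build", "-o", str(out), "./cmd/helper"],  # hypothetical package path
        cwd=repo,
        env={**os.environ, "CGO_ENABLED": "0"},  # static binary runs in any task container
        check=True,
    )
    # ...then copy it into the task container build context.
    return Path(shutil.copy(out, context / "jaca-helper"))
```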
What changed after
I cannot claim one commit caused every later Terminal Bench 2.0 gain. The same window included other runtime work.
But the first broad GLM-5 high A/B/C window after this change still moved in the right direction across the board:
- weighted mean up about 16%
- total errors down about 23%
- AgentTimeoutError down about 18%
That is not proof of a monocausal story. It is enough to say the worker landed inside a real improvement wave, not a private microbenchmark fantasy.
Takeaway
The interesting decision was not Go over Rust. It was persistent helper over subprocess-per-call.
Both prototypes proved the same thing. Once that was obvious, Go became the practical choice, Python stayed the semantic owner, and one hot path stopped taxing the whole agent.