Replacing Python subprocesses with a Go worker
- Setup: parallel read-only tools in JACA, Terminal Bench 2.0 task containers, and a subprocess-per-call escape hatch that was correct but painfully slow
- Found: the real bottleneck was subprocess-per-call, not thread count; both Go and Rust persistent helpers fixed it, and Go was the better repo fit while Python kept semantic ownership
- Result: JACA now uses a long-lived Go worker after a Go-vs-Rust spike showed persistent helpers were hundreds to more than a thousand times faster than subprocess-per-call; Terminal Bench 2.0 ships a prebuilt helper into task containers, and the first broad A/B/C Terminal Bench 2.0 window after the change showed fewer timeouts and errors
Starting situation
JACA constantly leans on read, ls, find, and grep. After I hit hangs in the threaded path, I took the safest escape hatch I had: spawn a fresh Python subprocess for every read-only call.
It worked. It was also the wrong long-term shape. Cheap file inspection had started paying process startup cost every single time.
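The escape hatch looked roughly like this. A minimal sketch, assuming a hypothetical `jaca_tools.cli` entry point (the real module name is not in this post); the shape is what matters:

```python
import json
import subprocess
import sys

def read_only_call(op: str, path: str) -> dict:
    # One fresh interpreter per call: correct, isolated, and slow,
    # because every cheap read pays full Python startup cost.
    proc = subprocess.run(
        [sys.executable, "-m", "jaca_tools.cli", op, path],  # hypothetical entry point
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```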
The experiment
I locked the boundary before comparing implementations.
Go and Rust both spoke the same JSON Lines protocol, used the same Python caller, and ran the same request corpus. Python kept semantic ownership throughout: tool schema, validation, result shaping, activity metadata, session meaning, and RPC meaning all stayed in the backend.
That made the comparison useful. I was testing execution engines, not rewriting the architecture twice.
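Concretely, the caller side of that locked boundary can be sketched like this. The binary name and the request shape (an `op` plus parameters) are assumptions; the point is one long-lived process and one JSON object per line in each direction:

```python
import json
import subprocess

class HelperClient:
    """Persistent helper: start once, then one JSON Lines round trip per call."""

    def __init__(self, binary: str = "./jaca-helper"):  # hypothetical binary name
        self.proc = subprocess.Popen(
            [binary],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, op: str, **params) -> dict:
        # One request per line out, one response per line back.
        self.proc.stdin.write(json.dumps({"op": op, **params}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())
```

Both prototypes sat behind the same caller; swapping Go for Rust meant changing only the binary path.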
What the data said
This was an end-to-end helper benchmark, not a raw language shootout.
That distinction matters. The benchmark included the Python caller, JSON Lines transport, request decoding, filesystem work, and response encoding. In that path, the big result was not “Go beat Rust.” The big result was “persistent helper beat subprocess-per-call by a mile.”
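As a rough sketch of what "end to end" means here, the timing loop covers the whole round trip through the caller and transport, not the helper in isolation (function names carried over from the sketches above):

```python
import time

def bench(call, n=1000):
    # Warm up once, then time n full round trips:
    # Python caller -> JSON Lines -> helper -> response decode.
    call()
    start = time.perf_counter()
    for _ in range(n):
        call()
    return (time.perf_counter() - start) / n

# e.g. bench(lambda: client.call("read", path="README.md"))
# vs.  bench(lambda: read_only_call("read", "README.md"))
```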
- Go was about 1300x faster than the old subprocess path on warm reads.
- Rust was about 600x faster than the old subprocess path on warm reads.
- On this measured warm-read path, Go was about 2x faster than Rust.
- Rust produced a binary about 5x smaller.
- Go built about 1.5x faster.
That was enough to settle the hard part. The old subprocess seam was the real bottleneck.
Why Go shipped
Rust was fast enough. I did not reject it because “Rust is slow.” The spike says the opposite: Rust cleared the real bar easily and won binary size.
Go shipped because it fit the repo better. It already belonged to the build world, it had the better read latency in this spike, and it kept packaging and CI simpler. I also kept the helper separate from the Go TUI binary. That boundary mattered. Python still owns meaning; the helper just does the fast boring work.
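In practice that split looks something like this. A sketch, not the real tool code: Python validates, shapes, and attaches meaning; the helper only touches the filesystem:

```python
def read_tool(client: HelperClient, path: str) -> dict:
    # Semantic ownership stays in Python: schema checks, policy,
    # and result shaping happen here, not in the Go helper.
    if not path:
        return {"tool": "read", "error": "path is required"}
    raw = client.call("read", path=path)
    # Shape the helper's raw payload into the tool result the agent sees.
    return {"tool": "read", "path": path, "content": raw.get("content", "")}
```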
Why Terminal Bench 2.0 mattered
A local speedup is not interesting if the benchmark harness never sees it.
So the Terminal Bench 2.0 path changed too. The helper is built once and shipped into task containers as a prebuilt binary. That is what made the fast path real outside local dev.
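A sketch of the build-once, ship-everywhere step, with assumed paths and package names; the real wiring lives in the harness:

```python
import os
import shutil
import subprocess
from pathlib import Path

def stage_helper(repo: Path, context: Path) -> Path:
    # Build one static Go binary on the host...
    out = repo / "dist" / "jaca-helper"  # hypothetical output path
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["go", "build", "-o", str(out), "./cmd/helper"],  # hypothetical package path
        cwd=repo,
        env={**os.environ, "CGO_ENABLED": "0"},  # static binary runs in any task container
        check=True,
    )
    # ...then copy it into the task container build context.
    return Path(shutil.copy(out, context / "jaca-helper"))
```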
What changed after
I cannot claim one commit caused every later Terminal Bench 2.0 gain. The same window included other runtime work.
But the first broad GLM-5 high A/B/C window after this change still moved in the right direction across the board:
- weighted mean up about 16%
- total errors down about 23%
- AgentTimeoutError down about 18%
That is not proof of a monocausal story. It is enough to say the worker landed inside a real improvement wave, not a private microbenchmark fantasy.
Takeaway
The interesting decision was not Go over Rust. It was persistent helper over subprocess-per-call.
Both prototypes proved the same thing. Once that was obvious, Go became the practical choice, Python stayed the semantic owner, and one hot path stopped taxing the whole agent.