sahildahiya.me / home

I'm Sahil Dahiya. I build AI evaluation systems and agent runtimes. Most recently at Workleap, previously five years at Microsoft.

Most of what I've shipped is on the eval side — frameworks, A/B tests on model choices, traces that survive a postmortem. I run two projects of my own to keep the work close to the runtime.

Background

At Workleap I worked on AI assistants and the systems that made them measurable: routing, RAG pipelines, eval frameworks, A/B testing on model choices, and conversation anonymization for product analysis.

Before that, five years at Microsoft on anomaly detection, forecasting, experimentation, and device health systems across Windows and data center ops.

Two projects of my own

Tapasya

A retrieval-augmented reading and writing environment for philosophy. Search across Nietzsche passages, inspect cited sources, continue a conversation at the passage level, and move into essay mode without leaving the same thinking workflow.

Right now: search across philosophical texts, passage-level conversation, and an essay mode that keeps citations attached to every paragraph.

FastAPI / HTMX / Claude API / Voyage AI
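The retrieval step behind passage search can be sketched roughly as a nearest-neighbor lookup over embedded passages. This is a minimal, self-contained illustration with hand-written toy vectors; the real project embeds passages with Voyage AI, and all names and data here are illustrative assumptions, not the project's actual code.

```python
import math

# Toy in-memory index mapping passage text -> embedding vector.
# In practice the vectors would come from an embedding API (here: stand-ins).
PASSAGES = {
    "What does not kill me makes me stronger.": [0.9, 0.1, 0.0],
    "God is dead. God remains dead. And we have killed him.": [0.1, 0.9, 0.2],
    "He who has a why to live can bear almost any how.": [0.7, 0.3, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    """Return the top-k passages ranked by cosine similarity to the query."""
    ranked = sorted(
        PASSAGES.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A production version would swap the dict for a vector store and attach each hit's citation metadata so it can follow the passage into essay mode.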

just-another-coding-agent

A terminal coding agent built around a Python runtime, JSON-over-stdio RPC, and a first-party Go interface. The project is focused on keeping the backend contract explicit, the TUI thin, and the runtime strict enough to support real coding sessions without fallback-heavy behavior. It also has a public Terminal-Bench 2 submission: a GLM-5 run that validated at 47.4% accuracy.

Python runtime, Go TUI, JSON-over-stdio between them. First public Terminal-Bench 2 submission validated at 47.4% with GLM-5.

Python / PydanticAI / Go / Bubble Tea
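The JSON-over-stdio contract between the runtime and the TUI can be sketched as newline-delimited JSON messages, one per line, so framing stays trivial on both ends. This is a hedged illustration of the general pattern; the field names and message shape below are assumptions for the sketch, not the project's actual wire format.

```python
import json

def encode_rpc(method, params, msg_id):
    """Serialize one request as a single newline-terminated JSON line.

    Newline-delimited JSON means the reader can frame messages by
    splitting on '\n' instead of parsing length prefixes.
    """
    return json.dumps({"id": msg_id, "method": method, "params": params}) + "\n"

def decode_rpc(line):
    """Parse one line back into a message dict, rejecting malformed input.

    A strict runtime fails loudly on a bad message rather than guessing.
    """
    msg = json.loads(line)
    for field in ("id", "method", "params"):
        if field not in msg:
            raise ValueError(f"missing field: {field}")
    return msg
```

In this shape the Go TUI only encodes requests and renders responses; all tool execution stays on the Python side of the pipe.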

Logs

Short notes I keep so model choices, eval runs, and deployment decisions stay findable later.

Elsewhere

Want to talk about AI evaluation, agent runtimes, or anything above? Email's on my LinkedIn.