run_failing_tests
Rerun only the tests that failed in the most recent run_tests call in this session. Skips anything that passed, so the fix-rerun loop drops from minutes to seconds on large codebases.
The sandbox composes a narrow test-runner invocation from the structured failure records it already captured for last_test_failures. After the rerun completes, the structured-failure store is overwritten with the fresh results — a follow-up last_test_failures reflects the new state.
Schema
Section titled “Schema”| Param | Type | Required | Default |
|---|---|---|---|
limit | number | no | 50 (clamped to 200) — max distinct test names in the rerun filter |
timeout | number | no | 300s (max 1800s) — same semantics as run_tests |
Behaviour
Section titled “Behaviour”- If no
run_testscall has been made yet in this session → text resultno run_tests call yet in this session. - If no supported project is detected in the workspace root → error result.
- If the detector has no structured-failures parser (v1: every language except Go) → text result
run_failing_tests: <language> detector has no structured failures; rerun run_tests manually. - If the last run had zero failures → text result
last run_tests had no failures — nothing to rerun (<language>). - Otherwise the sandbox composes a Go
-runregex from the stored failures and executes it. Output shape matchesrun_tests(stdout, optional stderr section, optional timeout line, trailingexit: N).
Language support
Section titled “Language support”- Go — composes
go test -json -count=1 -run '^(TestA|TestB|...)$' <packages>. Packages are the set of import paths from the failure records; when the set is ≤ 10 distinct packages they’re passed positionally, otherwise the invocation falls back to./...to keep argv bounded. Subtest failures rerun the parent test (Go’s-runregex matches the top-level name). - Node / Python / Rust — parser not yet wired. The tool surfaces a
not supported for <language>notice; rerunrun_testsdirectly.
Examples
Section titled “Examples”After a failing run:
{"tool": "run_tests"}→ 3 failures: pkg/a:TestAlpha, pkg/a:TestBeta, pkg/b:TestGammaRerun only those:
{"tool": "run_failing_tests"}The composed invocation is go test -json -count=1 -run '^(TestAlpha|TestBeta|TestGamma)$' pkg/a pkg/b.
Limit the filter (useful when dozens of tests failed and you want to focus on the first batch):
{"tool": "run_failing_tests", "arguments": {"limit": 5}}Store semantics
Section titled “Store semantics”- Reads the same
TestResultStoreslotlast_test_failuresreads from. - Writes the slot on completion, so a follow-up
last_test_failuressurfaces the rerun’s results — not the original run’s. - Session-scoped. Process-local; no persistence across restarts.
Related
Section titled “Related”- run_tests — produces the failure records this tool reruns.
- last_test_failures — shares the same storage slot.
- Language support model — the
Detector.ParseTestFailurescontract that gates rerun support.