last_test_failures
Return a structured view of the failures from the most recent run_tests call in this session. Saves the agent from re-parsing raw runner output (go test FAIL: TestFoo, pytest tracebacks, jest diffs, cargo panic formatting) after every run.
The sandbox parses the runner’s machine-readable output once, at run_tests time, and stores the result in a process-scoped singleton. last_test_failures reads from that slot.
Schema
Section titled “Schema”| Param | Type | Required | Default |
|---|---|---|---|
limit | number | no | 50 (clamped to 500) |
Behaviour
Section titled “Behaviour”- If no
run_testscall has been made yet in this session → text resultno run_tests call yet in this session. - If the last
run_testscall’s detector has no structured-failures parser (v1: every language except Go) →no structured failures available for <language> (last run_tests Ns ago). - If the last run had zero failures →
last run_tests had no failures (<N> tests passed, <lang>, <duration> ago). - Otherwise → a block-per-failure listing:
2 test failure(s) from last run_tests call (go, 3s ago):
1. example.com/pkg/foo/TestValidate/empty_input internal/foo/foo.go:42 expected error for empty input, got nil
2. example.com/pkg/bar/TestBaz internal/bar/bar_test.go:87 mismatch --- diff --- got: 5 want: 3Each block carries the fully qualified TestName, the file:line where available, the message, and an optional diff when the runner emitted got: / want: / Diff: markers.
Language support
Section titled “Language support”- Go — parses
go test -jsontest2json events.run_testsautomatically uses-json -count=1 ./...for Go projects. - Node / Python / Rust — parser not yet wired. The tool surfaces a clear “not supported for
” notice; fall back to reading raw run_testsoutput.
The contract is the language-support model’s Detector.ParseTestFailures method — each language’s implementation lands in a subsequent PR.
Storage semantics
Section titled “Storage semantics”- Single slot, session-scoped. Only the most recent run is retained. Agents generally care about their latest run.
- Process-local. Nothing is persisted to disk. A server restart resets the slot.
- Written on every
run_testscall, whether tests passed or failed — so “no failures” is distinguishable from “no call yet”.
Failure modes
Section titled “Failure modes”- No previous
run_testscall → text result with theno run_tests call yetmessage (NOT an error). - Unsupported language → text result with the
not supported for <language>message. - Store not configured (only reachable via direct handler construction in tests) → error result.
Related
Section titled “Related”- run_tests — produces the input.
- Language support model — the
Detector.ParseTestFailurescontract.