
last_test_failures

Return a structured view of the failures from the most recent run_tests call in this session. Saves the agent from re-parsing raw runner output (go test FAIL: TestFoo, pytest tracebacks, jest diffs, cargo panic formatting) after every run.

The sandbox parses the runner’s machine-readable output once, at run_tests time, and stores the result in a process-scoped singleton. last_test_failures reads from that slot.

Param   Type     Required   Default
limit   number   no         50 (clamped to 500)
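The default and clamp for limit can be sketched as below. This is only an illustration: the function name clampLimit is hypothetical, and treating zero or negative values as "not provided" is an assumption, since the page specifies only the default (50) and the upper bound (500).

```go
package main

import "fmt"

const (
	defaultLimit = 50  // documented default
	maxLimit     = 500 // documented upper bound
)

// clampLimit resolves the optional limit parameter.
// Assumption: zero or negative input means the caller did not set it.
func clampLimit(limit int) int {
	if limit <= 0 {
		return defaultLimit
	}
	if limit > maxLimit {
		return maxLimit
	}
	return limit
}

func main() {
	fmt.Println(clampLimit(0), clampLimit(20), clampLimit(9000)) // 50 20 500
}
```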
  1. If no run_tests call has been made yet in this session → text result no run_tests call yet in this session.
  2. If the last run_tests call’s detector has no structured-failures parser (v1: every language except Go) → no structured failures available for <language> (last run_tests Ns ago).
  3. If the last run had zero failures → last run_tests had no failures (<N> tests passed, <lang>, <duration> ago).
  4. Otherwise → a block-per-failure listing:
2 test failure(s) from last run_tests call (go, 3s ago):
1. example.com/pkg/foo/TestValidate/empty_input
internal/foo/foo.go:42
expected error for empty input, got nil
2. example.com/pkg/bar/TestBaz
internal/bar/bar_test.go:87
mismatch
--- diff ---
got: 5
want: 3

Each block carries the fully qualified TestName, the file:line where available, the message, and an optional diff when the runner emitted got: / want: / Diff: markers.
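The four outcomes above could be rendered by something like the following sketch. The Snapshot and Failure types, their field names, and the render function are hypothetical, chosen only to mirror the fields this page lists; the message strings are copied from the cases above, and the flat per-failure layout follows the listing as shown here rather than any confirmed formatter.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Failure mirrors the per-block fields described above (hypothetical type).
type Failure struct {
	TestName string
	File     string // empty when the runner gave no location
	Line     int
	Message  string
	Diff     string // empty when no got:/want:/Diff: markers were seen
}

// Snapshot is a hypothetical record of one run_tests call.
type Snapshot struct {
	Language string
	Parsed   bool // false when the detector has no structured-failures parser
	Passed   int
	Failures []Failure
	At       time.Time
}

// render maps a snapshot (nil means "no run yet") to the text result.
func render(s *Snapshot, now time.Time) string {
	if s == nil {
		return "no run_tests call yet in this session."
	}
	ago := now.Sub(s.At).Round(time.Second)
	if !s.Parsed {
		return fmt.Sprintf("no structured failures available for %s (last run_tests %s ago).", s.Language, ago)
	}
	if len(s.Failures) == 0 {
		return fmt.Sprintf("last run_tests had no failures (%d tests passed, %s, %s ago).", s.Passed, s.Language, ago)
	}
	var b strings.Builder
	fmt.Fprintf(&b, "%d test failure(s) from last run_tests call (%s, %s ago):\n", len(s.Failures), s.Language, ago)
	for i, f := range s.Failures {
		fmt.Fprintf(&b, "%d. %s\n", i+1, f.TestName)
		if f.File != "" {
			fmt.Fprintf(&b, "%s:%d\n", f.File, f.Line)
		}
		fmt.Fprintln(&b, f.Message)
		if f.Diff != "" {
			fmt.Fprintf(&b, "--- diff ---\n%s\n", f.Diff)
		}
	}
	return b.String()
}

// exampleSnapshot rebuilds the two-failure example from this page.
func exampleSnapshot() (*Snapshot, time.Time) {
	ran := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	s := &Snapshot{
		Language: "go",
		Parsed:   true,
		At:       ran,
		Failures: []Failure{
			{TestName: "example.com/pkg/foo/TestValidate/empty_input", File: "internal/foo/foo.go", Line: 42, Message: "expected error for empty input, got nil"},
			{TestName: "example.com/pkg/bar/TestBaz", File: "internal/bar/bar_test.go", Line: 87, Message: "mismatch", Diff: "got: 5\nwant: 3"},
		},
	}
	return s, ran.Add(3 * time.Second)
}

func main() {
	s, now := exampleSnapshot()
	fmt.Print(render(s, now))
}
```

Passing now explicitly keeps the "3s ago" portion deterministic, which makes the formatter easy to test.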

  • Go: parses the test2json event stream produced by go test -json. run_tests automatically uses -json -count=1 ./... for Go projects.
  • Node / Python / Rust: parser not yet wired. The tool surfaces a clear “not supported for <language>” notice; fall back to reading raw run_tests output.

The parsing contract is the Detector.ParseTestFailures method in the language-support model; implementations for the remaining languages land in subsequent PRs.
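As a rough illustration of what a Go parser along these lines might do, the sketch below reads a test2json event stream line by line, accumulates output events per test, and emits one entry for every test-level fail event. The Action, Package, Test, and Output fields are part of the documented test2json event format; everything else (testEvent, failedTest, parseFailures, parseSample, and the sample events) is hypothetical and is not the actual Detector.ParseTestFailures implementation.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// testEvent holds the subset of test2json event fields used here.
type testEvent struct {
	Action  string
	Package string
	Test    string
	Output  string
}

// failedTest is a hypothetical result type for this sketch.
type failedTest struct {
	Name   string // Package + "/" + Test
	Output string // concatenated output events for the test
}

// parseFailures collects one entry per test-level "fail" event.
func parseFailures(r io.Reader) ([]failedTest, error) {
	output := map[string]*strings.Builder{}
	var failures []failedTest
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		var ev testEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip lines that are not JSON events
		}
		if ev.Test == "" {
			continue // package-level event, not tied to a single test
		}
		key := ev.Package + "/" + ev.Test
		switch ev.Action {
		case "output":
			b, ok := output[key]
			if !ok {
				b = &strings.Builder{}
				output[key] = b
			}
			b.WriteString(ev.Output)
		case "fail":
			f := failedTest{Name: key}
			if b, ok := output[key]; ok {
				f.Output = b.String()
			}
			failures = append(failures, f)
		}
	}
	return failures, sc.Err()
}

// sampleEvents is made-up input in the shape go test -json produces.
const sampleEvents = `{"Action":"run","Package":"example.com/pkg/bar","Test":"TestBaz"}
{"Action":"output","Package":"example.com/pkg/bar","Test":"TestBaz","Output":"=== RUN   TestBaz\n"}
{"Action":"output","Package":"example.com/pkg/bar","Test":"TestBaz","Output":"    bar_test.go:87: mismatch\n"}
{"Action":"output","Package":"example.com/pkg/bar","Test":"TestBaz","Output":"--- FAIL: TestBaz (0.00s)\n"}
{"Action":"fail","Package":"example.com/pkg/bar","Test":"TestBaz","Elapsed":0}
{"Action":"pass","Package":"example.com/pkg/bar","Test":"TestOK","Elapsed":0}
{"Action":"fail","Package":"example.com/pkg/bar","Elapsed":0.01}`

// parseSample runs the parser over the sample events above.
func parseSample() ([]failedTest, error) {
	return parseFailures(strings.NewReader(sampleEvents))
}

func main() {
	failures, err := parseSample()
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, f := range failures {
		fmt.Printf("%s\n%s", f.Name, f.Output)
	}
}
```

Two simplifications are worth noting: when a subtest fails, go test also reports its parent as failed, so a real parser would likely need to decide whether to list both; and extracting file:line, message, and diff from the accumulated output is left out here.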

  • Single slot, session-scoped. Only the most recent run is retained. Agents generally care about their latest run.
  • Process-local. Nothing is persisted to disk. A server restart resets the slot.
  • Written on every run_tests call, whether tests passed or failed — so “no failures” is distinguishable from “no call yet”.
  • No previous run_tests call → text result with the no run_tests call yet message (NOT an error).
  • Unsupported language → text result with the not supported for <language> message.
  • Store not configured (only reachable via direct handler construction in tests) → error result.
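The storage and error semantics listed above can be pictured with a small in-memory holder. This is a minimal sketch under stated assumptions: RunRecord, Store, Record, Last, handle, and errNoStore are hypothetical names, and the final summary string is a placeholder rather than the tool's real output.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// RunRecord is a hypothetical record of one run_tests call.
type RunRecord struct {
	Language string
	Failures []string
}

// Store is a single-slot, in-memory holder; nothing is written to disk,
// so a process restart starts again from an empty slot.
type Store struct {
	mu   sync.Mutex
	last *RunRecord // nil until the first run_tests call
}

// Record overwrites the slot. Calling it on every run, passing or
// failing, is what separates "no failures" from "no call yet".
func (s *Store) Record(rec RunRecord) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.last = &rec
}

// Last returns the most recent record and whether one exists.
func (s *Store) Last() (RunRecord, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.last == nil {
		return RunRecord{}, false
	}
	return *s.last, true
}

var errNoStore = errors.New("test-failure store not configured")

// handle shows the outcome split: a missing store is an error result,
// while "no run yet" is an ordinary text result.
func handle(s *Store) (string, error) {
	if s == nil {
		return "", errNoStore
	}
	rec, ok := s.Last()
	if !ok {
		return "no run_tests call yet in this session.", nil
	}
	// Placeholder summary; the real tool formats a fuller listing.
	return fmt.Sprintf("%d failure(s) recorded for %s", len(rec.Failures), rec.Language), nil
}

func main() {
	store := &Store{}
	fmt.Println(handle(store))
	store.Record(RunRecord{Language: "go"})
	fmt.Println(handle(store))
	fmt.Println(handle(nil))
}
```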