Integration tests
The project ships four tiers of tests. Each tier catches a different class of regression — a change is “done” when every tier the change can plausibly break is green.
Tier 1 — Unit tests (make test)
Runs the default go test ./... -race -count=1 on every push and every PR via the go CI job. It covers pure-Go logic and the mock-backed LSP wire client and has no external binary dependencies beyond the Go toolchain itself.
make test
Tier 2 — Integration tests (make test-integration)
Test files tagged //go:build integration. They drive the real external binaries that the sandbox image ships:
| Test file | Binary | What it catches |
|---|---|---|
| internal/lsp/integration_test.go | gopls | Mock-vs-real wire drift (the original implementation returned no rename edits because gopls replies with documentChanges rather than changes; this tier caught it) |
| internal/lsp/rust_integration_test.go | rust-analyzer | The same wire-drift class in the rust-analyzer dialect (initialize quirks, references shape, rename shape) |
| internal/lsp/python_integration_test.go | pyright-langserver | The same, in the pyright dialect |
| internal/lsp/node_integration_test.go | typescript-language-server (+ typescript) | The same, in the tsserver dialect |
| internal/verify/integration_test.go | golangci-lint | Lint-output format changes across linter versions |
| internal/verify/python_integration_test.go | ruff | The same class of drift in pythonDetector.ParseLint (F-series format) |
| internal/verify/rust_integration_test.go | cargo clippy (--message-format=short) | The same class of drift in rustDetector.ParseLint (severity-tagged one-line format) |
| internal/verify/node_integration_test.go | eslint (--format=json) | The same class of drift in nodeDetector.ParseLint (eslint JSON output; the legacy --format=compact formatter was removed from eslint v9 core) |
| internal/tools/integration_test.go | go | Structured-failure parsing from live go test -json output |
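To make the gopls row concrete: in the LSP specification, a WorkspaceEdit (the result of textDocument/rename) can carry its edits either in changes, a map from document URI to TextEdit lists, or in documentChanges, a list of TextDocumentEdit objects that may be mixed with create, rename, and delete file operations. A client that reads only changes sees an empty rename from a server that answers with documentChanges. The sketch below folds both shapes into one map. It is a hypothetical illustration (flattenEdits and its types are not the internal/lsp API), and it skips file resource operations rather than applying them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Position and Range follow the LSP specification's JSON shapes.
type Position struct {
	Line      int `json:"line"`
	Character int `json:"character"`
}

type Range struct {
	Start Position `json:"start"`
	End   Position `json:"end"`
}

// TextEdit replaces Range with NewText.
type TextEdit struct {
	Range   Range  `json:"range"`
	NewText string `json:"newText"`
}

// documentChange covers the TextDocumentEdit variant of documentChanges.
// Resource operations (create/rename/delete file) carry a "kind" field
// instead and are skipped by this sketch.
type documentChange struct {
	Kind         string `json:"kind,omitempty"`
	TextDocument struct {
		URI string `json:"uri"`
	} `json:"textDocument"`
	Edits []TextEdit `json:"edits"`
}

type workspaceEdit struct {
	Changes         map[string][]TextEdit `json:"changes,omitempty"`
	DocumentChanges []documentChange      `json:"documentChanges,omitempty"`
}

// flattenEdits decodes a WorkspaceEdit and returns its text edits grouped by
// URI, whichever field the server used. When both are present,
// documentChanges wins, as the specification prefers it.
func flattenEdits(raw []byte) (map[string][]TextEdit, error) {
	var we workspaceEdit
	if err := json.Unmarshal(raw, &we); err != nil {
		return nil, fmt.Errorf("decode WorkspaceEdit: %w", err)
	}
	out := make(map[string][]TextEdit)
	if len(we.DocumentChanges) > 0 {
		for _, dc := range we.DocumentChanges {
			if dc.Kind != "" { // file resource operation, not a text edit
				continue
			}
			uri := dc.TextDocument.URI
			out[uri] = append(out[uri], dc.Edits...)
		}
		return out, nil
	}
	for uri, edits := range we.Changes {
		out[uri] = append(out[uri], edits...)
	}
	return out, nil
}
```

A mock-backed unit test can push both JSON shapes through the same function; only the integration tier can show which shape each real server actually sends.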
Run locally:
make test-integration
Binaries the tier needs on PATH:
- go (always present in a Go dev environment)
- gopls: go install golang.org/x/tools/gopls@latest
- golangci-lint v2: brew install golangci-lint / apt install golangci-lint
- ruff: pip install ruff
- cargo clippy: rustup component add clippy
- eslint: npm i -g eslint
- rust-analyzer: rustup component add rust-analyzer
- pyright-langserver: npm i -g pyright (or pip install pyright)
- typescript-language-server: npm i -g typescript-language-server typescript
Any missing binary skips the corresponding test with a clear message; it is not a failure. That keeps the target safe to run on a partially-provisioned machine while still being meaningful when every tool is present.
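A minimal sketch of the skip-on-missing-binary pattern, assuming each integration test resolves its binary with exec.LookPath before launching it. binarySkipReason is a hypothetical helper name; the project's actual helper and message wording may differ. The test file itself would start with the //go:build integration line, so a plain go test ./... never compiles it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// binarySkipReason resolves binary on PATH. On success it returns the resolved
// path and an empty reason; otherwise it returns an empty path and a skip
// message that names the install command.
//
// Example use inside a file guarded by the integration build tag:
//
//	func TestGoplsRename(t *testing.T) {
//		path, reason := binarySkipReason("gopls", "go install golang.org/x/tools/gopls@latest")
//		if reason != "" {
//			t.Skip(reason) // reported as skipped, not failed
//		}
//		_ = path // launch the real server from here
//	}
func binarySkipReason(binary, installHint string) (path, reason string) {
	p, err := exec.LookPath(binary)
	if err != nil {
		return "", fmt.Sprintf("skipping: %s not found on PATH (install: %s)", binary, installHint)
	}
	return p, ""
}
```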
CI runs this tier in a dedicated integration job that installs each binary fresh, so every detector and LSP server is fully exercised on every PR instead of depending on contributors’ toolchains.
Tier 3 — End-to-end MCP wire (scripts/e2e-p0.sh)
An out-of-test-tree smoke test that chains every P0 tool over the real MCP HTTP+SSE wire against a real bin/sandbox binary. This is the only tier that exercises the tool surface through the full transport: MCP initialization, JSON-RPC request / SSE response round-trips, tool dispatch, and the scrub, metrics, and tracing middleware stack. It mirrors scripts/e2e-demo.sh in shape.
bash scripts/e2e-p0.sh
Runtime ~60s on a warm Go cache. LSP steps skip cleanly when gopls isn’t on PATH; everything else is unconditional. Binaries needed: go, curl, jq, git, ripgrep, gopls (optional).
CI runs this tier as the e2e-smoke job, which installs gopls and ripgrep, builds the binary, and runs the script. Nothing is skipped in CI: every LSP step is exercised.
scripts/e2e-multi-workspace.sh
A focused companion to e2e-p0.sh that boots the sandbox in -workspaces=primary=A,extension=B mode and asserts that every workspace-aware tool dispatches to the correct root, that a call with no workspace hint is rejected with an actionable error, that an unknown workspace name is rejected, and that the read-tracker stays keyed per absolute path (a Read in one workspace cannot unlock a Write in another).
bash scripts/e2e-multi-workspace.sh
Runtime ~10s. Binaries needed: go, bash, curl, jq, ripgrep (Glob and Grep delegate to rg). It is wired into CI as the e2e-multi-workspace job, running in parallel with e2e-smoke.
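The behavior this script asserts can be sketched in Go. Everything below is hypothetical: parseWorkspaces, resolveWorkspace, and readTracker are illustrative names, the real ResolveWorkspace signature is not documented on this page, and falling back to the only workspace when no hint is given is an assumption. The sketch shows the three rules: an explicit hint must name a known workspace, a missing hint is rejected when several roots exist, and read tracking is keyed by absolute path, so the same relative file in two workspaces maps to two different keys.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// parseWorkspaces turns a flag value such as "primary=/src/a,extension=/src/b"
// into a map from workspace name to absolute root.
func parseWorkspaces(spec string) (map[string]string, error) {
	roots := make(map[string]string)
	for _, pair := range strings.Split(spec, ",") {
		name, root, ok := strings.Cut(pair, "=")
		if !ok || name == "" || root == "" {
			return nil, fmt.Errorf("workspace %q: want name=path", pair)
		}
		if _, dup := roots[name]; dup {
			return nil, fmt.Errorf("workspace %q declared twice", name)
		}
		abs, err := filepath.Abs(root)
		if err != nil {
			return nil, fmt.Errorf("workspace %q: %w", name, err)
		}
		roots[name] = abs
	}
	return roots, nil
}

// resolveWorkspace picks the root for a tool call's workspace hint.
func resolveWorkspace(roots map[string]string, hint string) (string, error) {
	names := make([]string, 0, len(roots))
	for n := range roots {
		names = append(names, n)
	}
	sort.Strings(names) // stable ordering for error messages
	if hint == "" {
		if len(names) == 1 { // assumption: a single workspace needs no hint
			return roots[names[0]], nil
		}
		return "", fmt.Errorf("multiple workspaces configured; set workspace to one of: %s", strings.Join(names, ", "))
	}
	root, ok := roots[hint]
	if !ok {
		return "", fmt.Errorf("unknown workspace %q; known: %s", hint, strings.Join(names, ", "))
	}
	return root, nil
}

// readTracker remembers which files were Read, keyed by absolute path, so a
// Write is only allowed on a file read earlier. Sketch only: it does not
// reject relative paths that climb out of root with "..".
type readTracker map[string]bool

func (t readTracker) markRead(root, rel string) { t[filepath.Join(root, rel)] = true }

func (t readTracker) canWrite(root, rel string) bool { return t[filepath.Join(root, rel)] }
```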
Tier 4 — Docker image smoke (CI only)
One job per feature layer composes Dockerfile.tools + Dockerfile.tools-<lang> onto the layer’s operator-recommended base, boots a container, opens the SSE stream, initializes an MCP session, and exercises the layer’s characteristic tools.
| Job | Base | Feature-layer binaries | What it asserts via MCP |
|---|---|---|---|
| docker-integration | golang:1.25-alpine | gopls, golangci-lint | Every P0 tool name is present in tools/list |
| docker-integration-node | node:22-slim | pnpm, bun | run_tests runs a node --test suite to exit 0; Bash sees pnpm and bun on PATH |
| docker-integration-python | python:3.12-slim + pytest | ruff | run_lint surfaces a seeded F401 finding via MCP; run_tests runs pytest to exit 0 |
| docker-integration-rust | rust:1-slim-bookworm + clippy | rust-analyzer | run_tests runs cargo test to exit 0; run_lint surfaces cargo clippy output |
| docker-integration-render | tools-render base (Debian + dot + mmdc + Chromium) | none | render_mermaid and render_dot each write an SVG file with an svg root element to the workspace volume |
This is the only tier that verifies the published image actually boots and registers its tool surface. If any of these jobs goes red, no other tier’s green result matters: operators can’t run the image.
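For the render job, asserting an svg root element is stricter when the file is parsed rather than grepped, since a substring search for the svg tag would also accept an HTML error page that embeds an SVG. A minimal Go version of that check, assuming the renderer writes UTF-8 XML (svgRootOK is a hypothetical helper; the CI job itself is written in bash):

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
)

// svgRootOK reports whether the first element of an XML document is svg.
// The XML declaration, comments, and DOCTYPE before it are skipped.
func svgRootOK(doc []byte) error {
	dec := xml.NewDecoder(bytes.NewReader(doc))
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return errors.New("no root element found")
		}
		if err != nil {
			return fmt.Errorf("not well-formed XML: %w", err)
		}
		if start, ok := tok.(xml.StartElement); ok {
			// Name.Local ignores the namespace, so xmlns="http://www.w3.org/2000/svg" is fine.
			if start.Name.Local != "svg" {
				return fmt.Errorf("root element is %q, want \"svg\"", start.Name.Local)
			}
			return nil
		}
	}
}
```

Decoding stops at the first start element, so markup later in the file (for example, HTML inside a Mermaid foreignObject) does not have to parse for the check to pass.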
The MCP handshake + tools/call boilerplate is factored into scripts/mcp-helpers.sh so each job’s inline bash stays focused on the seed + assertions.
When to run what
- Small refactor, no external deps touched: unit tests cover it.
- New tool, new flag, anything agent-visible: add a case to scripts/e2e-p0.sh and run it locally before pushing.
- Touching the LSP client, the lint parser, or the Go test parser: the integration tier is your regression net; run make test-integration locally.
- Touching the MCP transport, middleware, or any tool handler’s wire contract: e2e-smoke catches the full-stack regression.
- Touching ResolveWorkspace, the read-tracker, or any tool’s workspace argument plumbing: e2e-multi-workspace is the regression net.
- Dockerfile or image composition change: push the branch and let docker-integration gate the merge.