Benchmarks
ca9 benchmark claims should be reproducible from public commands and pinned inputs. Keep benchmark pages tied to fixtures, reports, coverage files, and exact command lines.
Recommended benchmark shape
Each benchmark should record:
- Repository or fixture name.
- Dependency manifest and lockfile state.
- SCA source: OSV scan, Snyk, Dependabot, Trivy, or pip-audit.
- Coverage command and coverage percentage.
- ca9 command and version.
- Total, reachable, unreachable, and inconclusive counts.
- Notes explaining why unreachable findings are considered noise.
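One way to keep these fields together is a small record type with a publishability check. The sketch below is illustrative only: the class name, field names, and the rule that the three verdict counts sum to the total are assumptions, not a ca9 schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """One benchmark entry. Field names are hypothetical, not a ca9 format."""

    fixture: str            # repository or fixture name
    lockfile_state: str     # free-form note on manifest and lockfile state
    sca_source: str         # OSV, Snyk, Dependabot, Trivy, or pip-audit
    coverage_command: str
    coverage_percent: float
    ca9_command: str
    ca9_version: str
    total: int
    reachable: int
    unreachable: int
    inconclusive: int
    notes: str = ""

    def problems(self) -> list[str]:
        """Return the reasons this record should not be published yet."""
        issues = []
        if not self.ca9_command.strip():
            issues.append("missing ca9 command")
        if not 0.0 <= self.coverage_percent <= 100.0:
            issues.append("coverage percentage out of range")
        # Assumption: reachable, unreachable, and inconclusive partition the total.
        if self.reachable + self.unreachable + self.inconclusive != self.total:
            issues.append("verdict counts do not sum to total")
        # Unreachable findings need a note explaining why they count as noise.
        if self.unreachable > 0 and not self.notes.strip():
            issues.append("unreachable findings need an explanatory note")
        return issues
```

An empty list from `problems()` means the record carries everything the checklist asks for.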
Real-Repo Validation Gate
Before publishing reachability or supply-chain claims, run the pinned public-repo validation harness.
The harness clones public repositories at fixed commits, runs `ca9 inventory` and
`ca9 scan --no-auto-coverage`, writes per-case JSON artifacts, and fails on safety
regressions. It is not a marketing benchmark. It is a guardrail for claims that ca9
does not leak ambient environment packages into repository scans and does not emit
unsafe unreachable suppressions when evidence is incomplete.
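The per-case flow described above can be sketched as a command plan. This is not the harness itself. The `Case` type, the `clone_url` field, and running ca9 from inside the checkout (rather than passing a repository path) are assumptions, and the JSON artifact step is omitted because the source does not show how it is produced.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Case:
    name: str
    clone_url: str  # hypothetical field; the source lists only name and commit
    commit: str


def plan_commands(case: Case, workdir: Path) -> list[tuple[list[str], Path]]:
    """Return (argv, cwd) pairs for one validation case, in execution order."""
    repo = workdir / case.name
    return [
        (["git", "clone", case.clone_url, str(repo)], workdir),
        # A detached checkout pins the exact commit under test.
        (["git", "checkout", "--detach", case.commit], repo),
        # Assumption: both ca9 commands act on the current working directory.
        (["ca9", "inventory"], repo),
        (["ca9", "scan", "--no-auto-coverage"], repo),
    ]
```

Each pair could then be executed with `subprocess.run(argv, cwd=cwd, check=True)`, so that any failing step aborts the case.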
Current validation set:
| Case | Commit | Purpose | Expected contract |
|---|---|---|---|
| Flask | 954f5684e4841aad84a8eec7ace7b81a0d3f6831 | Real Python project with resolvable dependency inventory | Inventory resolves from repo evidence with no environment fallback |
| Django REST Framework | 7433faa98f27c200e34c04586c20024d4d6aa935 | Real Python project with unresolved dependency versions | Scan skips unresolved versions instead of using the ambient environment |
| SafeDep vet | d4491496daec6f445803a039524ddab714be01b2 | Real non-Python supply-chain scanner repository | Scan reports no Python CVEs from the current environment |
| PinTrace | 04b343779b49faf1691823a225858ef93c52c747 | Real Python repo with pinned vulnerable dependencies | Vulnerable imports remain inconclusive without coverage, not suppressed as unreachable |
Latest local run:
| Case | Inventory Packages | Scan Total | Reachable | Unreachable | Inconclusive | Safety Result |
|---|---|---|---|---|---|---|
| Flask | 8 | 0 | 0 | 0 | 0 | Pass |
| Django REST Framework | 1 | 0 | 0 | 0 | 0 | Pass |
| SafeDep vet | 0 | 0 | 0 | 0 | 0 | Pass |
| PinTrace | 11 | 2 | 0 | 0 | 2 | Pass |
Do not interpret this table as proof that ca9 catches every vulnerability. The stronger
claim is narrower: when ca9 lacks enough evidence, it should prefer INCONCLUSIVE over
an unsafe UNREACHABLE verdict.
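As a rough illustration of that narrower claim, the check below flags any case that reports UNREACHABLE findings when no coverage was supplied. It is a deliberately conservative simplification: the real harness's pass criteria are not shown here, and the function name and row layout are assumptions. The counts are copied from the latest local run table.

```python
def unsafe_cases(rows: list[dict], coverage_supplied: bool = False) -> list[str]:
    """Names of cases reporting UNREACHABLE verdicts without coverage evidence."""
    if coverage_supplied:
        # With coverage available, an UNREACHABLE verdict can be evidence-backed.
        return []
    return [row["case"] for row in rows if row["unreachable"] > 0]


# Counts copied from the "Latest local run" table (scans ran without coverage).
LATEST_RUN = [
    {"case": "Flask", "total": 0, "reachable": 0, "unreachable": 0, "inconclusive": 0},
    {"case": "Django REST Framework", "total": 0, "reachable": 0, "unreachable": 0, "inconclusive": 0},
    {"case": "SafeDep vet", "total": 0, "reachable": 0, "unreachable": 0, "inconclusive": 0},
    {"case": "PinTrace", "total": 2, "reachable": 0, "unreachable": 0, "inconclusive": 2},
]
```

Under this rule the PinTrace row passes because both findings stay inconclusive, not because ca9 proved them harmless.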
Demo benchmark
The repository includes a demo app designed to show dependency noise reduction.
To reproduce it, run the core commands directly:

    coverage run -m pytest
    coverage json -o coverage.json
    ca9 scan --repo . --coverage coverage.json --show-confidence
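Since each benchmark should record a coverage percentage, it can be read back from the `coverage.json` file that `coverage json` writes. The sketch below relies on the `totals.percent_covered` key of coverage.py's JSON report. The function names are illustrative, and rounding to two decimals is an arbitrary choice.

```python
import json
from pathlib import Path


def percent_from_report(report: dict) -> float:
    """Extract the overall percentage from a parsed coverage.py JSON report."""
    try:
        value = report["totals"]["percent_covered"]
    except (KeyError, TypeError) as exc:
        raise ValueError("not a coverage.py JSON report: missing totals.percent_covered") from exc
    return round(float(value), 2)


def coverage_percent(report_path: str = "coverage.json") -> float:
    """Load a coverage.py JSON report from disk and return its overall percentage."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return percent_from_report(report)
```

Recording the value from the same file that was passed to `ca9 scan` keeps the published percentage tied to the exact coverage input.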
Reporting template
Use this table when adding benchmark results:
| Benchmark | SCA source | Coverage | Total | Reachable | Unreachable | Inconclusive | Command |
|---|---|---|---|---|---|---|---|
| Demo Flask app | OSV | coverage.json | TBD | TBD | TBD | TBD | ca9 scan --repo demo --coverage demo/coverage.json |
Do not publish a benchmark number without the command needed to reproduce it.
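A small helper can enforce that rule when rows are generated. The sketch below is hypothetical tooling, not part of ca9. It renders one row of the reporting table, fills unknown cells with TBD, and refuses to render a row that has no command.

```python
COLUMNS = [
    "Benchmark", "SCA source", "Coverage", "Total",
    "Reachable", "Unreachable", "Inconclusive", "Command",
]


def render_row(result: dict) -> str:
    """Render one Markdown table row; raise if the reproducing command is missing."""
    command = str(result.get("Command", "")).strip()
    if not command:
        raise ValueError("refusing to render a benchmark row without its command")
    cells = []
    for column in COLUMNS:
        text = str(result.get(column, "TBD"))
        # Escape pipes so a shell pipeline in the command cannot break the table.
        cells.append(text.replace("|", "\\|"))
    return "| " + " | ".join(cells) + " |"
```

Called with only the benchmark name, SCA source, coverage file, and command filled in, it reproduces the placeholder row in the template above.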