Parser Architecture¶
ca9 uses a protocol-based parser system for SCA report parsing. Each parser converts a vendor-specific JSON shape into ca9's shared Vulnerability model.
Design¶
classDiagram
class SCAParser {
<<protocol>>
+can_parse(data: Any) bool
+parse(data: Any) list~Vulnerability~
}
class SnykParser
class DependabotParser
class TrivyParser
class PipAuditParser
SCAParser <|.. SnykParser
SCAParser <|.. DependabotParser
SCAParser <|.. TrivyParser
SCAParser <|.. PipAuditParser
The SCAParser protocol defines two methods:
can_parse(data)returnsTruewhen the parser recognizes the input structure.parse(data)converts raw JSON into a list ofVulnerabilityobjects.
Auto-detection¶
import json
from pathlib import Path
from ca9.parsers import detect_parser
report_path = Path("report.json")
parser = detect_parser(report_path)
vulnerabilities = parser.parse(json.loads(report_path.read_text()))
detect_parser() loads the JSON file and tries registered parser classes in order. The first parser whose can_parse() method returns True is selected.
Current registry:
Built-in parsers¶
Snyk¶
Detects Snyk reports by checking for a vulnerabilities array in a single-project report or an array of project objects containing vulnerabilities.
Dependabot¶
Detects GitHub Dependabot alert exports by checking for alert objects with security_advisory and dependency fields.
Trivy¶
Detects Trivy JSON reports and extracts vulnerability entries from result sections.
pip-audit¶
Detects pip-audit JSON reports and extracts package vulnerability entries.
Adding a new parser¶
- Create
src/ca9/parsers/mytool.py. - Implement
can_parse()andparse(). - Register the parser class in
src/ca9/parsers/__init__.py.
from typing import Any
from ca9.models import Vulnerability
class MyToolParser:
def can_parse(self, data: Any) -> bool:
return isinstance(data, dict) and "my_tool_version" in data
def parse(self, data: Any) -> list[Vulnerability]:
results = []
for item in data["findings"]:
results.append(
Vulnerability(
id=item["id"],
package_name=item["package"],
package_version=item["version"],
severity=item.get("severity", "unknown"),
title=item.get("title", ""),
description=item.get("description", ""),
)
)
return results
Parser order matters because the first matching parser wins.