Architecture Reference
Data model
Core entities
Organization
├── has many: Scopes (pending → approved / rejected)
├── has many: Scans
├── has many: Assets
│ └── has many: AssetEdges (directed graph)
├── has many: Vulnerabilities (attached to Assets)
├── has many: ExposureChanges (timeline of attack-surface changes)
├── has many: Files (metadata for S3/MinIO objects)
└── has many: Members (User × role)
Asset graph
Assets are stored in the assets table and their relationships in asset_edges:
assets (id, org_id, type, value, normalized_value, …)
asset_edges (id, org_id, from_asset_id, to_asset_id, relation_type, …)
Supported relation types:
owns, contains, resolves_to, points_to, exposes, serves,
has_path, has_parameter, has_certificate, has_vulnerability,
discovered_by, related_to, generated_candidate, announces, discovered_url
uses_technology is legacy/deprecated. New scans store technologies in asset metadata instead of creating technology nodes or edges.
Canonical EASM graph chain
The backend relation builder keeps attack-surface topology in this order:
domain -> subdomain -> ip -> service -> url -> vulnerability
Allowed branches are:
- url -> path
- url -> discovered_url
- service/url/subdomain/domain -> certificate, preferring service when host/IP/port metadata is available
- asn -> cidr
- organization/owner nodes to ASN where supported by plugin metadata
HTTP probing is service-first: when a URL has IP and port metadata, the worker finds or creates the canonical service asset ip:port/tcp, creates ip -[exposes]-> service, then creates service -[serves]-> url. It does not create direct ip -> url or subdomain -> url edges when a service can be identified. If no IP/service is known, the worker may create a degraded host serves URL edge until later DNS/port data enriches the graph.
| Plugin | Output | Required metadata | Created relation |
|---|---|---|---|
| subfinder, amass | subdomain | parent_domain | domain contains subdomain |
| alterx | candidate subdomain | parent_domain, candidate=true | parent generated_candidate candidate |
| dnsx, resolver, shuffledns | ip | domain or host, record_type when available | host resolves_to ip |
| naabu, nmap | service | ip, port, protocol | ip exposes service |
| httpx | url | host, ip, port, scheme when available | service serves url; degraded host serves url only without service data |
| httpx_screenshot | file artifact | linked URL asset value | no topology edge |
| tlsx | certificate | host, ip, port when available | preferred source has_certificate certificate |
| katana | path, discovered URL | parent_url, URL host/port metadata | url has_path path; url discovered_url url |
| nuclei | vulnerability record | matched_url or target asset metadata | vulnerability stored against URL/service asset |
| asnmap | asn, cidr | asn, org | asn announces cidr; org owns asn |
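The service-first rule for HTTP probing can be sketched as a small edge builder. This is a minimal illustration, not the backend's relation builder: the `Edge` and `URLObservation` types and the `EdgesForURL` function are hypothetical names, and the sketch handles IPv4 addresses only.

```go
package main

import "fmt"

// Edge is a hypothetical in-memory form of an asset_edges row.
type Edge struct {
	From, Relation, To string
}

// URLObservation holds metadata a probe may report for a URL (assumed shape).
// IP is empty and Port is zero when the value is unknown.
type URLObservation struct {
	URL  string
	Host string
	IP   string
	Port int
}

// EdgesForURL applies the service-first rule: when IP and port are known,
// it links ip -> service -> url and never emits a direct ip -> url edge.
func EdgesForURL(o URLObservation) []Edge {
	if o.IP != "" && o.Port > 0 {
		// Canonical service value has the form ip:port/tcp.
		service := fmt.Sprintf("%s:%d/tcp", o.IP, o.Port)
		return []Edge{
			{From: o.IP, Relation: "exposes", To: service},
			{From: service, Relation: "serves", To: o.URL},
		}
	}
	// Degraded case: no service can be identified yet, so the host serves the URL.
	return []Edge{{From: o.Host, Relation: "serves", To: o.URL}}
}
```

Calling `EdgesForURL` with an IP and port yields two edges through the service node; calling it with only a host yields the single degraded edge that later DNS or port data can replace.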
Technology metadata
Technologies are descriptive metadata, not first-class graph assets. URL and service assets can carry metadata.technologies, for example:
{
"technologies": ["nginx", "React", "Cloudflare"],
"title": "Example",
"status_code": 200
}
This keeps the graph focused on reachable assets and relations while still showing stack details in asset detail panels. Legacy technology assets may remain in old databases, but API list and graph responses filter them out by default.
Deduplication
Assets are keyed on (organization_id, type, normalized_value). Normalisation rules:
| Type | Rule |
|---|---|
| domain / subdomain | lowercase, strip trailing dot |
| ip | canonical IP |
| url | lowercase, strip default ports, strip trailing slash |
| service | ip:port/protocol |
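
The normalisation rules above could be implemented along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and for URLs it lowercases only the scheme and host (leaving the path case untouched), which is one possible reading of the "lowercase" rule.

```go
package main

import (
	"net"
	"net/netip"
	"net/url"
	"strings"
)

// NormalizeHost lowercases a domain or subdomain and strips a trailing dot.
func NormalizeHost(v string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(v)), ".")
}

// NormalizeIP returns the canonical textual form of an IP address.
func NormalizeIP(v string) (string, error) {
	addr, err := netip.ParseAddr(strings.TrimSpace(v))
	if err != nil {
		return "", err
	}
	// Unmap turns IPv4-mapped IPv6 addresses into plain IPv4.
	return addr.Unmap().String(), nil
}

// NormalizeURL lowercases scheme and host, drops default ports,
// and strips a trailing slash from the path.
func NormalizeURL(v string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(v))
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	host := strings.ToLower(u.Hostname())
	port := u.Port()
	if (u.Scheme == "http" && port == "80") || (u.Scheme == "https" && port == "443") {
		port = ""
	}
	switch {
	case port != "":
		host = net.JoinHostPort(host, port)
	case strings.Contains(host, ":"):
		host = "[" + host + "]" // bare IPv6 literal needs brackets
	}
	u.Host = host
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u.String(), nil
}
```

The normalised string is what would feed the `normalized_value` column, so two spellings of the same asset collapse onto one row.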
Vulnerability lifecycle
new → confirmed → fixed ↔ retest_required
new → false_positive
confirmed → accepted_risk
fixed / false_positive / accepted_risk → reopened
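
One way to enforce these transitions is a lookup table. The sketch below is an interpretation of the lifecycle lines above, not the backend's actual validation code; `allowedTransitions` and `CanTransition` are hypothetical names.

```go
package main

// allowedTransitions maps each status to the statuses it may move to.
// The table is an assumed reading of the lifecycle, with fixed and
// retest_required reachable from each other.
var allowedTransitions = map[string][]string{
	"new":             {"confirmed", "false_positive"},
	"confirmed":       {"fixed", "accepted_risk"},
	"fixed":           {"retest_required", "reopened"},
	"retest_required": {"fixed"},
	"false_positive":  {"reopened"},
	"accepted_risk":   {"reopened"},
}

// CanTransition reports whether a vulnerability may move from one status to another.
func CanTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```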
Risk score formula:
risk_score = CVSS_score (or severity-based default)
+ exposure_bonus (+1 for internet-facing)
+ asset_criticality_bonus (e.g. +1.5 for admin panels)
+ agent_adjustment
(capped at 10)
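
As a worked example of the formula, the following sketch computes a capped score. The `RiskInput` type, the `RiskScore` function, and the values in `severityDefaults` are assumptions made for illustration; the source does not specify the severity-based defaults.

```go
package main

import "math"

// RiskInput gathers the terms of the risk formula (hypothetical shape).
type RiskInput struct {
	CVSS             *float64 // nil when no CVSS score is available
	Severity         string   // used only when CVSS is nil
	InternetFacing   bool
	CriticalityBonus float64 // e.g. 1.5 for admin panels
	AgentAdjustment  float64
}

// severityDefaults are illustrative fallback scores, not values from the source.
var severityDefaults = map[string]float64{
	"critical": 9.0, "high": 7.5, "medium": 5.0, "low": 2.5, "info": 0,
}

// RiskScore sums the base score and bonuses, then caps the result at 10.
func RiskScore(in RiskInput) float64 {
	base := severityDefaults[in.Severity]
	if in.CVSS != nil {
		base = *in.CVSS
	}
	score := base + in.CriticalityBonus + in.AgentAdjustment
	if in.InternetFacing {
		score += 1.0 // exposure bonus
	}
	return math.Min(score, 10)
}
```

For instance, a CVSS 9.8 finding on an internet-facing admin panel sums to 12.3 and is reported as 10 after the cap.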
Files and object storage
Scan-generated files are stored in S3-compatible object storage. Local Docker Compose uses MinIO. PostgreSQL stores metadata in the files table; binary content stays in object storage. API download endpoints return short-lived presigned URLs after organization-scoped RBAC checks.
Plugins return PluginResult.Artifacts with local temporary paths. The worker uploads those artifacts through the Files service and associates them with organization, scan, scan job, source plugin, and asset where possible.
Object key format:
organizations/{org_id}/scans/{scan_id}/{file_type}/{uuid}-{safe_filename}
organizations/{org_id}/files/{file_type}/{uuid}-{safe_filename}
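
A key builder for these two formats might look like the sketch below. `ObjectKey` and `SafeFilename` are hypothetical names, the sanitising rule (keep the base name, replace anything outside `A-Za-z0-9._-` with an underscore) is an assumption, and the UUID is passed in by the caller rather than generated here.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// unsafeChars matches every character that is not kept in a safe filename.
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9._-]`)

// SafeFilename drops directory components and replaces unsafe characters.
func SafeFilename(name string) string {
	base := path.Base(strings.ReplaceAll(name, `\`, "/"))
	return unsafeChars.ReplaceAllString(base, "_")
}

// ObjectKey builds the storage key. An empty scanID selects the
// organization-level format; otherwise the scan-level format is used.
func ObjectKey(orgID, scanID, fileType, id, filename string) string {
	safe := SafeFilename(filename)
	if scanID == "" {
		return fmt.Sprintf("organizations/%s/files/%s/%s-%s", orgID, fileType, id, safe)
	}
	return fmt.Sprintf("organizations/%s/scans/%s/%s/%s-%s", orgID, scanID, fileType, id, safe)
}
```

Taking only the base name keeps a plugin-supplied temporary path such as `../tmp/shot 1.png` from leaking directory structure into the object key.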
See files.md for storage configuration and API details.
Scan pipeline
1. User creates scan (POST /api/v1/organizations/{id}/scans)
2. API creates Scan record (status=pending)
3. API pushes QueueMessage to Redis list easm:scan:queue
4. Worker pops message, updates Scan to running
5. For each plugin in profile:
   a. Create ScanJob record
   b. Execute plugin.Run() (CLI subprocess or mock)
   c. Parse NormalizedEntities from result
   d. Upload PluginResult.Artifacts to S3/MinIO through Files service
   e. Upsert assets to DB (deduplication via ON CONFLICT)
   f. Vulnerability entities → Vulnerabilities table
   g. Update ScanJob status
6. Update Scan to success / failed
Plugin system
Every scanner implements the Plugin interface:
type Plugin interface {
	Name() string
	Type() PluginType
	Version() string
	Run(ctx context.Context, input PluginInput, config PluginConfig) (*PluginResult, error)
}
PluginResult contains []NormalizedEntity, the common schema for all plugin outputs:
{
"entity_type": "asset",
"asset_type": "subdomain",
"value": "api.example.com",
"source_plugin": "subfinder",
"confidence": 0.95,
"metadata": { ... }
}
Vulnerability entities use entity_type: "vulnerability" and carry title, severity, template_id etc. in metadata.
Core plugin contracts live in backend/internal/plugins: model types, registry, shared command helpers, and result/artifact contracts. Concrete tool integrations live in backend/internal/plugins/wrappers, where contributors add or update wrappers such as httpx, nmap, dnsx, and katana. Default wrapper registration is centralized in backend/internal/plugins/wrappers/defaults.go and used by both API and worker startup.
Technology is metadata, not a graph asset. Plugins should write metadata.technologies on URL/service assets; asset persistence normalizes legacy tech and technology keys into that canonical array.
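
The folding of legacy keys into the canonical array could be sketched as follows. `NormalizeTechnologies` is a hypothetical name, and the case-insensitive de-duplication is an assumption rather than documented behavior.

```go
package main

import "strings"

// toStrings accepts the shapes a decoded JSON value may take.
func toStrings(v any) []string {
	switch t := v.(type) {
	case string:
		return []string{t}
	case []string:
		return t
	case []any:
		out := make([]string, 0, len(t))
		for _, item := range t {
			if s, ok := item.(string); ok {
				out = append(out, s)
			}
		}
		return out
	}
	return nil
}

// NormalizeTechnologies merges the tech, technology, and technologies keys
// into a single de-duplicated metadata["technologies"] slice, in place.
func NormalizeTechnologies(meta map[string]any) {
	seen := map[string]bool{}
	merged := []string{}
	for _, key := range []string{"technologies", "technology", "tech"} {
		for _, name := range toStrings(meta[key]) {
			name = strings.TrimSpace(name)
			lower := strings.ToLower(name)
			if name == "" || seen[lower] {
				continue
			}
			seen[lower] = true
			merged = append(merged, name)
		}
		delete(meta, key) // legacy keys are removed after merging
	}
	if len(merged) > 0 {
		meta["technologies"] = merged
	}
}
```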
Adding a new plugin
- Create backend/internal/plugins/wrappers/myplugin.go
- Implement plugins.Plugin
- Register in backend/internal/plugins/wrappers/defaults.go
- Add toolinstaller config if the wrapper calls an external CLI
- Add to relevant scan_profiles entries in backend/configs/config.yaml
Test mode
Set EASM_TEST_MODE=true. Each plugin checks config.Options["test_mode"] and returns hardcoded mock assets/vulns instead of launching real binaries. Useful for UI development and integration tests.
Queue & retry model
Redis list: easm:scan:queue (LPUSH producer, BRPOP consumer)
Worker uses BRPop with 5 s timeout, runs each scan in a goroutine.
Job statuses: queued → running → success / failed / timeout / cancelled
Retry policy: configured per plugin (PluginConfig.Retry). The worker does not automatically retry on failure. Failed or timed-out scan jobs can be retried manually through POST /api/v1/scan-jobs/{job_id}/retry; the retry creates a new scan job row and runs only the selected plugin using reconstructed organization/scan scope.
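
The manual retry rule can be illustrated with a small helper. The `ScanJob` fields, including `RetryOf`, and the `NewRetryJob` function are hypothetical; the sketch only shows that a retry is allowed for failed or timed-out jobs and produces a new row instead of mutating the old one.

```go
package main

import "errors"

// ScanJob is a reduced, assumed view of a scan job row.
type ScanJob struct {
	ID      int
	ScanID  int
	Plugin  string
	Status  string
	RetryOf int // hypothetical link back to the job being retried
}

// ErrNotRetryable is returned for jobs in any status other than failed or timeout.
var ErrNotRetryable = errors.New("only failed or timed-out jobs can be retried")

// NewRetryJob builds a fresh queued job for the same scan and plugin.
func NewRetryJob(old ScanJob, newID int) (ScanJob, error) {
	if old.Status != "failed" && old.Status != "timeout" {
		return ScanJob{}, ErrNotRetryable
	}
	return ScanJob{
		ID:      newID,
		ScanID:  old.ScanID,
		Plugin:  old.Plugin, // only the selected plugin runs again
		Status:  "queued",
		RetryOf: old.ID,
	}, nil
}
```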
API key security
Keys are never stored in plaintext. The generation flow:
- Generate 32 random bytes → hex encode → prefix with easm_
- SHA-256 hash stored in DB (key_hash)
- First 12 chars stored as key_prefix for display
- Raw key returned to user once (not stored)
Validation: rehash the provided key, lookup by hash.
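
The generation and validation steps can be sketched with the standard library alone. `APIKey`, `GenerateAPIKey`, and `HashAPIKey` are hypothetical names; the sketch assumes the hash is stored hex-encoded and that the 12-character prefix is taken from the full key, including `easm_`.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// APIKey carries the values produced at creation time.
type APIKey struct {
	Raw    string // shown to the user once, never persisted
	Hash   string // stored as key_hash
	Prefix string // stored as key_prefix for display
}

// HashAPIKey returns the hex-encoded SHA-256 digest of a raw key.
func HashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// GenerateAPIKey creates a key from 32 random bytes, hex-encoded and prefixed with easm_.
func GenerateAPIKey() (APIKey, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return APIKey{}, err
	}
	raw := "easm_" + hex.EncodeToString(buf)
	return APIKey{Raw: raw, Hash: HashAPIKey(raw), Prefix: raw[:12]}, nil
}
```

Validation then amounts to calling `HashAPIKey` on the presented key and looking the result up in the `key_hash` column.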
RBAC
| Role | Capabilities |
|---|---|
| admin | Full system access, scope approval, user management, all orgs |
| hacker | Assigned orgs only, run scans, triage vulnerabilities |
| client | Read-only: dashboard, approved vulnerabilities, reports |
Role is stored in JWT claims and checked by RequireRole middleware.
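
A role check of this kind could look like the sketch below. It assumes plain `net/http` handlers and that an earlier authentication step has already verified the JWT and placed the role in the request context via the hypothetical `WithRole` helper; the backend's actual framework and signature for RequireRole are not shown in this document.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
)

type roleKey struct{}

// WithRole stores the role taken from verified JWT claims in the context.
func WithRole(ctx context.Context, role string) context.Context {
	return context.WithValue(ctx, roleKey{}, role)
}

// RequireRole rejects requests whose role is not in the allowed list.
func RequireRole(allowed ...string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			role, _ := r.Context().Value(roleKey{}).(string)
			for _, a := range allowed {
				if role == a {
					next.ServeHTTP(w, r)
					return
				}
			}
			http.Error(w, "forbidden", http.StatusForbidden)
		})
	}
}

// statusFor sends a request with the given role through an admin-only
// route and returns the resulting HTTP status code (usage example).
func statusFor(role string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/admin", nil)
	req = req.WithContext(WithRole(req.Context(), role))
	rec := httptest.NewRecorder()
	RequireRole("admin")(ok).ServeHTTP(rec, req)
	return rec.Code
}
```

Role checks like this cover coarse capabilities only; the organization-scoped checks mentioned for file downloads would still need a separate membership lookup.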
Scope seed assets
Approved organization scopes are also represented as seed assets so they appear in the asset inventory and graph before any scan runs.
When a scope is approved, the backend upserts seed assets with:
{
"source_plugin": "scope",
"confidence": 1.0,
"metadata": {
"source": "scope",
"scope_id": "...",
"seed": true
}
}
Supported seed mappings:
| Scope type | Seed asset behavior |
|---|---|
| domain | Creates a domain asset |
| url | Creates a url asset |
| ip | Creates an ip asset |
| cidr | Creates a cidr asset; expands to IP assets only for /28 or smaller, capped at 256 IPs |
| ip_range | Creates an ip_range asset; expands to IP assets only when the range has 256 IPs or fewer |
| asn | Creates an asn asset |
| org_name | Does not create a seed asset because there is no dedicated organization-name asset type |
Large CIDR ranges are intentionally not expanded into thousands of IP nodes. Scanner plugins receive the original scope and can handle expansion later.
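
The bounded expansion for CIDR scopes could be sketched as follows. `ExpandCIDR` and `maxSeedIPs` are hypothetical names, and the sketch reads "/28 or smaller" as "at most 4 host bits" (16 addresses), which is an interpretation. It also includes the network and broadcast addresses, which the backend may or may not do.

```go
package main

import "net/netip"

// maxSeedIPs is the hard cap on expanded IP seed assets.
const maxSeedIPs = 256

// ExpandCIDR returns the addresses in a small CIDR block, or nil when the
// block is too large to expand into individual IP seed assets.
func ExpandCIDR(cidr string) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // start from the network address
	hostBits := prefix.Addr().BitLen() - prefix.Bits()
	if hostBits > 4 {
		// Larger than a /28-sized block: keep only the cidr asset.
		return nil, nil
	}
	var ips []string
	for a := prefix.Addr(); prefix.Contains(a) && len(ips) < maxSeedIPs; a = a.Next() {
		ips = append(ips, a.String())
	}
	return ips, nil
}
```

With this reading, `192.0.2.0/30` expands to four IP assets while `10.0.0.0/16` yields none and is left for scanner plugins to handle.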
Exposure changes
Exposure changes are stored in exposure_changes and provide a read-only timeline of newly observed attack-surface events. The MVP records events when assets, vulnerabilities, and file artifacts are newly inserted, plus vulnerability status changes.
The subsystem is best-effort: failures to write exposure changes do not fail scans, vulnerability updates, or file storage. Duplicate prevention is based on source insert detection and a lightweight service-level similar-event check.
See changes.md for schema, event types, API endpoints, limitations, and roadmap.
Update Center
The admin Update Center checks GitHub Releases and reports whether a newer release is available. It is informational only: the application does not execute update commands, access Docker, or modify local files.
See updates.md for configuration, API endpoints, UI behavior, and manual update commands.