Architecture Reference

Data model

Core entities

Organization
  ├── has many: Scopes (pending → approved / rejected)
  ├── has many: Scans
  ├── has many: Assets
  │     └── has many: AssetEdges (directed graph)
  ├── has many: Vulnerabilities (attached to Assets)
  ├── has many: ExposureChanges (timeline of attack-surface changes)
  ├── has many: Files (metadata for S3/MinIO objects)
  └── has many: Members (User × role)

Asset graph

Assets are stored in the assets table and their relationships in asset_edges:

assets        (id, org_id, type, value, normalized_value, …)
asset_edges   (id, org_id, from_asset_id, to_asset_id, relation_type, …)

Supported relation types: owns, contains, resolves_to, points_to, exposes, serves, has_path, has_parameter, has_certificate, has_vulnerability, discovered_by, related_to, generated_candidate, announces, discovered_url

uses_technology is legacy/deprecated. New scans store technologies in asset metadata instead of creating technology nodes or edges.

Canonical EASM graph chain

The backend relation builder keeps attack-surface topology in this order:

domain -> subdomain -> ip -> service -> url -> vulnerability

Allowed branches are:

  • url -> path
  • url -> discovered_url
  • service/url/subdomain/domain -> certificate, preferring service when host/IP/port metadata is available
  • asn -> cidr
  • organization/owner -> asn, where supported by plugin metadata

HTTP probing is service-first: when a URL has IP and port metadata, the worker finds or creates the canonical service asset ip:port/tcp, creates ip -[exposes]-> service, then creates service -[serves]-> url. It does not create direct ip -> url or subdomain -> url edges when a service can be identified. If no IP/service is known, the worker may create a degraded host serves URL edge until later DNS/port data enriches the graph.
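The service-first rule can be sketched as a small edge-selection function. This is only an illustration under stated assumptions: the Edge and URLObservation types and the EdgesForURL name are hypothetical, not the worker's actual code, and IPv6 literals would need bracket handling that is omitted here.

```go
package main

import "fmt"

// Edge is a hypothetical in-memory stand-in for a row in asset_edges.
type Edge struct {
	From, Relation, To string
}

// URLObservation holds the metadata an HTTP probe may report.
// Field names are illustrative assumptions.
type URLObservation struct {
	URL  string
	Host string
	IP   string // empty when no DNS/port data is known yet
	Port int    // 0 when unknown
}

// EdgesForURL returns the edges to create for one probed URL.
func EdgesForURL(o URLObservation) []Edge {
	if o.IP != "" && o.Port > 0 {
		// Canonical service asset: ip:port/tcp.
		service := fmt.Sprintf("%s:%d/tcp", o.IP, o.Port)
		return []Edge{
			{From: o.IP, Relation: "exposes", To: service},
			{From: service, Relation: "serves", To: o.URL},
		}
	}
	// Degraded mode: no service can be identified, so link host -> url
	// until later DNS/port data enriches the graph.
	return []Edge{{From: o.Host, Relation: "serves", To: o.URL}}
}
```

Calling EdgesForURL with IP 203.0.113.10 and port 443 yields the ip -[exposes]-> service and service -[serves]-> url pair; without IP and port it yields only the degraded host edge.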

| Plugin | Output | Required metadata | Created relation |
| --- | --- | --- | --- |
| subfinder, amass | subdomain | parent_domain | domain contains subdomain |
| alterx | candidate subdomain | parent_domain, candidate=true | parent generated_candidate candidate |
| dnsx, resolver, shuffledns | ip | domain or host, record_type when available | host resolves_to ip |
| naabu, nmap | service | ip, port, protocol | ip exposes service |
| httpx | url | host, ip, port, scheme when available | service serves url; degraded host serves url only without service data |
| httpx_screenshot | file artifact | linked URL asset value | no topology edge |
| tlsx | certificate | host, ip, port when available | preferred source has_certificate certificate |
| katana | path, discovered URL | parent_url, URL host/port metadata | url has_path path; url discovered_url url |
| nuclei | vulnerability record | matched_url or target asset metadata | vulnerability stored against URL/service asset |
| asnmap | asn, cidr | asn, org | asn announces cidr; org owns asn |

Technology metadata

Technologies are descriptive metadata, not first-class graph assets. URL and service assets can carry metadata.technologies, for example:

{
  "technologies": ["nginx", "React", "Cloudflare"],
  "title": "Example",
  "status_code": 200
}

This keeps the graph focused on reachable assets and relations while still showing stack details in asset detail panels. Legacy technology assets may remain in old databases, but API list and graph responses filter them out by default.

Deduplication

Assets are keyed on (organization_id, type, normalized_value). Normalisation rules:

| Type | Rule |
| --- | --- |
| domain / subdomain | lowercase, strip trailing dot |
| ip | canonical IP |
| url | lowercase, strip default ports, strip trailing slash |
| service | ip:port/protocol |
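A minimal sketch of these normalisation rules, using only the Go standard library. The NormalizeAsset name is hypothetical, and several details are assumptions rather than the backend's confirmed behavior: "canonical IP" is taken to mean the netip string form with IPv4-mapped IPv6 addresses unmapped, the whole URL is lowercased, and a service value is assumed to already be in ip:port/protocol form.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
	"strings"
)

// NormalizeAsset returns the normalized_value for an asset of the given type.
func NormalizeAsset(assetType, value string) (string, error) {
	v := strings.TrimSpace(value)
	switch assetType {
	case "domain", "subdomain":
		// Lowercase, strip trailing dot.
		return strings.TrimSuffix(strings.ToLower(v), "."), nil
	case "ip":
		addr, err := netip.ParseAddr(v)
		if err != nil {
			return "", fmt.Errorf("invalid ip %q: %w", v, err)
		}
		// Assumption: IPv4-mapped IPv6 collapses to plain IPv4.
		return addr.Unmap().String(), nil
	case "url":
		u, err := url.Parse(strings.ToLower(v))
		if err != nil {
			return "", fmt.Errorf("invalid url %q: %w", v, err)
		}
		// Strip default ports (80 for http, 443 for https).
		if (u.Scheme == "http" && u.Port() == "80") || (u.Scheme == "https" && u.Port() == "443") {
			u.Host = strings.TrimSuffix(u.Host, ":"+u.Port())
		}
		// Strip trailing slash.
		u.Path = strings.TrimSuffix(u.Path, "/")
		return u.String(), nil
	case "service":
		// Assumption: value already has the ip:port/protocol shape.
		return strings.ToLower(v), nil
	default:
		return v, nil
	}
}
```

Under these assumptions, HTTPS://Example.com:443/Login/ and https://example.com/login produce the same normalized value, so the second observation would hit the (organization_id, type, normalized_value) key instead of creating a duplicate.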

Vulnerability lifecycle

new → confirmed → fixed ↔ retest_required
new → false_positive
confirmed → accepted_risk
fixed / false_positive / accepted_risk → reopened
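The lifecycle can be expressed as a transition table. The sketch below is an assumption about how such a check might look, not the backend's implementation; the CanTransition name is hypothetical, and because the transitions out of reopened are not listed above, the table gives that state no outgoing edges.

```go
package main

// allowedTransitions lists, for each status, the statuses it may move to.
// It mirrors the lifecycle lines above; "reopened" has no outgoing
// transitions here because the document does not specify them.
var allowedTransitions = map[string][]string{
	"new":             {"confirmed", "false_positive"},
	"confirmed":       {"fixed", "accepted_risk"},
	"fixed":           {"retest_required", "reopened"},
	"retest_required": {"fixed"},
	"false_positive":  {"reopened"},
	"accepted_risk":   {"reopened"},
}

// CanTransition reports whether a vulnerability may move from one status to another.
func CanTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

A status update handler could call CanTransition before persisting a change and reject, for example, a direct new → fixed update.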

Risk score formula:

risk_score = CVSS_score  (or severity-based default)
           + exposure_bonus (+1 for internet-facing)
           + asset_criticality_bonus (e.g. +1.5 for admin panels)
           + agent_adjustment
           (capped at 10)
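As a worked example of the formula, the following sketch adds the components and applies the cap. The RiskInput type and its field names are hypothetical; only the +1 internet-facing bonus and the cap at 10 come from the formula above, and no lower bound is applied because none is stated.

```go
package main

import "math"

// RiskInput groups the terms of the risk score formula.
// Field names are illustrative assumptions.
type RiskInput struct {
	BaseScore        float64 // CVSS score, or a severity-based default
	InternetFacing   bool    // adds the +1 exposure bonus
	CriticalityBonus float64 // e.g. 1.5 for admin panels
	AgentAdjustment  float64
}

// RiskScore sums the components and caps the result at 10.
func RiskScore(in RiskInput) float64 {
	score := in.BaseScore + in.CriticalityBonus + in.AgentAdjustment
	if in.InternetFacing {
		score += 1
	}
	return math.Min(score, 10)
}
```

For an internet-facing admin panel with a CVSS score of 9.8, the raw sum is 9.8 + 1 + 1.5 = 12.3, which the cap reduces to 10.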

Files and object storage

Scan-generated files are stored in S3-compatible object storage. Local Docker Compose uses MinIO. PostgreSQL stores metadata in the files table; binary content stays in object storage. API download endpoints return short-lived presigned URLs after organization-scoped RBAC checks.

Plugins return PluginResult.Artifacts with local temporary paths. The worker uploads those artifacts through the Files service and associates them with organization, scan, scan job, source plugin, and asset where possible.

Object key format:

organizations/{org_id}/scans/{scan_id}/{file_type}/{uuid}-{safe_filename}
organizations/{org_id}/files/{file_type}/{uuid}-{safe_filename}
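A key in this format could be built as sketched below. The ObjectKey and SafeFilename names are hypothetical, and the sanitization rule (keep the base name, replace anything outside letters, digits, dot, underscore, and hyphen with an underscore) is an assumption; the document does not define how safe_filename is derived. The UUID is passed in as a string to keep the sketch free of external dependencies.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// unsafeChars matches runs of characters outside a conservative allow-list.
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9._-]+`)

// SafeFilename drops directory components and replaces unsafe characters.
// The exact rule is an assumption for illustration.
func SafeFilename(name string) string {
	return unsafeChars.ReplaceAllString(path.Base(name), "_")
}

// ObjectKey builds a scan-scoped key when scanID is set and an
// organization-level key otherwise. id is a caller-supplied UUID string.
func ObjectKey(orgID, scanID, fileType, id, filename string) string {
	safe := SafeFilename(filename)
	if scanID == "" {
		return fmt.Sprintf("organizations/%s/files/%s/%s-%s", orgID, fileType, id, safe)
	}
	return fmt.Sprintf("organizations/%s/scans/%s/%s/%s-%s", orgID, scanID, fileType, id, safe)
}
```

Stripping directory components matters here because plugins report local temporary paths, which should not leak into object keys.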

See files.md for storage configuration and API details.

Scan pipeline

1. User creates scan (POST /api/v1/organizations/{id}/scans)
2. API creates Scan record (status=pending)
3. API pushes QueueMessage to Redis list  easm:scan:queue
4. Worker pops message, updates Scan to running
5. For each plugin in profile:
   a. Create ScanJob record
   b. Execute plugin.Run()  (CLI subprocess or mock)
   c. Parse NormalizedEntities from result
   d. Upload PluginResult.Artifacts to S3/MinIO through Files service
   e. Upsert assets to DB  (deduplication via ON CONFLICT)
   f. Vulnerability entities → Vulnerabilities table
   g. Update ScanJob status
6. Update Scan to success / failed

Plugin system

Every scanner implements the Plugin interface:

type Plugin interface {
    Name() string
    Type() PluginType
    Version() string
    Run(ctx context.Context, input PluginInput, config PluginConfig) (*PluginResult, error)
}

PluginResult contains []NormalizedEntity — the common schema for all outputs:

{
  "entity_type": "asset",
  "asset_type": "subdomain",
  "value": "api.example.com",
  "source_plugin": "subfinder",
  "confidence": 0.95,
  "metadata": { ... }
}

Vulnerability entities use entity_type: "vulnerability" and carry title, severity, template_id etc. in metadata.
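In Go, this schema could map to a struct along the following lines. The field names and JSON tags are inferred from the example above; the actual NormalizedEntity definition in backend/internal/plugins may differ, and DecodeEntity is a hypothetical helper.

```go
package main

import "encoding/json"

// NormalizedEntity mirrors the JSON shape shown above.
// This is an inferred sketch, not the project's actual type.
type NormalizedEntity struct {
	EntityType   string         `json:"entity_type"`
	AssetType    string         `json:"asset_type,omitempty"`
	Value        string         `json:"value"`
	SourcePlugin string         `json:"source_plugin"`
	Confidence   float64        `json:"confidence"`
	Metadata     map[string]any `json:"metadata,omitempty"`
}

// DecodeEntity parses one JSON-encoded entity.
func DecodeEntity(raw []byte) (NormalizedEntity, error) {
	var e NormalizedEntity
	err := json.Unmarshal(raw, &e)
	return e, err
}
```

Keeping metadata as a free-form map lets vulnerability entities carry title, severity, and template_id without changing the common schema.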

Core plugin contracts live in backend/internal/plugins: model types, registry, shared command helpers, and result/artifact contracts. Concrete tool integrations live in backend/internal/plugins/wrappers, where contributors add or update wrappers such as httpx, nmap, dnsx, and katana. Default wrapper registration is centralized in backend/internal/plugins/wrappers/defaults.go and used by both API and worker startup.

Technology is metadata, not a graph asset. Plugins should write metadata.technologies on URL/service assets; asset persistence normalizes legacy tech and technology keys into that canonical array.
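The legacy-key normalization might look like the following sketch. The NormalizeTechnologies name is hypothetical, and the merge order (technologies, then tech, then technology) and case-insensitive deduplication are assumptions; the document only states that the legacy keys are folded into the canonical array.

```go
package main

import "strings"

// NormalizeTechnologies returns a copy of meta in which the legacy "tech"
// and "technology" keys are merged into a single "technologies" string slice.
func NormalizeTechnologies(meta map[string]any) map[string]any {
	out := make(map[string]any, len(meta))
	for k, v := range meta {
		out[k] = v
	}
	techs := []string{}
	seen := map[string]bool{}
	add := func(name string) {
		name = strings.TrimSpace(name)
		key := strings.ToLower(name)
		if name == "" || seen[key] {
			return // skip blanks and case-insensitive duplicates
		}
		seen[key] = true
		techs = append(techs, name)
	}
	// Fixed key order keeps the output deterministic.
	for _, k := range []string{"technologies", "tech", "technology"} {
		switch v := out[k].(type) {
		case string:
			add(v)
		case []string:
			for _, s := range v {
				add(s)
			}
		case []any: // shape produced by encoding/json for arrays
			for _, item := range v {
				if s, ok := item.(string); ok {
					add(s)
				}
			}
		}
		delete(out, k)
	}
	if len(techs) > 0 {
		out["technologies"] = techs
	}
	return out
}
```

Other metadata keys such as title or status_code pass through unchanged.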

Adding a new plugin

  1. Create backend/internal/plugins/wrappers/myplugin.go
  2. Implement plugins.Plugin
  3. Register in backend/internal/plugins/wrappers/defaults.go
  4. Add toolinstaller config if the wrapper calls an external CLI
  5. Add to relevant scan_profiles entries in backend/configs/config.yaml

Test mode

Set EASM_TEST_MODE=true. Each plugin checks config.Options["test_mode"] and returns hardcoded mock assets and vulnerabilities instead of launching real binaries. This is useful for UI development and integration tests.
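A wrapper that honors test mode could be sketched as follows. Apart from the Plugin interface, which is copied from this document, every type here is a simplified stand-in: the real PluginInput, PluginConfig, PluginResult, and NormalizedEntity live in backend/internal/plugins and are not reproduced. The sketch also assumes Options is a map holding a boolean under "test_mode"; MyPlugin and RunMock are hypothetical names.

```go
package main

import (
	"context"
	"errors"
)

// Simplified stand-ins for the real plugin contract types (assumptions).
type PluginType string
type PluginInput struct{ Target string }
type PluginConfig struct{ Options map[string]any }
type NormalizedEntity struct {
	EntityType, AssetType, Value, SourcePlugin string
	Confidence                                 float64
}
type PluginResult struct{ Entities []NormalizedEntity }

// Plugin is the interface from this document.
type Plugin interface {
	Name() string
	Type() PluginType
	Version() string
	Run(ctx context.Context, input PluginInput, config PluginConfig) (*PluginResult, error)
}

// MyPlugin is a hypothetical wrapper used only for illustration.
type MyPlugin struct{}

func (MyPlugin) Name() string     { return "myplugin" }
func (MyPlugin) Type() PluginType { return "discovery" }
func (MyPlugin) Version() string  { return "0.1.0" }

// Run returns a hardcoded mock entity in test mode and otherwise reports
// that real execution is out of scope for this sketch.
func (p MyPlugin) Run(ctx context.Context, input PluginInput, config PluginConfig) (*PluginResult, error) {
	if on, _ := config.Options["test_mode"].(bool); on {
		return &PluginResult{Entities: []NormalizedEntity{{
			EntityType:   "asset",
			AssetType:    "subdomain",
			Value:        "mock." + input.Target,
			SourcePlugin: p.Name(),
			Confidence:   1.0,
		}}}, nil
	}
	return nil, errors.New("real CLI execution is not part of this sketch")
}

// Compile-time check that MyPlugin satisfies Plugin.
var _ Plugin = MyPlugin{}

// RunMock runs a plugin with test mode enabled.
func RunMock(p Plugin, target string) (*PluginResult, error) {
	cfg := PluginConfig{Options: map[string]any{"test_mode": true}}
	return p.Run(context.Background(), PluginInput{Target: target}, cfg)
}
```

Calling RunMock(MyPlugin{}, "example.com") returns a single subdomain entity with the value mock.example.com, without touching any external binary.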

Queue & retry model

Redis list:  easm:scan:queue  (LPUSH producer, BRPOP consumer)

The worker calls BRPOP with a 5 s timeout and runs each scan in its own goroutine.

Job statuses: queued → running → success / failed / timeout / cancelled

Retry policy: configured per plugin (PluginConfig.Retry). The worker does not retry automatically on failure. Failed or timed-out scan jobs can be retried manually through POST /api/v1/scan-jobs/{job_id}/retry; the retry creates a new scan job row and runs only the selected plugin, using the reconstructed organization and scan scope.

API key security

Keys are never stored in plaintext. The generation flow:

  1. Generate 32 random bytes → hex encode → prefix with easm_
  2. SHA-256 hash stored in DB (key_hash)
  3. First 12 chars stored as key_prefix for display
  4. Raw key returned to user once (not stored)

Validation: rehash the provided key and look it up by hash.
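The generation and validation flow can be sketched with the standard library alone. The GeneratedKey type and function names are hypothetical, and two details are assumptions: the SHA-256 hash is taken over the full raw key including the easm_ prefix, and it is stored hex-encoded.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// GeneratedKey holds the values produced when a key is created.
type GeneratedKey struct {
	Raw    string // returned to the user once, never stored
	Hash   string // stored in DB as key_hash
	Prefix string // stored in DB as key_prefix for display
}

// HashAPIKey returns the hex-encoded SHA-256 digest of a raw key.
func HashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// GenerateAPIKey creates a key from 32 random bytes, hex-encoded and
// prefixed with "easm_".
func GenerateAPIKey() (GeneratedKey, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return GeneratedKey{}, err
	}
	raw := "easm_" + hex.EncodeToString(buf)
	return GeneratedKey{
		Raw:    raw,
		Hash:   HashAPIKey(raw),
		Prefix: raw[:12], // first 12 characters, including "easm_"
	}, nil
}
```

To validate a request, the server would compute HashAPIKey on the presented key and look up the row whose key_hash matches; the raw key itself never needs to be stored.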

RBAC

| Role | Capabilities |
| --- | --- |
| admin | Full system access, scope approval, user management, all orgs |
| hacker | Assigned orgs only, run scans, triage vulnerabilities |
| client | Read-only: dashboard, approved vulnerabilities, reports |

Role is stored in JWT claims and checked by RequireRole middleware.
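A role-checking middleware in the spirit of RequireRole might look like this sketch built on net/http. The signature shown is an assumption, not the project's actual one. JWT parsing is omitted: the sketch assumes an earlier authentication middleware has already verified the token and placed the role claim in the request context via the hypothetical WithRole helper.

```go
package main

import (
	"context"
	"net/http"
)

// roleKey is the private context key under which the role is stored.
type roleKey struct{}

// WithRole returns a context carrying the role taken from verified JWT claims.
func WithRole(ctx context.Context, role string) context.Context {
	return context.WithValue(ctx, roleKey{}, role)
}

// roleAllowed reports whether role is one of the allowed roles.
func roleAllowed(role string, allowed []string) bool {
	for _, a := range allowed {
		if role != "" && role == a {
			return true
		}
	}
	return false
}

// RequireRole rejects requests whose role is not in the allowed list.
func RequireRole(allowed ...string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			role, _ := r.Context().Value(roleKey{}).(string)
			if !roleAllowed(role, allowed) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```

A route restricted to scan operators could then be wrapped as RequireRole("admin", "hacker")(handler). Organization-level checks, such as limiting a hacker to assigned orgs, would need a separate step.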

Scope seed assets

Approved organization scopes are also represented as seed assets so they appear in the asset inventory and graph before any scan runs.

When a scope is approved, the backend upserts seed assets with:

{
  "source_plugin": "scope",
  "confidence": 1.0,
  "metadata": {
    "source": "scope",
    "scope_id": "...",
    "seed": true
  }
}

Supported seed mappings:

| Scope type | Seed asset behavior |
| --- | --- |
| domain | Creates a domain asset |
| url | Creates a url asset |
| ip | Creates an ip asset |
| cidr | Creates a cidr asset; expands to IP assets only for /28 or smaller, capped at 256 IPs |
| ip_range | Creates an ip_range asset; expands to IP assets only when the range has 256 IPs or fewer |
| asn | Creates an asn asset |
| org_name | Does not create a seed asset because there is no dedicated organization-name asset type |

Large CIDR ranges are intentionally not expanded into thousands of IP nodes. Scanner plugins receive the original scope and can handle expansion later.
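The guarded CIDR expansion can be sketched with net/netip. Several points are assumptions: "/28 or smaller" is read as a prefix length of 28 or more (16 addresses or fewer), only IPv4 is expanded, and network and broadcast addresses are included. The ExpandCIDR name and the constants are hypothetical.

```go
package main

import "net/netip"

const (
	minPrefixBits = 28  // assumption: expand only /28 through /32
	maxSeedIPs    = 256 // hard cap on generated IP assets
)

// ExpandCIDR returns the individual IPs for small IPv4 networks and nil for
// networks that are too large to expand or are not IPv4.
func ExpandCIDR(cidr string) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // start from the network address
	if !prefix.Addr().Is4() || prefix.Bits() < minPrefixBits {
		return nil, nil // keep only the cidr seed asset
	}
	var ips []string
	for addr := prefix.Addr(); prefix.Contains(addr) && len(ips) < maxSeedIPs; addr = addr.Next() {
		ips = append(ips, addr.String())
	}
	return ips, nil
}
```

With these assumptions, 192.0.2.0/30 expands to four IP assets, while 10.0.0.0/16 yields none and remains a single cidr node for scanner plugins to expand later.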

Exposure changes

Exposure changes are stored in exposure_changes and provide a read-only timeline of newly observed attack-surface events. The MVP records events when assets, vulnerabilities, and file artifacts are newly inserted, plus vulnerability status changes.

The subsystem is best-effort: failures to write exposure changes do not fail scans, vulnerability updates, or file storage. Duplicate prevention is based on source insert detection and a lightweight service-level similar-event check.

See changes.md for schema, event types, API endpoints, limitations, and roadmap.

Update Center

The admin Update Center checks GitHub Releases and reports whether a newer release is available. It is informational only: the application does not execute update commands, access Docker, or modify local files.

See updates.md for configuration, API endpoints, UI behavior, and manual update commands.