Architecture Reference

Data model

Core entities

Organization
  ├── has many: Scopes (pending → approved / rejected)
  ├── has many: Scans
  ├── has many: Assets
  │     └── has many: AssetEdges (directed graph)
  ├── has many: Vulnerabilities (attached to Assets)
  ├── has many: ExposureChanges (timeline of attack-surface changes)
  ├── has many: Files (metadata for S3/MinIO objects)
  └── has many: Members (User × role)

Asset graph

Assets are stored in the assets table and their relationships in asset_edges:

assets        (id, org_id, type, value, normalized_value, criticality, …)
asset_edges   (id, org_id, from_asset_id, to_asset_id, relation_type, …)

Supported relation types: owns, contains, resolves_to, points_to, exposes, serves, has_path, has_parameter, has_certificate, has_vulnerability, discovered_by, related_to, generated_candidate, announces, discovered_url

uses_technology is legacy/deprecated. New scans store technologies in asset metadata instead of creating technology nodes or edges.

Canonical EASM graph chain

The backend relation builder keeps attack-surface topology in this order:

domain -> subdomain -> ip -> service -> url -> vulnerability

Allowed branches are:

url -> path
url -> discovered_url
service/url/subdomain/domain -> certificate, preferring service when host/IP/port metadata is available
asn -> cidr
organization/owner -> asn, where supported by plugin metadata

HTTP probing is service-first: when a URL has IP and port metadata, the worker finds or creates the canonical service asset ip:port/tcp, creates ip -[exposes]-> service, then creates service -[serves]-> url. It does not create direct ip -> url or subdomain -> url edges when a service can be identified. If no IP or service is known, the worker may create a degraded host -[serves]-> url edge until later DNS or port data enriches the graph.
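
The service-first rule can be sketched as a small pure function. This is a minimal illustration, not the worker's actual code: the Edge and URLObservation types and the edgesForURL name are hypothetical, and the sketch assumes IPv4 addresses and TCP services.

```go
package main

import "fmt"

// Edge is a hypothetical in-memory form of an asset_edges row.
type Edge struct {
	From, Relation, To string
}

// URLObservation holds the metadata an HTTP probe may report (assumed shape).
// IP is empty and Port is zero when they are unknown.
type URLObservation struct {
	URL  string
	Host string
	IP   string
	Port int
}

// edgesForURL returns the edges to create for one probed URL.
func edgesForURL(o URLObservation) []Edge {
	if o.IP != "" && o.Port > 0 {
		// Canonical service value: ip:port/tcp.
		service := fmt.Sprintf("%s:%d/tcp", o.IP, o.Port)
		return []Edge{
			{From: o.IP, Relation: "exposes", To: service},
			{From: service, Relation: "serves", To: o.URL},
		}
	}
	// Degraded path: no service can be identified yet, so link host to URL.
	return []Edge{{From: o.Host, Relation: "serves", To: o.URL}}
}
```

Called with IP 1.2.3.4 and port 443, the function yields the exposes and serves edges through 1.2.3.4:443/tcp; without IP data it falls back to a single host serves URL edge.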

Plugin	Output	Required metadata	Created relation
`subfinder`, `amass`	subdomain	`parent_domain`	domain `contains` subdomain
`alterx`	candidate subdomain	`parent_domain`, `candidate=true`	parent `generated_candidate` candidate
`dnsx`, `resolver`, `shuffledns`	ip	`domain` or `host`, `record_type` when available	host `resolves_to` ip
`naabu`, `nmap`	service	`ip`, `port`, `protocol`	ip `exposes` service
`httpx`	url	`host`, `ip`, `port`, `scheme` when available	service `serves` url; degraded host `serves` url only without service data
`httpx_screenshot`	file artifact	linked URL asset value	no topology edge
`tlsx`	certificate	`host`, `ip`, `port` when available	preferred source `has_certificate` certificate
`katana`	path, discovered URL	`parent_url`, URL host/port metadata	url `has_path` path; url `discovered_url` url
`nuclei`	vulnerability record	`matched_url` or target asset metadata	vulnerability stored against URL/service asset
`asnmap`	asn, cidr	`asn`, `org`	asn `announces` cidr; org `owns` asn

Technology metadata

Technologies are descriptive metadata, not first-class graph assets. URL and service assets can carry metadata.technologies, for example:

{
  "technologies": ["nginx", "React", "Cloudflare"],
  "title": "Example",
  "status_code": 200
}

This keeps the graph focused on reachable assets and relations while still showing stack details in asset detail panels. Legacy technology assets may remain in old databases, but API list and graph responses filter them out by default.

Deduplication

Assets are keyed on (org_id, type, normalized_value). Normalization rules:

Type	Rule
domain / subdomain	lowercase, strip trailing dot
ip	canonical IP
url	lowercase, strip default ports, strip trailing slash
service	`ip:port/protocol`
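
The rules in the table could look roughly like this in Go. All function names are hypothetical. The sketch also assumes that "lowercase" for URLs applies to the scheme and host only (paths are often case-sensitive) and that "canonical IP" means the standard textual form produced by net/netip.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"net/url"
	"strings"
)

// normalizeHost lowercases a domain or subdomain and strips a trailing dot.
func normalizeHost(v string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(v)), ".")
}

// normalizeIP returns the canonical textual form of an IP address.
func normalizeIP(v string) (string, error) {
	addr, err := netip.ParseAddr(strings.TrimSpace(v))
	if err != nil {
		return "", err
	}
	return addr.String(), nil
}

// normalizeURL lowercases scheme and host, drops default ports,
// and strips a trailing slash from the path.
func normalizeURL(v string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(v))
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	host, port := strings.ToLower(u.Hostname()), u.Port()
	if (u.Scheme == "http" && port == "80") || (u.Scheme == "https" && port == "443") {
		port = ""
	}
	switch {
	case port != "":
		u.Host = net.JoinHostPort(host, port)
	case strings.Contains(host, ":"):
		u.Host = "[" + host + "]" // a bare IPv6 literal needs brackets
	default:
		u.Host = host
	}
	u.Path = strings.TrimSuffix(u.Path, "/")
	u.RawPath = ""
	return u.String(), nil
}

// normalizeService builds the ip:port/protocol form.
func normalizeService(ip string, port int, protocol string) (string, error) {
	canon, err := normalizeIP(ip)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%d/%s", canon, port, strings.ToLower(protocol)), nil
}
```

Under these assumptions, HTTPS://API.Example.com:443/Login/ and https://api.example.com/Login map to the same normalized_value, so the second observation updates the existing asset instead of creating a duplicate.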

Vulnerability lifecycle

new → confirmed → fixed ↔ retest_required
new → false_positive
confirmed → accepted_risk
fixed / false_positive / accepted_risk → reopened
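
One way to enforce these transitions is a lookup table. The sketch below is illustrative only: the allowedTransitions and canTransition names are hypothetical, and because the lifecycle above lists no transitions out of reopened, the table has none either.

```go
package main

// allowedTransitions mirrors the lifecycle listed above.
// Statuses with no entry have no documented outgoing transition.
var allowedTransitions = map[string][]string{
	"new":             {"confirmed", "false_positive"},
	"confirmed":       {"fixed", "accepted_risk"},
	"fixed":           {"retest_required", "reopened"},
	"retest_required": {"fixed"},
	"false_positive":  {"reopened"},
	"accepted_risk":   {"reopened"},
}

// canTransition reports whether a status change is part of the lifecycle.
func canTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

A status-update handler could call canTransition before writing and reject anything it does not allow, for example new → fixed.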

Asset criticality is a manual business-importance field on assets, with values unknown, low, medium, high, and critical. It defaults to unknown, is updated only by users with admin or hacker role, and is not inferred from scanner output or AI agents in the MVP. It is intended as a foundation for future risk scoring, but no risk-score engine or automatic criticality assignment is implemented here.

Files and object storage

Scan-generated files are stored in S3-compatible object storage. Local Docker Compose uses MinIO. PostgreSQL stores metadata in the files table; binary content stays in object storage. API download endpoints return short-lived presigned URLs after organization-scoped RBAC checks.

Plugins return PluginResult.Artifacts with local temporary paths. The worker uploads those artifacts through the Files service and associates them with organization, scan, scan job, source plugin, and asset where possible.

Object key format:

organizations/{org_id}/scans/{scan_id}/{file_type}/{uuid}-{safe_filename}
organizations/{org_id}/files/{file_type}/{uuid}-{safe_filename}
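
A key builder for both formats might look like the following. The objectKey and safeFilename names are hypothetical, the UUID is passed in as a string rather than generated, and the sanitization rule (keep letters, digits, dot, underscore, and hyphen) is an assumption about what safe_filename means.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// unsafeChars matches runs of characters that the sketch replaces with "_".
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9._-]+`)

// safeFilename drops any directory part and replaces unsafe characters.
func safeFilename(name string) string {
	return unsafeChars.ReplaceAllString(filepath.Base(name), "_")
}

// objectKey builds the scan-scoped key, or the organization-level key
// when scanID is empty.
func objectKey(orgID, scanID, fileType, id, filename string) string {
	name := safeFilename(filename)
	if scanID == "" {
		return fmt.Sprintf("organizations/%s/files/%s/%s-%s", orgID, fileType, id, name)
	}
	return fmt.Sprintf("organizations/%s/scans/%s/%s/%s-%s", orgID, scanID, fileType, id, name)
}
```

Keeping the organization ID as the first path segment means every object for one organization shares a common key prefix.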

See files.md for storage configuration and API details.

Research Layer

The platform architecture includes a research knowledge layer:

platform code
plugins
configuration
hxresearch knowledge repository

hxresearch/ is intended to become the long-term repository of internal security expertise for the project. It is separate from application code and can contain custom detection templates, proprietary research, advisories, writeups, proof-of-concepts, datasets, and future expert knowledge.

Planned structure:

hxresearch/
├── README.md
├── nuclei/
├── advisories/
├── writeups/
├── poc/
└── datasets/

Worker containers mount hxresearch/ read-only at /opt/hxeasm/hxresearch. The nuclei_custom_templates plugin can run templates from /opt/hxeasm/hxresearch/nuclei when an operator adds that plugin to a custom scan profile. It is not included in built-in profiles.

Future detection capabilities may add a hybrid Nuclei mode, advisory engines, or agent context readers that consume other hxresearch directories.

See hxresearch.md for the canonical documentation.

Scan pipeline

1. User creates scan (POST /api/v1/organizations/{id}/scans)
2. API creates Scan record (status=pending)
3. API pushes QueueMessage to Redis list  easm:scan:queue
4. Worker pops message, updates Scan to running
5. For each plugin in profile:
   a. Create ScanJob record
   b. Execute plugin.Run()  (CLI subprocess or mock)
   c. Parse NormalizedEntities from result
   d. Upload PluginResult.Artifacts to S3/MinIO through Files service
   e. Upsert assets to DB  (deduplication via ON CONFLICT)
   f. Vulnerability entities → Vulnerabilities table
   g. Update ScanJob status
6. Update Scan to success / failed

Manual Plugin Execution

Users with admin or hacker role can launch one compatible plugin against one selected asset from the Assets page or Asset Graph. The API endpoint is:

POST /api/v1/assets/{asset_id}/run-plugin

The backend resolves the asset's organization, validates RBAC, and checks that the plugin is enabled, that its registry metadata declares manual_scan support, and that it supports the asset's type. It then creates a normal scan with the internal profile manual_scan, creates exactly one queued scan job, and pushes a message onto the existing Redis scan queue. The worker processes it through the normal plugin execution, result persistence, file artifact, graph, vulnerability, and exposure-change paths.

The frontend discovers manual plugins through GET /api/v1/plugins/manual-capabilities; the UI does not keep a separate plugin allowlist. Adding a registered plugin with SupportedExecutionModes: [manual_scan] and SupportedAssetTypes makes it available automatically.

Manual scans differ from normal profile scans only in target selection: the plugin input contains the selected asset only. Before queue execution, manual targets are normalized. Services are converted from stored graph form such as 1.2.3.4:443/tcp to execution form 1.2.3.4:443.
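
The service conversion described above is small enough to sketch in a few lines. The executionTarget name is hypothetical, and the sketch assumes all other asset types are passed through unchanged.

```go
package main

import "strings"

// executionTarget converts a stored service value such as "1.2.3.4:443/tcp"
// to the execution form "1.2.3.4:443". Other asset types are returned as-is.
func executionTarget(assetType, value string) string {
	if assetType != "service" {
		return value
	}
	// Drop the trailing "/protocol" suffix if present.
	if i := strings.LastIndex(value, "/"); i >= 0 {
		return value[:i]
	}
	return value
}
```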

Plugin system

Every scanner implements the Plugin interface:

type Plugin interface {
    Name() string
    Type() PluginType
    Version() string
    Run(ctx context.Context, input PluginInput, config PluginConfig) (*PluginResult, error)
}

PluginResult contains []NormalizedEntity — the common schema for all outputs:

{
  "entity_type": "asset",
  "asset_type": "subdomain",
  "value": "api.example.com",
  "source_plugin": "subfinder",
  "confidence": 0.95,
  "metadata": { ... }
}

Vulnerability entities use entity_type: "vulnerability" and carry title, severity, template_id etc. in metadata.

Core plugin contracts live in backend/internal/plugins: model types, registry, shared command helpers, and result/artifact contracts. Concrete tool integrations live in backend/internal/plugins/wrappers, where contributors add or update wrappers such as httpx, nmap, dnsx, and katana. Default wrapper registration is centralized in backend/internal/plugins/wrappers/defaults.go and used by both API and worker startup.

Technology is metadata, not a graph asset. Plugins should write metadata.technologies on URL/service assets; asset persistence normalizes legacy tech and technology keys into that canonical array.
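
The legacy-key normalization could be sketched as follows. The normalizeTechnologies name is hypothetical, and the merge order, exact-match deduplication, and accepted value shapes (a single string, []string, or the []any produced by encoding/json) are assumptions.

```go
package main

// normalizeTechnologies merges the legacy "tech" and "technology" keys into
// the canonical "technologies" array and removes the legacy keys.
func normalizeTechnologies(meta map[string]any) {
	seen := make(map[string]bool)
	merged := []string{}
	add := func(name string) {
		if name != "" && !seen[name] {
			seen[name] = true
			merged = append(merged, name)
		}
	}
	for _, key := range []string{"technologies", "technology", "tech"} {
		switch v := meta[key].(type) {
		case string:
			add(v)
		case []string:
			for _, name := range v {
				add(name)
			}
		case []any: // shape produced by encoding/json
			for _, item := range v {
				if name, ok := item.(string); ok {
					add(name)
				}
			}
		}
	}
	delete(meta, "technology")
	delete(meta, "tech")
	if len(merged) > 0 {
		meta["technologies"] = merged
	}
}
```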

Adding a new plugin

1. Create backend/internal/plugins/wrappers/myplugin.go
2. Implement plugins.Plugin
3. Register in backend/internal/plugins/wrappers/defaults.go
4. Set SupportedAssetTypes and SupportedExecutionModes
5. Add toolinstaller config if the wrapper calls an external CLI
6. Add to relevant scan_profiles entries in backend/configs/config.yaml for profile scans

If SupportedExecutionModes includes manual_scan and the plugin is enabled, it appears in GET /api/v1/plugins/manual-capabilities and in the frontend Run Plugin menu without frontend code changes.

Test mode

Set EASM_TEST_MODE=true. Each plugin checks config.Options["test_mode"] and returns hardcoded mock assets/vulns instead of launching real binaries. Useful for UI development and integration tests.

Queue & retry model

Redis list:  easm:scan:queue  (LPUSH producer, BRPOP consumer)

The worker calls BRPOP with a 5 s timeout and runs each scan in a goroutine.

Job statuses: queued → running → success / failed / timeout / cancelled

Retry policy: configured per plugin (PluginConfig.Retry). The worker does not automatically retry on failure. Failed or timed-out scan jobs can be retried manually through POST /api/v1/scan-jobs/{job_id}/retry; the retry creates a new scan job row and runs only the selected plugin using reconstructed organization/scan scope.

API key security

Keys are never stored in plaintext. The generation flow:

Generate 32 random bytes → hex encode → prefix with easm_
SHA-256 hash stored in DB (key_hash)
First 12 chars stored as key_prefix for display
Raw key returned to user once (not stored)

Validation: rehash the provided key and look it up by hash.
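
The generation and validation flow can be sketched with the Go standard library. The GeneratedKey type and function names are hypothetical. The sketch also assumes that the hash is computed over the full raw key and stored hex-encoded, and that the 12-character prefix is taken from the raw key including easm_.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// GeneratedKey groups the values produced when a key is created.
type GeneratedKey struct {
	Raw    string // returned to the user once, never stored
	Hash   string // stored as key_hash
	Prefix string // stored as key_prefix for display
}

// hashKey returns the hex-encoded SHA-256 of the raw key.
func hashKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// generateAPIKey creates 32 random bytes, hex-encodes them, and adds the prefix.
func generateAPIKey() (GeneratedKey, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return GeneratedKey{}, err
	}
	raw := "easm_" + hex.EncodeToString(buf)
	return GeneratedKey{Raw: raw, Hash: hashKey(raw), Prefix: raw[:12]}, nil
}
```

To validate a presented key, compute hashKey on it and query the key table by key_hash; the raw key itself never needs to be persisted.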

RBAC

Role	Capabilities
`admin`	Full system access, scope approval, user management, all orgs
`hacker`	Assigned orgs only, run scans, triage vulnerabilities
`client`	Read-only: dashboard, approved vulnerabilities, reports

Role is stored in JWT claims and checked by RequireRole middleware.

Scope seed assets

Approved organization scopes are also represented as seed assets so they appear in the asset inventory and graph before any scan runs.

When a scope is approved, the backend upserts seed assets with:

{
  "source_plugin": "scope",
  "confidence": 1.0,
  "metadata": {
    "source": "scope",
    "scope_id": "...",
    "seed": true
  }
}

Supported seed mappings:

Scope type	Seed asset behavior
`domain`	Creates a `domain` asset
`url`	Creates a `url` asset
`ip`	Creates an `ip` asset
`cidr`	Creates a `cidr` asset; expands to IP assets only for `/28` or smaller, capped at 256 IPs
`ip_range`	Creates an `ip_range` asset; expands to IP assets only when the range has 256 IPs or fewer
`asn`	Creates an `asn` asset
`org_name`	Does not create a seed asset because there is no dedicated organization-name asset type

Large CIDR ranges are intentionally not expanded into thousands of IP nodes. Scanner plugins receive the original scope and can handle expansion later.
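
The CIDR expansion guard might be sketched as below. The expandCIDR name and both constants are hypothetical. The sketch reads "/28 or smaller" as a prefix length of 28 or more, limits expansion to IPv4, and includes the network and broadcast addresses; all three are assumptions.

```go
package main

import "net/netip"

const (
	minExpandBits = 28  // expand only prefixes of length /28 or longer (assumed reading)
	maxSeedIPs    = 256 // hard cap on generated IP seed assets
)

// expandCIDR returns the IP seed values for a small IPv4 range, or nil when
// the range is too large and only the cidr asset itself should be created.
func expandCIDR(cidr string) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // start from the network address
	if !prefix.Addr().Is4() || prefix.Bits() < minExpandBits {
		return nil, nil
	}
	var ips []string
	for addr := prefix.Addr(); prefix.Contains(addr) && len(ips) < maxSeedIPs; addr = addr.Next() {
		ips = append(ips, addr.String())
	}
	return ips, nil
}
```

Under these assumptions, 10.0.0.0/30 expands to four IP assets, while 10.0.0.0/16 yields only the cidr asset.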

Exposure changes

Exposure changes are stored in exposure_changes and provide a read-only timeline of newly observed attack-surface events. The MVP records events when assets, vulnerabilities, and file artifacts are newly inserted, plus vulnerability status changes.

The subsystem is best-effort: failures to write exposure changes do not fail scans, vulnerability updates, or file storage. Duplicate prevention is based on source insert detection and a lightweight service-level similar-event check.

See changes.md for schema, event types, API endpoints, limitations, and roadmap.

Update Center

The admin Update Center checks GitHub Releases and reports whether a newer release is available. It is informational only: the application does not execute update commands, access Docker, or modify local files.

See updates.md for configuration, API endpoints, UI behavior, and manual update commands.

Report Rendering

Report generation uses embedded templates under backend/internal/reports/templates/. HTML reports render the embedded CSS template directly. PDF reports use the same deterministic view model and the existing lightweight native PDF writer for styled sections, tables, findings, and recommendations. JSON and CSV outputs remain available for machine-readable exports.

Asset Criticality

Asset criticality records business importance separately from vulnerability severity. A low-severity issue on a critical VPN may deserve more attention than a medium issue on a test host.

Current behavior is manual-only:

allowed values: unknown, low, medium, high, critical
default: unknown
source: always manual
update roles: admin, hacker
read roles: any user with organization access
audit fields: update time and updating user when available
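
These rules can be checked in one place before an update is written. This is a sketch only: the validateCriticalityUpdate name is hypothetical, and organization access is assumed to be checked separately.

```go
package main

import "fmt"

// criticalityLevels lists the allowed values.
var criticalityLevels = map[string]bool{
	"unknown": true, "low": true, "medium": true, "high": true, "critical": true,
}

// validateCriticalityUpdate checks the caller's role and the requested value.
func validateCriticalityUpdate(role, value string) error {
	if role != "admin" && role != "hacker" {
		return fmt.Errorf("role %q may not update asset criticality", role)
	}
	if !criticalityLevels[value] {
		return fmt.Errorf("invalid criticality %q", value)
	}
	return nil
}
```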

Future suggested criticality, AI/agent recommendations, and risk-score calculation can build on these fields, but they are not part of the MVP.