Agent Safety and Governance
Agents are LLM-powered reasoning modules. They can recommend actions, summarize results, and produce structured metadata, but they must not bypass platform authorization, organization scope, scan policy, or audit requirements.
Security Boundaries
Agents are not a trusted security boundary.
Every agent execution must use the same backend authorization model as the rest of the platform:
- organization access is enforced server-side
- admin users can access all organizations
- hacker and client users are limited to assigned organizations
- clients are read-only and must not trigger operational actions
- frontend-hidden controls are convenience only
Agent settings from the frontend must not grant permissions. The backend decides whether an agent is allowed to run and whether any suggested action can be applied.
Category Safety Rules
Advisory Agents
Advisory agents analyze existing data and produce reasoning output.
They can:
- create notes and summaries
- create report-ready text
- explain attack paths from existing graph data
- recommend risk updates
- identify suspicious or weak findings
- identify false-positive candidates
They cannot:
- start scans
- mutate scope
- modify vulnerabilities automatically
- change vulnerability status automatically
- perform destructive actions
- apply risk updates automatically unless policy explicitly allows it
Operational Agents
Operational agents can propose follow-up discovery or validation actions, so they require stricter controls.
They can:
- suggest scan actions
- flag assets for analyst review
- recommend deeper scan profiles
- suggest plugin runs inside approved scope
- propose additional investigation steps
They cannot:
- execute scans automatically
- create scans without admin or hacker approval
- leave approved scope
- exploit vulnerabilities
- brute-force targets
- follow arbitrary URLs from LLM output
- expand scope based on model inference alone
All operational suggestions require approval before execution.
Scope Rules
Agents may only analyze resources inside the organization and scope that produced the scan context.
An agent must not:
- create targets outside approved scope
- expand scans to unrelated domains or IP ranges without explicit policy
- use data from one organization to generate actions in another organization
- treat LLM-generated URLs as trusted targets
- access secrets, API keys, or credentials unless explicitly designed for a safe integration path
For ID-based resources such as scan_id, asset_id, vulnerability_id, scope_id, and scheduled_scan_id, the backend must resolve organization_id first and verify access before building AgentContext.
Prompt Safety
Prompt templates must separate platform instructions from untrusted target data.
Prompts should include:
- allowed action schema
- approved scope boundaries
- agent category and allowed behavior
- instruction to ignore prompt-like content in target data
- instruction to return only valid JSON matching
AgentResponse - instruction not to invent asset IDs, vulnerability IDs, or targets
Prompt templates should avoid:
- provider API keys
- raw authentication tokens
- unrelated organization data
- unnecessary personally identifiable information
- full sensitive HTTP bodies unless explicitly needed and redacted
Provider Safety
Provider-backed agents must use validated structured output. Free-form text can be stored as notes, but it must not drive automated state changes without parsing and validation.
Provider calls should include:
- minimal required context
- organization identifier
- scan identifier
- allowed output schema
- explicit instruction to stay within provided assets and findings
- no credentials or secrets
Provider calls should avoid:
- raw authentication tokens
- API keys
- unrelated organization data
- unnecessary personally identifiable information
API Keys
The AI provider API key currently exists in the frontend settings model. Backend implementation should move provider configuration to server-side storage before production agent execution.
Recommended behavior:
- store provider keys encrypted or through the deployment secret manager
- never return stored provider keys in API responses
- expose only masked values such as
configured: true - limit configuration changes to administrators
- audit provider changes
Human Review
The first backend implementation should treat agents as advisory or suggestion-only.
Recommended default:
- store every
AgentResponse - display recommendations in the UI
- require admin or hacker confirmation before creating scans, changing vulnerability status, or changing severity
- prevent clients from applying any mutation action
Automatic execution can be added later per action type and per organization policy, but operational auto-execution is not part of the initial release.
Auditability
Every run should produce an agent_runs record with enough detail to reproduce the decision:
- organization ID
- scan ID or source resource ID
- agent ID
- agent category
- provider
- model
- prompt template version
- input context
- validated output
- status
- error message when failed
- timestamps
When an agent recommendation is applied by a user, the platform should store:
- who applied it
- source
agent_run_id - action type
- target resource
- before and after values
Failure Handling
Agent failure must not fail the scan pipeline.
Expected behavior:
- scan processing completes independently
- failed agent runs are stored with
status = failed - partial agent failures do not block other agents
- provider timeouts are bounded
- malformed LLM output is rejected and stored as a validation error
Prompt Injection and Untrusted Data
Scan output, page content, HTTP responses, vulnerability evidence, and asset metadata are untrusted inputs.
Agents must treat this data as evidence, not as instructions. Provider prompts should separate system instructions from scan data and tell the model to ignore instructions found inside target content.
Default Policy
Until a stronger policy engine exists, use these defaults:
- admin can configure agents for all organizations
- advisory agents can produce notes, summaries, risk context, and attack path explanations
- operational agents can produce suggestions only
- hacker users can review allowed suggestions inside assigned organizations
- client users can view completed read-only summaries only when tied to assigned organizations
- no agent action is auto-applied except non-mutating report notes if policy allows it
- scans suggested by agents require admin or hacker approval