Agent Safety and Governance

Agents are LLM-powered reasoning modules. They can recommend actions, summarize results, and produce structured metadata, but they must not bypass platform authorization, organization scope, scan policy, or audit requirements.

Security Boundaries

Agents are not a trusted security boundary.

Every agent execution must use the same backend authorization model as the rest of the platform:

organization access is enforced server-side
admin users can access all organizations
hacker and client users are limited to assigned organizations
clients are read-only and must not trigger operational actions
frontend-hidden controls are convenience only

Agent settings from the frontend must not grant permissions. The backend decides whether an agent is allowed to run and whether any suggested action can be applied.

Category Safety Rules

Advisory Agents

Advisory agents analyze existing data and produce reasoning output.

They can:

create notes and summaries
create report-ready text
explain attack paths from existing graph data
recommend risk updates
identify suspicious or weak findings
identify false-positive candidates

They cannot:

start scans
mutate scope
modify vulnerabilities automatically
change vulnerability status automatically
perform destructive actions
apply risk updates automatically unless policy explicitly allows it

Operational Agents

Operational agents can propose follow-up discovery or validation actions, so they require stricter controls.

They can:

suggest scan actions
flag assets for analyst review
recommend deeper scan profiles
suggest plugin runs inside approved scope
propose additional investigation steps

They cannot:

execute scans automatically
create scans without admin or hacker approval
leave approved scope
exploit vulnerabilities
brute-force targets
follow arbitrary URLs from LLM output
expand scope based on model inference alone

All operational suggestions require approval before execution.

Scope Rules

Agents may only analyze resources inside the organization and scope that produced the scan context.

An agent must not:

create targets outside approved scope
expand scans to unrelated domains or IP ranges without explicit policy
use data from one organization to generate actions in another organization
treat LLM-generated URLs as trusted targets
access secrets, API keys, or credentials unless explicitly designed for a safe integration path

For ID-based resources such as scan_id, asset_id, vulnerability_id, scope_id, and scheduled_scan_id, the backend must resolve organization_id first and verify access before building AgentContext.

Prompt Safety

Prompt templates must separate platform instructions from untrusted target data.

Prompts should include:

allowed action schema
approved scope boundaries
agent category and allowed behavior
instruction to ignore prompt-like content in target data
instruction to return only valid JSON matching AgentResponse
instruction not to invent asset IDs, vulnerability IDs, or targets

Prompt templates should avoid:

provider API keys
raw authentication tokens
unrelated organization data
unnecessary personally identifiable information
full sensitive HTTP bodies unless explicitly needed and redacted

Provider Safety

Provider-backed agents must use validated structured output. Free-form text can be stored as notes, but it must not drive automated state changes without parsing and validation.

Provider calls should include:

minimal required context
organization identifier
scan identifier
allowed output schema
explicit instruction to stay within provided assets and findings
no credentials or secrets

Provider calls should avoid:

raw authentication tokens
API keys
unrelated organization data
unnecessary personally identifiable information

API Keys

The AI provider API key currently exists in the frontend settings model. Backend implementation should move provider configuration to server-side storage before production agent execution.

Recommended behavior:

store provider keys encrypted or through the deployment secret manager
never return stored provider keys in API responses
expose only masked values such as configured: true
limit configuration changes to administrators
audit provider changes

Human Review

The first backend implementation should treat agents as advisory or suggestion-only.

Recommended default:

store every AgentResponse
display recommendations in the UI
require admin or hacker confirmation before creating scans, changing vulnerability status, or changing severity
prevent clients from applying any mutation action

Automatic execution can be added later per action type and per organization policy, but operational auto-execution is not part of the initial release.

Auditability

Every run should produce an agent_runs record with enough detail to reproduce the decision:

organization ID
scan ID or source resource ID
agent ID
agent category
provider
model
prompt template version
input context
validated output
status
error message when failed
timestamps

When an agent recommendation is applied by a user, the platform should store:

who applied it
source agent_run_id
action type
target resource
before and after values

Failure Handling

Agent failure must not fail the scan pipeline.

Expected behavior:

scan processing completes independently
failed agent runs are stored with status = failed
partial agent failures do not block other agents
provider timeouts are bounded
malformed LLM output is rejected and stored as a validation error

Prompt Injection and Untrusted Data

Scan output, page content, HTTP responses, vulnerability evidence, and asset metadata are untrusted inputs.

Agents must treat this data as evidence, not as instructions. Provider prompts should separate system instructions from scan data and tell the model to ignore instructions found inside target content.

Default Policy

Until a stronger policy engine exists, use these defaults:

admin can configure agents for all organizations
advisory agents can produce notes, summaries, risk context, and attack path explanations
operational agents can produce suggestions only
hacker users can review allowed suggestions inside assigned organizations
client users can view completed read-only summaries only when tied to assigned organizations
no agent action is auto-applied except non-mutating report notes if policy allows it
scans suggested by agents require admin or hacker approval