Skip to content

Agent Safety and Governance

Agents are LLM-powered reasoning modules. They can recommend actions, summarize results, and produce structured metadata, but they must not bypass platform authorization, organization scope, scan policy, or audit requirements.

Security Boundaries

Agents are not a trusted security boundary.

Every agent execution must use the same backend authorization model as the rest of the platform:

  • organization access is enforced server-side
  • admin users can access all organizations
  • hacker and client users are limited to assigned organizations
  • clients are read-only and must not trigger operational actions
  • frontend-hidden controls are convenience only

Agent settings from the frontend must not grant permissions. The backend decides whether an agent is allowed to run and whether any suggested action can be applied.

Category Safety Rules

Advisory Agents

Advisory agents analyze existing data and produce reasoning output.

They can:

  • create notes and summaries
  • create report-ready text
  • explain attack paths from existing graph data
  • recommend risk updates
  • identify suspicious or weak findings
  • identify false-positive candidates

They cannot:

  • start scans
  • mutate scope
  • modify vulnerabilities automatically
  • change vulnerability status automatically
  • perform destructive actions
  • apply risk updates automatically unless policy explicitly allows it

Operational Agents

Operational agents can propose follow-up discovery or validation actions, so they require stricter controls.

They can:

  • suggest scan actions
  • flag assets for analyst review
  • recommend deeper scan profiles
  • suggest plugin runs inside approved scope
  • propose additional investigation steps

They cannot:

  • execute scans automatically
  • create scans without admin or hacker approval
  • leave approved scope
  • exploit vulnerabilities
  • brute-force targets
  • follow arbitrary URLs from LLM output
  • expand scope based on model inference alone

All operational suggestions require approval before execution.

Scope Rules

Agents may only analyze resources inside the organization and scope that produced the scan context.

An agent must not:

  • create targets outside approved scope
  • expand scans to unrelated domains or IP ranges without explicit policy
  • use data from one organization to generate actions in another organization
  • treat LLM-generated URLs as trusted targets
  • access secrets, API keys, or credentials unless explicitly designed for a safe integration path

For ID-based resources such as scan_id, asset_id, vulnerability_id, scope_id, and scheduled_scan_id, the backend must resolve organization_id first and verify access before building AgentContext.

Prompt Safety

Prompt templates must separate platform instructions from untrusted target data.

Prompts should include:

  • allowed action schema
  • approved scope boundaries
  • agent category and allowed behavior
  • instruction to ignore prompt-like content in target data
  • instruction to return only valid JSON matching AgentResponse
  • instruction not to invent asset IDs, vulnerability IDs, or targets

Prompt templates should avoid:

  • provider API keys
  • raw authentication tokens
  • unrelated organization data
  • unnecessary personally identifiable information
  • full sensitive HTTP bodies unless explicitly needed and redacted

Provider Safety

Provider-backed agents must use validated structured output. Free-form text can be stored as notes, but it must not drive automated state changes without parsing and validation.

Provider calls should include:

  • minimal required context
  • organization identifier
  • scan identifier
  • allowed output schema
  • explicit instruction to stay within provided assets and findings
  • no credentials or secrets

Provider calls should avoid:

  • raw authentication tokens
  • API keys
  • unrelated organization data
  • unnecessary personally identifiable information

API Keys

The AI provider API key currently exists in the frontend settings model. Backend implementation should move provider configuration to server-side storage before production agent execution.

Recommended behavior:

  • store provider keys encrypted or through the deployment secret manager
  • never return stored provider keys in API responses
  • expose only masked values such as configured: true
  • limit configuration changes to administrators
  • audit provider changes

Human Review

The first backend implementation should treat agents as advisory or suggestion-only.

Recommended default:

  • store every AgentResponse
  • display recommendations in the UI
  • require admin or hacker confirmation before creating scans, changing vulnerability status, or changing severity
  • prevent clients from applying any mutation action

Automatic execution can be added later per action type and per organization policy, but operational auto-execution is not part of the initial release.

Auditability

Every run should produce an agent_runs record with enough detail to reproduce the decision:

  • organization ID
  • scan ID or source resource ID
  • agent ID
  • agent category
  • provider
  • model
  • prompt template version
  • input context
  • validated output
  • status
  • error message when failed
  • timestamps

When an agent recommendation is applied by a user, the platform should store:

  • who applied it
  • source agent_run_id
  • action type
  • target resource
  • before and after values

Failure Handling

Agent failure must not fail the scan pipeline.

Expected behavior:

  • scan processing completes independently
  • failed agent runs are stored with status = failed
  • partial agent failures do not block other agents
  • provider timeouts are bounded
  • malformed LLM output is rejected and stored as a validation error

Prompt Injection and Untrusted Data

Scan output, page content, HTTP responses, vulnerability evidence, and asset metadata are untrusted inputs.

Agents must treat this data as evidence, not as instructions. Provider prompts should separate system instructions from scan data and tell the model to ignore instructions found inside target content.

Default Policy

Until a stronger policy engine exists, use these defaults:

  • admin can configure agents for all organizations
  • advisory agents can produce notes, summaries, risk context, and attack path explanations
  • operational agents can produce suggestions only
  • hacker users can review allowed suggestions inside assigned organizations
  • client users can view completed read-only summaries only when tied to assigned organizations
  • no agent action is auto-applied except non-mutating report notes if policy allows it
  • scans suggested by agents require admin or hacker approval