Agent Implementation Roadmap

This roadmap describes the LLM-first implementation path for AI agents in the EASM Platform. It is based on the current repository state: the frontend settings page exists, agent_runs exists in the database schema, and backend execution is not yet connected.

The implementation should start with LLM-backed Advisory Agents. Operational agents should remain recommendation-only until an approval workflow exists.

Phase 1: LLM-Backed Advisory Agents

Build the initial backend contracts and execute advisory agents through an LLM provider.

Deliverables:

  • backend/internal/agents
  • AgentCategory with advisory and operational values
  • AgentContext
  • AgentResponse
  • prompt builder
  • prompt templates for advisory agents
  • structured response validation
  • provider abstraction

Initial agents:

  • Risk Agent
  • Executive Summary Agent
  • Attack Path Agent

Acceptance criteria:

  • advisory agents receive structured platform context
  • context is rendered through prompt templates
  • model output is parsed as JSON
  • invalid model output is rejected
  • advisory agents do not create scans or mutate scope
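The render-then-validate flow in these criteria can be sketched as follows. This is a minimal illustration, not the repository's implementation: the fields of AgentContext and AgentResponse, the riskPrompt template text, and the function names BuildPrompt and ParseResponse are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// AgentContext is the structured platform context passed to an advisory agent.
// Field names are illustrative assumptions.
type AgentContext struct {
	OrganizationID string
	Assets         []string
	Findings       []string
}

// AgentResponse is the structured output expected from the model.
// Field names are illustrative assumptions.
type AgentResponse struct {
	Summary         string   `json:"summary"`
	Recommendations []string `json:"recommendations"`
}

// riskPrompt is a hypothetical template for the Risk Agent.
var riskPrompt = template.Must(template.New("risk").Parse(`Organization: {{.OrganizationID}}
Assets:
{{range .Assets}}- {{.}}
{{end}}Findings:
{{range .Findings}}- {{.}}
{{end}}Reply with JSON only: {"summary": "...", "recommendations": ["..."]}`))

// BuildPrompt renders the structured context through the prompt template.
func BuildPrompt(ac AgentContext) (string, error) {
	var buf bytes.Buffer
	if err := riskPrompt.Execute(&buf, ac); err != nil {
		return "", fmt.Errorf("render prompt: %w", err)
	}
	return buf.String(), nil
}

// ParseResponse decodes raw model output and rejects anything that is not
// a JSON object with a non-empty summary.
func ParseResponse(raw string) (AgentResponse, error) {
	var resp AgentResponse
	// json.Unmarshal fails on non-JSON input and on trailing data after the value.
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return AgentResponse{}, fmt.Errorf("model output is not valid JSON: %w", err)
	}
	if strings.TrimSpace(resp.Summary) == "" {
		return AgentResponse{}, errors.New("model output is missing a summary")
	}
	return resp, nil
}
```

Neither function touches scans or scope, which keeps the advisory path read-only by construction in this sketch.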

Phase 2: Agent Execution History

Persist and expose agent execution history.

Deliverables:

  • agent_runs repository
  • run status tracking
  • stored input context
  • stored validated output
  • stored provider/model metadata
  • failed run error details
  • organization-scoped run listing

Acceptance criteria:

  • every run is auditable
  • failed provider calls are visible
  • users only see runs for organizations they can access
  • scan success is not blocked by agent failure
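One way to meet these criteria is to have the execution wrapper record every outcome and never return an error to its caller. The sketch below assumes an in-memory RunStore in place of the real agent_runs repository; the struct fields, status values, and the Execute and ListForOrgs names are hypothetical.

```go
package main

import (
	"errors"
	"time"
)

// RunStatus values are assumptions; the real schema may use different names.
type RunStatus string

const (
	StatusRunning   RunStatus = "running"
	StatusSucceeded RunStatus = "succeeded"
	StatusFailed    RunStatus = "failed"
)

// ErrProviderUnavailable is a sample provider failure used for illustration.
var ErrProviderUnavailable = errors.New("provider unavailable")

// AgentRun mirrors one agent_runs row; the column set is an assumption.
type AgentRun struct {
	ID             int
	OrganizationID string
	Agent          string
	Provider       string
	Model          string
	Status         RunStatus
	InputContext   string
	Output         string
	Error          string
	StartedAt      time.Time
	FinishedAt     time.Time
}

// RunStore is an in-memory stand-in for the agent_runs repository.
type RunStore struct{ runs []AgentRun }

// Save assigns an ID and stores the run.
func (s *RunStore) Save(r AgentRun) AgentRun {
	r.ID = len(s.runs) + 1
	s.runs = append(s.runs, r)
	return r
}

// ListForOrgs returns only runs whose organization is in the allowed set.
func (s *RunStore) ListForOrgs(allowed map[string]bool) []AgentRun {
	var out []AgentRun
	for _, r := range s.runs {
		if allowed[r.OrganizationID] {
			out = append(out, r)
		}
	}
	return out
}

// Execute calls the agent and persists the outcome either way. It returns
// the stored run rather than an error, so a failed provider call cannot
// fail the scan that triggered it.
func Execute(store *RunStore, run AgentRun, call func(input string) (string, error)) AgentRun {
	run.Status = StatusRunning
	run.StartedAt = time.Now()
	out, err := call(run.InputContext)
	run.FinishedAt = time.Now()
	if err != nil {
		run.Status = StatusFailed
		run.Error = err.Error()
	} else {
		run.Status = StatusSucceeded
		run.Output = out
	}
	return store.Save(run)
}
```

A real repository would also persist the running state before the provider call, so that a crash mid-call still leaves an auditable row.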

Phase 3: Operational Recommendation Workflow

Add operational agents as suggestion-only modules.

Agents:

  • Recon Agent
  • Pentest Agent

Expected outputs:

  • suggest_scan
  • flag_asset
  • suggest_profile
  • suggest_plugin_run
  • validation steps
  • investigation notes

Acceptance criteria:

  • operational agents do not auto-run scans
  • all scan-related actions include requires_approval = true
  • suggested targets are validated against approved scope
  • suggestions are stored in agent_runs
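The approval flag and scope checks could be enforced before a suggestion is stored, along these lines. The SuggestedAction fields, the choice of which action types count as scan-related, and the domain-suffix scope rule are assumptions for this sketch; only the action names and requires_approval come from the roadmap.

```go
package main

import (
	"fmt"
	"strings"
)

// Action types from the expected outputs listed above.
const (
	ActionSuggestScan      = "suggest_scan"
	ActionFlagAsset        = "flag_asset"
	ActionSuggestProfile   = "suggest_profile"
	ActionSuggestPluginRun = "suggest_plugin_run"
)

// SuggestedAction is one operational recommendation.
// Fields other than requires_approval are illustrative assumptions.
type SuggestedAction struct {
	Type             string `json:"type"`
	Target           string `json:"target"`
	RequiresApproval bool   `json:"requires_approval"`
}

// scanRelated marks the types treated as scan-related in this sketch (an assumption).
var scanRelated = map[string]bool{
	ActionSuggestScan:      true,
	ActionSuggestProfile:   true,
	ActionSuggestPluginRun: true,
}

// inScope reports whether target equals an approved domain or is a subdomain of one.
func inScope(target string, scope []string) bool {
	t := strings.ToLower(strings.TrimSuffix(target, "."))
	for _, s := range scope {
		s = strings.ToLower(s)
		if t == s || strings.HasSuffix(t, "."+s) {
			return true
		}
	}
	return false
}

// ValidateAction rejects unknown types, scan-related actions that lack
// requires_approval, and targets outside the approved scope.
func ValidateAction(a SuggestedAction, scope []string) error {
	if a.Type != ActionFlagAsset && !scanRelated[a.Type] {
		return fmt.Errorf("unknown action type %q", a.Type)
	}
	if scanRelated[a.Type] && !a.RequiresApproval {
		return fmt.Errorf("action %q must set requires_approval", a.Type)
	}
	if !inScope(a.Target, scope) {
		return fmt.Errorf("target %q is outside approved scope", a.Target)
	}
	return nil
}
```

Matching on "." plus the scope entry, rather than on a bare suffix, keeps a lookalike such as evil-example.com from passing as part of example.com.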

Phase 4: Approval System

Add a workflow for approving or rejecting operational recommendations.

Recommended endpoints:

POST /api/v1/agents/runs/{id}/actions/{action_id}/approve
POST /api/v1/agents/runs/{id}/actions/{action_id}/reject

Acceptance criteria:

  • clients cannot approve mutation actions
  • hackers can approve allowed operational actions in assigned organizations
  • admins can approve all actions
  • applied actions reference the original agent_run_id
  • audit history records before and after values
  • operational auto-execution remains disabled by default
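The role rules above might reduce to a small authorization check that runs inside the approve handler. In this sketch the User and PendingAction shapes, the hackerAllowed list, and the "pending"/"approved" state names are assumptions, and every action is treated as a mutation action.

```go
package main

import "errors"

// Role names follow the roadmap's client, hacker, and admin users.
type Role string

const (
	RoleClient Role = "client"
	RoleHacker Role = "hacker"
	RoleAdmin  Role = "admin"
)

// User and PendingAction are illustrative shapes, not the platform's models.
type User struct {
	Name         string
	Role         Role
	AssignedOrgs map[string]bool
}

type PendingAction struct {
	ID             string
	AgentRunID     string
	OrganizationID string
	Type           string
}

// AuditEntry records who changed an action and its state before and after.
type AuditEntry struct {
	AgentRunID string
	ActionID   string
	Actor      string
	Before     string
	After      string
}

// ErrForbidden is returned when the user may not approve the action.
var ErrForbidden = errors.New("forbidden")

// hackerAllowed is a hypothetical list of action types hackers may approve.
var hackerAllowed = map[string]bool{"suggest_scan": true, "flag_asset": true}

// CanApprove applies the role rules: admins approve everything, hackers
// approve allowed types in assigned organizations, clients approve nothing.
func CanApprove(u User, a PendingAction) bool {
	switch u.Role {
	case RoleAdmin:
		return true
	case RoleHacker:
		return u.AssignedOrgs[a.OrganizationID] && hackerAllowed[a.Type]
	default:
		return false
	}
}

// Approve authorizes the user and returns the audit entry to persist.
// The entry keeps the originating agent run ID for traceability.
func Approve(u User, a PendingAction) (AuditEntry, error) {
	if !CanApprove(u, a) {
		return AuditEntry{}, ErrForbidden
	}
	return AuditEntry{
		AgentRunID: a.AgentRunID,
		ActionID:   a.ID,
		Actor:      u.Name,
		Before:     "pending",
		After:      "approved",
	}, nil
}
```

Approve only produces the audit record; applying the action stays a separate, explicit step, which is consistent with auto-execution being disabled by default.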

Phase 5: Multi-Provider Support

Add interchangeable hosted LLM providers.

Providers:

  • Claude
  • OpenAI
  • Gemini

Common interface:

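// LLMProvider is the common contract for hosted and local model backends.
type LLMProvider interface {
    // Generate sends a rendered prompt and returns the raw model output.
    Generate(ctx context.Context, prompt string) (string, error)
}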

Acceptance criteria:

  • agents do not depend on a specific provider
  • provider timeout is configurable
  • malformed JSON output is rejected
  • provider errors are stored in agent_runs
  • provider prompts never include secrets
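A configurable timeout can be layered on top of the interface without changing any provider. The sketch below is one possible approach: TimeoutProvider, SleepProvider, and GenerateOnce are hypothetical names, and SleepProvider exists only to simulate a slow backend.

```go
package main

import (
	"context"
	"time"
)

// LLMProvider repeats the common interface from the roadmap.
type LLMProvider interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// TimeoutProvider wraps any provider and bounds each call by Timeout.
type TimeoutProvider struct {
	Inner   LLMProvider
	Timeout time.Duration
}

// Generate derives a deadline-bound context and delegates to the inner provider.
func (t TimeoutProvider) Generate(ctx context.Context, prompt string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, t.Timeout)
	defer cancel()
	return t.Inner.Generate(ctx, prompt)
}

// SleepProvider is a fake backend that replies after Delay unless the
// context is cancelled first. It stands in for a real hosted provider.
type SleepProvider struct {
	Delay time.Duration
	Reply string
}

// Generate waits for the delay or the context, whichever finishes first.
func (s SleepProvider) Generate(ctx context.Context, prompt string) (string, error) {
	select {
	case <-time.After(s.Delay):
		return s.Reply, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// GenerateOnce is a convenience for callers with no request context,
// such as a background job.
func GenerateOnce(p LLMProvider, prompt string) (string, error) {
	return p.Generate(context.Background(), prompt)
}
```

Because agents only see LLMProvider, the same wrapper applies to Claude, OpenAI, or Gemini clients, and the returned error can be stored in agent_runs like any other provider failure.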

Phase 6: Local Model Support

Add local model support through Ollama.

Acceptance criteria:

  • local provider uses the same LLMProvider interface
  • local model output uses the same AgentResponse schema
  • local model failures are handled like hosted provider failures
  • deployments can disable hosted providers entirely
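Disabling hosted providers could be handled at provider selection time, so that a misconfigured deployment fails fast instead of sending data off-host. In this sketch the ProviderConfig fields, the provider name strings, the Factory type, and Select are assumptions; stubProvider stands in for a real Ollama client.

```go
package main

import (
	"context"
	"fmt"
)

// LLMProvider repeats the common interface from the roadmap.
type LLMProvider interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// ProviderConfig is a hypothetical deployment setting.
type ProviderConfig struct {
	Name          string // for example "claude", "openai", "gemini", or "ollama"
	HostedEnabled bool   // false means only local providers may be used
}

// hosted lists the provider names treated as hosted in this sketch.
var hosted = map[string]bool{"claude": true, "openai": true, "gemini": true}

// Factory builds a provider on demand.
type Factory func() LLMProvider

// stubProvider stands in for a real backend such as an Ollama client.
type stubProvider struct{ reply string }

// Generate returns the canned reply; a real provider would call the model.
func (s stubProvider) Generate(ctx context.Context, prompt string) (string, error) {
	return s.reply, nil
}

// Select returns the configured provider, refusing hosted providers
// when the deployment has disabled them.
func Select(cfg ProviderConfig, registry map[string]Factory) (LLMProvider, error) {
	if hosted[cfg.Name] && !cfg.HostedEnabled {
		return nil, fmt.Errorf("hosted provider %q is disabled in this deployment", cfg.Name)
	}
	factory, ok := registry[cfg.Name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", cfg.Name)
	}
	return factory(), nil
}
```

Since the local provider satisfies the same interface, its output would go through the same AgentResponse validation and failure handling as hosted output.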

Phase 7: Multi-Agent Collaboration

Add controlled collaboration between agents after individual agents are stable.

Examples:

  • Risk Agent uses Attack Path Agent output as additional context
  • Executive Summary Agent summarizes Risk and Attack Path results
  • Operational agents use advisory summaries to prioritize suggestions

Acceptance criteria:

  • collaboration uses stored structured outputs, not hidden model state
  • each agent run remains separately auditable
  • circular agent dependencies are prevented
  • cross-organization context is never mixed
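Circular dependencies can be caught when the collaboration graph is configured, before any agent runs. The sketch below uses a depth-first search with three visit states; the FindCycle name and the map-based graph representation are assumptions for illustration.

```go
package main

import "sort"

// Visit states for the depth-first search.
const (
	unvisited = iota
	inProgress
	done
)

// FindCycle returns one dependency cycle as a path that starts and ends
// on the same agent, or nil if the graph is acyclic. deps maps an agent
// to the agents whose stored output it consumes.
func FindCycle(deps map[string][]string) []string {
	state := map[string]int{}
	var stack []string

	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		stack = append(stack, n)
		for _, d := range deps[n] {
			switch state[d] {
			case inProgress:
				// d is already on the stack, so the edge n -> d closes a cycle.
				for i, s := range stack {
					if s == d {
						return append(append([]string{}, stack[i:]...), d)
					}
				}
			case unvisited:
				if c := visit(d); c != nil {
					return c
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = done
		return nil
	}

	// Visit agents in sorted order so the reported cycle is deterministic.
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if state[n] == unvisited {
			if c := visit(n); c != nil {
				return c
			}
		}
	}
	return nil
}
```

With the examples above (Risk consuming Attack Path output, Executive Summary consuming both), the graph is acyclic; adding an edge from Attack Path back to Executive Summary would be reported as a cycle.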

Non-Goals for Initial Release

Do not include these in the first backend implementation:

  • autonomous exploitation
  • operational auto-execution
  • automatic scope expansion
  • automatic scan launch from LLM output
  • cross-organization correlation
  • treating provider keys stored in frontend state as the production source of truth
  • unvalidated free-form model output

Verification Checklist

Before enabling agents in production, verify:

  • disabled agents do not run
  • prompts contain only authorized organization data
  • advisory agents cannot create scans or mutate scope
  • operational agents remain suggestion-only
  • operational actions require approval
  • client users cannot trigger mutation actions
  • hacker users cannot access unassigned organization agent runs
  • admin users can view all agent runs
  • failed provider calls do not fail scans
  • invalid JSON output is rejected
  • agent_runs stores context, output, status, provider, model, and errors