Overview

The exploitability validation pipeline weeds out false positives by systematically verifying that each security finding is real, reachable, and exploitable. It runs as a 6-stage process (Stage 0, then Stages A through E).

Why Validation?

Static analysis tools produce many findings, but not all are exploitable:
  • Hallucinated findings: File doesn’t exist, code doesn’t match scanner output
  • Unreachable code: Dead code, test-only functions
  • Protected paths: Effective sanitization, impossible preconditions
  • Binary constraints: Mitigations that block exploitation
Validation prevents wasted effort on false positives.

The 6-Stage Pipeline

┌─────────────────────────────────────────────────────┐
│  Stage 0: Inventory                                 │
│  Build ground truth checklist of all functions      │
└─────────────┬───────────────────────────────────────┘
              │ checklist.json
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage A: One-Shot Analysis                         │
│  Quick exploitability check + PoC attempt           │
└─────────────┬───────────────────────────────────────┘
              │ findings.json (status: pending/not_disproven)
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage B: Process                                   │
│  Systematic analysis with attack trees              │
└─────────────┬───────────────────────────────────────┘
              │ 5 working documents
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage C: Sanity Check                              │
│  Validate against actual source code                │
└─────────────┬───────────────────────────────────────┘
              │ sanity_check added to findings
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage D: Ruling                                    │
│  Filter based on practical exploitation criteria    │
└─────────────┬───────────────────────────────────────┘
              │ ruling & confirmed findings
              │
         ┌────┴────┐
         │         │
         ▼         ▼
    Memory      Web/Injection
    Corruption     (Done)
         │
         ▼
┌─────────────────────────────────────────────────────┐
│  Stage E: Feasibility                               │
│  Binary constraint analysis for memory corruption   │
└─────────────┬───────────────────────────────────────┘
              │ final_status & feasibility
              ▼
          validation-report.md

Stage 0: Inventory

Purpose: Build a complete checklist of all code to be analyzed.

Output

checklist.json - Complete function inventory:
{
  "generated_at": "2026-03-04T12:00:00Z",
  "target_path": "/path/to/code",
  "total_files": 42,
  "total_functions": 256,
  "files": [
    {
      "path": "src/parser.c",
      "functions": [
        {
          "name": "parse_header",
          "line_start": 120,
          "line_end": 145,
          "checked": false
        }
      ]
    }
  ]
}

Execution

from packages.exploitability_validation.checklist_builder import build_checklist

checklist = build_checklist(
    target_path="/path/to/code",
    workdir=".out/validation/",
    exclude_patterns=["*_test.*", "test_*"]
)
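
Once checklist.json exists, coverage can be tracked by listing the entries whose checked flag is still false. The following is a minimal sketch that assumes only the checklist layout shown above. The helper name unchecked_functions and the second sample function are illustrative and are not part of the package.

```python
import json

# Sample data mirroring the checklist.json layout shown above.
# The read_header entry is made up for illustration.
CHECKLIST_JSON = """
{
  "total_functions": 2,
  "files": [
    {
      "path": "src/parser.c",
      "functions": [
        {"name": "parse_header", "line_start": 120, "line_end": 145, "checked": false},
        {"name": "read_header", "line_start": 90, "line_end": 118, "checked": true}
      ]
    }
  ]
}
"""


def unchecked_functions(checklist: dict) -> list:
    """Return 'path:name' for every function not yet marked as checked."""
    return [
        f"{entry['path']}:{fn['name']}"
        for entry in checklist.get("files", [])
        for fn in entry.get("functions", [])
        if not fn.get("checked", False)
    ]


checklist = json.loads(CHECKLIST_JSON)
remaining = unchecked_functions(checklist)
print(f"{len(remaining)} of {checklist['total_functions']} functions still unchecked")
```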

Stage A: One-Shot Analysis

Purpose: Quick exploitability assessment with PoC attempts.

Gates Applied

  • GATE-1 [ASSUME-EXPLOIT]: Assume findings are exploitable until proven otherwise
  • GATE-4 [NO-HEDGING]: No “maybe” or “could be” - verify all claims
  • GATE-6 [PROOF]: Provide concrete proof and vulnerable code

Output

findings.json - Initial exploitability assessment:
{
  "stage": "A",
  "timestamp": "2026-03-04T12:30:00Z",
  "findings": [
    {
      "id": "FINDING-0001",
      "file": "src/parser.c",
      "line": 134,
      "function": "parse_header",
      "vuln_type": "buffer_overflow",
      "status": "not_disproven",
      "message": "Unbounded strcpy into fixed buffer",
      "proof": "strcpy(buf, header);",
      "poc_attempted": true,
      "poc_result": "crash with SIGSEGV"
    }
  ]
}

Status Values

  • poc_success - PoC successfully demonstrated vulnerability
  • not_disproven - Cannot rule out, needs deeper analysis (Stage B)
  • disproven - Proven safe, no further analysis needed
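
The status values above determine which findings move on to Stage B. As a rough sketch, findings can be grouped by status and the not_disproven group queued for deeper analysis. The function name group_by_status and the sample findings are hypothetical, not part of the package.

```python
from collections import defaultdict

# The three Stage A status values listed above.
STAGE_A_STATUSES = {"poc_success", "not_disproven", "disproven"}


def group_by_status(findings: list) -> dict:
    """Group finding IDs by their Stage A status, rejecting unknown values."""
    groups = defaultdict(list)
    for finding in findings:
        status = finding["status"]
        if status not in STAGE_A_STATUSES:
            raise ValueError(f"unexpected Stage A status: {status!r}")
        groups[status].append(finding["id"])
    return dict(groups)


# Illustrative findings; only the id and status fields are used here.
sample_findings = [
    {"id": "FINDING-0001", "status": "not_disproven"},
    {"id": "FINDING-0002", "status": "disproven"},
    {"id": "FINDING-0003", "status": "poc_success"},
]

groups = group_by_status(sample_findings)
stage_b_queue = groups.get("not_disproven", [])
print("Needs Stage B:", stage_b_queue)
```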

Stage B: Process

Purpose: Systematic analysis for “not_disproven” findings using attack trees and knowledge graphs.

Gates Applied

ALL gates (1-6):
  • GATE-1: Assume exploitable
  • GATE-2: Strictly follow instructions
  • GATE-3: Update checklist, collect evidence
  • GATE-4: No hedging
  • GATE-5: Full code coverage
  • GATE-6: Provide proof

Working Documents

Stage B creates 5 specialized documents:

1. attack-tree.json

Knowledge graph of attack paths:
{
  "root": "Exploit buffer overflow in parse_header",
  "updated_at": "2026-03-04T13:00:00Z",
  "nodes": [
    {
      "id": "node-001",
      "type": "goal",
      "description": "Control instruction pointer",
      "children": ["node-002", "node-003"],
      "status": "testing"
    },
    {
      "id": "node-002",
      "type": "method",
      "description": "Overwrite return address on stack",
      "prerequisites": ["Stack overflow possible", "No stack canary"],
      "status": "confirmed"
    }
  ]
}
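
To inspect an attack tree like the one above, one option is to index nodes by id and walk the children references from a chosen goal. This is only a sketch under the assumption that children holds node ids. The walk helper is hypothetical, and it skips ids that are referenced but not recorded (such as node-003 in the sample).

```python
# Trimmed copy of the attack-tree.json sample shown above.
tree = {
    "root": "Exploit buffer overflow in parse_header",
    "nodes": [
        {
            "id": "node-001",
            "type": "goal",
            "children": ["node-002", "node-003"],
            "status": "testing",
        },
        {"id": "node-002", "type": "method", "status": "confirmed"},
    ],
}


def walk(tree: dict, start_id: str) -> list:
    """Depth-first walk returning (node id, status) pairs in visit order."""
    by_id = {node["id"]: node for node in tree["nodes"]}
    stack, seen, visited = [start_id], set(), []
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        node = by_id.get(node_id)
        if node is None:
            continue  # referenced as a child but not present in "nodes"
        visited.append((node_id, node["status"]))
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(node.get("children", [])))
    return visited


visited = walk(tree, "node-001")
for node_id, status in visited:
    print(f"{node_id}: {status}")
```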

2. hypotheses.json

Testable predictions:
[
  {
    "id": "hyp-001",
    "hypothesis": "Input length controls overflow distance",
    "status": "confirmed",
    "evidence": [
      "Input of 100 bytes overwrites RBP",
      "Input of 104 bytes overwrites return address"
    ],
    "tested_at": "2026-03-04T13:15:00Z"
  },
  {
    "id": "hyp-002",
    "hypothesis": "Stack canary blocks exploitation",
    "status": "disproven",
    "evidence": ["Binary compiled without -fstack-protector"],
    "tested_at": "2026-03-04T13:20:00Z"
  }
]

3. disproven.json

Failed approaches:
[
  {
    "approach": "ROP chain via libc gadgets",
    "why_failed": "ASLR randomizes libc base, no info leak available",
    "attempted_at": "2026-03-04T13:30:00Z",
    "learnings": "Need info leak primitive before ROP"
  }
]

4. attack-paths.json

Attempted exploitation paths with PROXIMITY scoring:
[
  {
    "path_id": "path-001",
    "description": "Direct return address overwrite",
    "steps": [
      "1. Send 104-byte input",
      "2. Overwrite return address with shellcode location",
      "3. Return from function to shellcode"
    ],
    "proximity": 8,
    "blockers": ["DEP prevents shellcode execution"],
    "status": "blocked"
  },
  {
    "path_id": "path-002",
    "description": "ROP chain to mprotect()",
    "steps": [
      "1. Leak stack address",
      "2. Build ROP chain calling mprotect()",
      "3. Make stack executable",
      "4. Jump to shellcode on stack"
    ],
    "proximity": 5,
    "blockers": ["No info leak primitive found"],
    "status": "investigating"
  }
]
PROXIMITY Scale:
  • 10 - Working exploit
  • 8-9 - Very close, minor obstacles
  • 6-7 - Feasible path, some blockers
  • 4-5 - Significant obstacles
  • 1-3 - Far from exploitation
  • 0 - Not viable
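
The scale above maps directly onto a small lookup. The sketch below turns a PROXIMITY score into its band description. The function name proximity_band is an illustrative choice, not an API from the package.

```python
def proximity_band(score: int) -> str:
    """Map a PROXIMITY score (0-10) to the band description from the scale above."""
    if not 0 <= score <= 10:
        raise ValueError(f"PROXIMITY must be between 0 and 10, got {score}")
    if score == 10:
        return "Working exploit"
    if score >= 8:
        return "Very close, minor obstacles"
    if score >= 6:
        return "Feasible path, some blockers"
    if score >= 4:
        return "Significant obstacles"
    if score >= 1:
        return "Far from exploitation"
    return "Not viable"


# The two sample paths above score 8 and 5.
for path_id, score in [("path-001", 8), ("path-002", 5)]:
    print(f"{path_id}: {score} ({proximity_band(score)})")
```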

5. attack-surface.json

Sources, sinks, and trust boundaries:
{
  "sources": [
    {
      "type": "user_input",
      "location": "src/parser.c:100",
      "function": "read_header",
      "description": "HTTP header from socket",
      "controllable": true
    }
  ],
  "sinks": [
    {
      "type": "memory_operation",
      "location": "src/parser.c:134",
      "function": "parse_header",
      "operation": "strcpy",
      "dangerous": true
    }
  ],
  "trust_boundaries": [
    {
      "location": "src/parser.c:105",
      "type": "validation",
      "description": "Header length check",
      "effective": false,
      "reason": "Check uses signed comparison, negative values bypass"
    }
  ]
}

Stage C: Sanity Check

Purpose: Verify findings against actual source code.

Gates Applied

  • GATE-3 [CHECKLIST]: Update checklist with verification
  • GATE-5 [FULL-COVERAGE]: Check all code, no sampling
  • GATE-6 [PROOF]: Show actual code verbatim

Verification Checks

  1. File exists at stated path
  2. Code matches VERBATIM at stated line (not paraphrased)
  3. Source→sink flow is real (not hypothetical)
  4. Code is reachable (function is actually called)
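
Checks 1 and 2 are mechanical and can be sketched with the standard library alone. The example below is an assumption about how such a check might look, not the package's implementation. The helper check_verbatim is hypothetical, and it compares lines with surrounding whitespace ignored, which is a simplification. Checks 3 and 4 (data flow and reachability) need real code analysis and are not covered.

```python
import tempfile
from pathlib import Path


def check_verbatim(root: Path, rel_path: str, line: int, expected: str) -> dict:
    """Check that a file exists and that the stated line matches the claimed code."""
    result = {"file_exists": False, "code_matches": False, "code_verbatim": None}
    source = root / rel_path
    if not source.is_file():
        return result
    result["file_exists"] = True
    lines = source.read_text(encoding="utf-8", errors="replace").splitlines()
    if 1 <= line <= len(lines):
        actual = lines[line - 1]  # line numbers are 1-based
        result["code_verbatim"] = actual
        # Assumption: leading/trailing whitespace is not significant.
        result["code_matches"] = actual.strip() == expected.strip()
    return result


# Demo against a throwaway source tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "parser.c").write_text(
        "void f(void) {\n    strcpy(buf, header);\n}\n", encoding="utf-8"
    )
    ok = check_verbatim(root, "src/parser.c", 2, "strcpy(buf, header);")
    stale = check_verbatim(root, "src/missing.c", 2, "strcpy(buf, header);")

print(ok)
print(stale)
```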

Output

findings.json with sanity_check field added:
{
  "id": "FINDING-0001",
  "file": "src/parser.c",
  "line": 134,
  "sanity_check": {
    "passed": true,
    "file_exists": true,
    "code_matches": true,
    "code_verbatim": "    strcpy(buf, header);",
    "flow_real": true,
    "reachable": true,
    "verified_at": "2026-03-04T14:00:00Z"
  }
}

Stage D: Ruling

Purpose: Make final exploitability determination based on all evidence.

Gates Applied

  • GATE-3 [CHECKLIST]: Document ruling decisions
  • GATE-5 [FULL-COVERAGE]: Rule on all findings
  • GATE-6 [PROOF]: Justify ruling with evidence

Ruling Criteria

Findings are ruled_out if:
  • Failed sanity check
  • Requires impossible preconditions
  • Protected by effective mitigations
  • Attack paths have PROXIMITY ≤ 2
Findings are confirmed if:
  • Passed sanity check
  • Realistic exploitation path exists
  • No effective protections
  • Attack paths have PROXIMITY ≥ 6
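
One way to read the criteria above is sketched below. It rests on assumptions the text does not state: any single ruled-out condition is treated as sufficient, all confirmed conditions are treated as required, and findings in the gap (for example PROXIMITY 3 to 5) get a hypothetical "Undecided" label that is not one of the pipeline's documented statuses. The Evidence class and rule function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Inputs to the ruling; field names are illustrative."""
    sanity_passed: bool
    impossible_preconditions: bool
    effective_mitigations: bool
    realistic_path: bool
    best_proximity: int  # highest PROXIMITY among the finding's attack paths


def rule(e: Evidence) -> str:
    # Assumption: any one of these conditions rules the finding out.
    if (
        not e.sanity_passed
        or e.impossible_preconditions
        or e.effective_mitigations
        or e.best_proximity <= 2
    ):
        return "Ruled Out"
    # Assumption: all confirmed conditions must hold together.
    # Sanity check and mitigations were already verified by the branch above.
    if e.realistic_path and e.best_proximity >= 6:
        return "Confirmed"
    # Hypothetical label for cases the criteria above leave open.
    return "Undecided"


print(rule(Evidence(True, False, False, True, 8)))
print(rule(Evidence(False, False, False, True, 8)))
```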

Output

findings.json with ruling field:
{
  "id": "FINDING-0001",
  "ruling": {
    "status": "Confirmed",
    "reason": "Passed sanity check, direct exploitation path with proximity 8",
    "attack_path": "path-001",
    "prerequisites": [],
    "ruled_at": "2026-03-04T14:30:00Z"
  }
}

Status Values

  • Confirmed - Exploitable, proceed to Stage E
  • Ruled Out - Not exploitable, stop here

Stage E: Feasibility

Purpose: Binary constraint analysis for memory corruption vulnerabilities.
Scope: Stage E only applies to memory corruption types (buffer overflow, format string, UAF, etc.). Web/injection vulnerabilities stop at Stage D.

Memory Corruption Types

Stage E applies to:
  • buffer_overflow
  • heap_overflow
  • stack_overflow
  • format_string
  • use_after_free
  • double_free
  • integer_overflow
  • out_of_bounds_read
  • out_of_bounds_write

Binary Analysis

Integrates with packages/exploit_feasibility for:
  1. Protection detection: ASLR, DEP, RELRO, stack canaries
  2. Constraint analysis: Bad bytes, null terminators
  3. Gadget availability: ROP gadgets, syscall availability
  4. Verdict: Likely / Difficult / Unlikely

Execution

from packages.exploit_feasibility import analyze_binary

result = analyze_binary(
    binary_path="/path/to/binary",
    vuln_type="buffer_overflow"
)

print(f"Verdict: {result['verdict']}")
print(f"Blockers: {result['blockers']}")
print(f"Suggestions: {result['suggestions']}")

Output

findings.json with feasibility and final_status:
{
  "id": "FINDING-0001",
  "feasibility": {
    "status": "analyzed",
    "binary_path": "/path/to/binary",
    "verdict": "Difficult",
    "chain_breaks": [
      "ASLR randomizes code base",
      "DEP prevents shellcode execution"
    ],
    "what_would_help": [
      "Info leak to defeat ASLR",
      "ROP chain for code reuse"
    ]
  },
  "final_status": "Confirmed (constrained)"
}

Final Status Mapping

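| Ruling Status | Feasibility Verdict | Final Status            |
|---------------|---------------------|-------------------------|
| Confirmed     | Likely              | Exploitable             |
| Confirmed     | Difficult           | Confirmed (constrained) |
| Confirmed     | Unlikely            | Confirmed (blocked)     |
| Confirmed     | N/A (web vuln)      | Confirmed               |
| Ruled Out     | -                   | Ruled Out               |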
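
The mapping above can be expressed as a small function. This is a sketch, not the package's code: final_status is a hypothetical name, and a verdict of None stands in for the N/A case where no feasibility analysis was run.

```python
from typing import Optional

# Feasibility verdict -> final status for findings that were Confirmed in Stage D.
_BY_VERDICT = {
    "Likely": "Exploitable",
    "Difficult": "Confirmed (constrained)",
    "Unlikely": "Confirmed (blocked)",
}


def final_status(ruling: str, verdict: Optional[str] = None) -> str:
    """Combine the Stage D ruling with the Stage E verdict (None means N/A)."""
    if ruling == "Ruled Out":
        return "Ruled Out"
    if ruling != "Confirmed":
        raise ValueError(f"unknown ruling status: {ruling!r}")
    if verdict is None:
        return "Confirmed"  # no feasibility analysis, e.g. a web vulnerability
    if verdict not in _BY_VERDICT:
        raise ValueError(f"unknown feasibility verdict: {verdict!r}")
    return _BY_VERDICT[verdict]


print(final_status("Confirmed", "Difficult"))
print(final_status("Confirmed"))
```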

CLI Usage

Full Pipeline

Run complete validation from scratch:
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --vuln-type buffer_overflow

With Pre-existing Findings

Validate findings from scanner output (skips Stages 0 and A):
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings scan_results.sarif

With Binary for Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings findings.json \
  --binary /path/to/compiled/binary

Skip Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --skip-feasibility

Custom Working Directory

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --workdir /custom/output/path

Python API

Orchestrator

from packages.exploitability_validation import ValidationOrchestrator, PipelineConfig

config = PipelineConfig(
    target_path="/path/to/code",
    workdir=".out/validation-20260304/",
    vuln_type="command_injection",
    binary_path=None,
    findings_file=None,
    skip_feasibility=False
)

orchestrator = ValidationOrchestrator(config)
result = orchestrator.run()

print(f"Completed at: {result.state.completed_at}")
for stage, stage_result in result.state.stage_results.items():
    print(f"{stage.name}: {stage_result.status}")

Convenience Function

from packages.exploitability_validation import run_validation

result = run_validation(
    target_path="/path/to/code",
    vuln_type="sql_injection",
    findings_file="scanner_output.sarif"
)

SARIF Input Support

The validation pipeline automatically converts SARIF format:
# Supported: SARIF 2.0 and 2.1.0
# From tools: Semgrep, CodeQL, others

config = PipelineConfig(
    target_path="/path/to/code",
    findings_file="semgrep_results.sarif"  # Auto-detected format
)

SARIF Conversion

  • Rule ID normalization: engine.semgrep.rules.crypto.weak-hash → weak_hash
  • CWE mapping: CWE-89 → sql_injection
  • Deduplication: By file:line:vuln_type
  • Logical locations: Extracts function names
  • Severity mapping: SARIF levels → internal severity
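
Two of the conversion steps above are easy to illustrate. The sketch below is an assumption, not the package's actual logic. normalize_rule_id merely reproduces the single example given (last dotted segment, hyphens to underscores), and dedupe keeps the first finding per file:line:vuln_type key. Both function names are hypothetical.

```python
def normalize_rule_id(rule_id: str) -> str:
    """Assumed rule: keep the last dotted segment and turn hyphens into underscores."""
    return rule_id.rsplit(".", 1)[-1].replace("-", "_").lower()


def dedupe(findings: list) -> list:
    """Keep the first finding for each (file, line, vuln_type) key."""
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["file"], finding["line"], finding["vuln_type"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


# Illustrative findings: the first two share the same file:line:vuln_type key.
raw = [
    {"file": "src/db.py", "line": 10, "vuln_type": "sql_injection", "tool": "semgrep"},
    {"file": "src/db.py", "line": 10, "vuln_type": "sql_injection", "tool": "codeql"},
    {"file": "src/db.py", "line": 42, "vuln_type": "sql_injection", "tool": "semgrep"},
]

unique = dedupe(raw)
print(normalize_rule_id("engine.semgrep.rules.crypto.weak-hash"))
print(f"{len(raw)} raw findings -> {len(unique)} unique")
```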

Validation Report

Final output: validation-report.md
# Exploitability Validation Report

## Summary
- Target: /path/to/code
- Vulnerability Type: buffer_overflow
- Started: 2026-03-04 12:00:00
- Completed: 2026-03-04 14:45:00

## Stage Results
- Stage 0 (Inventory): [OK] (12.3s)
- Stage A (One-Shot): [OK] (45.7s)
- Stage B (Process): [OK] (123.4s)
- Stage C (Sanity): [OK] (23.1s)
- Stage D (Ruling): [OK] (8.9s)
- Stage E (Feasibility): [OK] (15.2s)

## Findings Summary
- Total: 15
- Exploitable: 2
- Confirmed (constrained): 3
- Confirmed (blocked): 1
- Ruled Out: 9

## Confirmed Findings

### FINDING-0001: buffer_overflow in src/parser.c:134
- Function: parse_header
- Final Status: Exploitable
- Feasibility: Likely
- Chain Breaks: None

### FINDING-0003: format_string in src/logger.c:89
- Function: log_message
- Final Status: Confirmed (constrained)
- Feasibility: Difficult
- Chain Breaks: RELRO blocks GOT overwrite, PIE randomizes addresses
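
The Findings Summary counts in a report like the one above can be derived by tallying each finding's final_status. A minimal sketch follows. The summarize helper and the sample findings are hypothetical, and statuses outside the listed order would count toward the total without getting their own line.

```python
from collections import Counter

# Display order, following the final statuses used in this document.
REPORT_ORDER = [
    "Exploitable",
    "Confirmed (constrained)",
    "Confirmed (blocked)",
    "Confirmed",
    "Ruled Out",
]


def summarize(findings: list) -> list:
    """Build summary lines: a total, then one line per status that occurs."""
    counts = Counter(f["final_status"] for f in findings)
    lines = [f"- Total: {sum(counts.values())}"]
    lines += [f"- {status}: {counts[status]}" for status in REPORT_ORDER if counts[status]]
    return lines


# Illustrative findings; only final_status is used.
sample = [
    {"id": "FINDING-0001", "final_status": "Exploitable"},
    {"id": "FINDING-0002", "final_status": "Ruled Out"},
    {"id": "FINDING-0004", "final_status": "Ruled Out"},
]

summary_lines = summarize(sample)
print("\n".join(summary_lines))
```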

Output Style Guide

Per RAPTOR’s style conventions:

Human-Readable Status

  • Exploitable (not EXPLOITABLE)
  • Confirmed (not CONFIRMED)
  • Ruled Out (not RULED_OUT)
  • Proven / Disproven (not PROVEN / DISPROVEN)

No Colored Indicators

  • ❌ Don’t use: 🔴/🟢 (perspective-dependent)
  • ✅ Use: Plain text or ### Exploitable (7 findings)
  • ✅ Other emojis OK: ⚠️, ✓, etc.

Best Practices

Start with SARIF input: Feed scanner output directly to validation to avoid manual finding transcription. The pipeline auto-converts and deduplicates.
Stage B is intensive: For large codebases with many “not_disproven” findings, Stage B can take hours. Consider filtering to high-severity findings first.
Stage E requires a binary: If no compiled binary is available, Stage E is skipped and memory corruption findings are marked Confirmed without feasibility analysis.

Troubleshooting

Stage A produces all “not_disproven”

This is normal for complex vulnerabilities. Stage B will analyze them systematically.

Stage C sanity checks fail

Common causes:
  • Scanner output has stale file paths
  • Code changed since scanning
  • Scanner hallucinated the finding
Fix: Re-run the scanner on the current codebase.

Stage E skipped unexpectedly

Check:
  • Binary path is correct: --binary /path/to/binary
  • Binary is executable: chmod +x /path/to/binary
  • Vulnerability type is memory corruption

Integration Examples

From Semgrep

# 1. Run Semgrep
python3 packages/static-analysis/scanner.py \
  --repo /path/to/code \
  --policy_groups all

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/scan_*/combined.sarif

From CodeQL

# 1. Run CodeQL
python3 raptor_codeql.py \
  --repo /path/to/code \
  --scan-only

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/codeql_*/java_results.sarif \
  --binary /path/to/binary.jar

From Autonomous Mode

Validation runs automatically in /agentic:
/agentic /path/to/code
# Automatically runs:
# 1. Static analysis (Semgrep/CodeQL)
# 2. Exploitability validation (this pipeline)
# 3. LLM analysis
# 4. Exploit generation
