Exploitability Validation

Overview

The exploitability validation pipeline filters false positives out of security findings. It uses a 6-stage process to verify that each finding is real, reachable, and exploitable.

Why Validation?

Static analysis tools produce many findings, but not all are exploitable:

Hallucinated findings: File doesn’t exist, code doesn’t match scanner output
Unreachable code: Dead code, test-only functions
Protected paths: Effective sanitization, impossible preconditions
Binary constraints: Mitigations that block exploitation

Validation prevents wasted effort on false positives.

The 6-Stage Pipeline

┌─────────────────────────────────────────────────────┐
│  Stage 0: Inventory                                 │
│  Build ground truth checklist of all functions     │
└─────────────┬───────────────────────────────────────┘
              │ checklist.json
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage A: One-Shot Analysis                         │
│  Quick exploitability check + PoC attempt           │
└─────────────┬───────────────────────────────────────┘
              │ findings.json (status: poc_success/not_disproven/disproven)
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage B: Process                                   │
│  Systematic analysis with attack trees              │
└─────────────┬───────────────────────────────────────┘
              │ 5 working documents
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage C: Sanity Check                              │
│  Validate against actual source code                │
└─────────────┬───────────────────────────────────────┘
              │ sanity_check added to findings
              ▼
┌─────────────────────────────────────────────────────┐
│  Stage D: Ruling                                    │
│  Filter based on practical exploitation criteria   │
└─────────────┬───────────────────────────────────────┘
              │ ruling & confirmed findings
              ▼
         ┌────┴────┐
         │         │
         ▼         ▼
    Memory      Web/Injection
    Corruption     (Done)
         │
         ▼
┌─────────────────────────────────────────────────────┐
│  Stage E: Feasibility                               │
│  Binary constraint analysis for memory corruption   │
└─────────────┬───────────────────────────────────────┘
              │ final_status & feasibility
              ▼
          validation-report.md

Stage 0: Inventory

Purpose: Build a complete checklist of all code to be analyzed.

Output

checklist.json - Complete function inventory:

{
  "generated_at": "2026-03-04T12:00:00Z",
  "target_path": "/path/to/code",
  "total_files": 42,
  "total_functions": 256,
  "files": [
    {
      "path": "src/parser.c",
      "functions": [
        {
          "name": "parse_header",
          "line_start": 120,
          "line_end": 145,
          "checked": false
        }
      ]
    }
  ]
}

Execution

from packages.exploitability_validation.checklist_builder import build_checklist

checklist = build_checklist(
    target_path="/path/to/code",
    workdir=".out/validation/",
    exclude_patterns=["*_test.*", "test_*"]
)
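
The checked flag in checklist.json can be used to track coverage as later stages run. The following is a minimal sketch, assuming the checklist has the shape shown under Output. The helper name unchecked_functions is hypothetical and not part of the package.

```python
def unchecked_functions(checklist: dict) -> list[str]:
    """Return 'path:name' for every function whose 'checked' flag is false."""
    remaining = []
    for file_entry in checklist.get("files", []):
        for func in file_entry.get("functions", []):
            if not func.get("checked", False):
                remaining.append(f"{file_entry['path']}:{func['name']}")
    return remaining


# Sample data mirroring the checklist.json shape above; read_header is illustrative.
sample = {
    "files": [
        {
            "path": "src/parser.c",
            "functions": [
                {"name": "parse_header", "checked": False},
                {"name": "read_header", "checked": True},
            ],
        }
    ]
}
print(unchecked_functions(sample))
```

An empty result would indicate that every inventoried function has been marked as checked.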

Stage A: One-Shot Analysis

Purpose: Quick exploitability assessment with PoC attempts.

Gates Applied

GATE-1 [ASSUME-EXPLOIT]: Assume findings are exploitable until proven otherwise
GATE-4 [NO-HEDGING]: No “maybe” or “could be” - verify all claims
GATE-6 [PROOF]: Provide concrete proof and vulnerable code

Output

findings.json - Initial exploitability assessment:

{
  "stage": "A",
  "timestamp": "2026-03-04T12:30:00Z",
  "findings": [
    {
      "id": "FINDING-0001",
      "file": "src/parser.c",
      "line": 134,
      "function": "parse_header",
      "vuln_type": "buffer_overflow",
      "status": "not_disproven",
      "message": "Unbounded strcpy into fixed buffer",
      "proof": "strcpy(buf, header);",
      "poc_attempted": true,
      "poc_result": "crash with SIGSEGV"
    }
  ]
}

Status Values

poc_success - PoC successfully demonstrated vulnerability
not_disproven - Cannot rule out, needs deeper analysis (Stage B)
disproven - Proven safe, no further analysis needed
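
The status values above determine which findings move on to Stage B. The following sketch groups finding IDs by status and builds the Stage B queue. The helper group_by_status is hypothetical, and rejecting unknown statuses is an assumption rather than documented pipeline behavior.

```python
from collections import defaultdict

# Status values taken from the Stage A list above.
STAGE_A_STATUSES = {"poc_success", "not_disproven", "disproven"}


def group_by_status(findings: list[dict]) -> dict[str, list[str]]:
    """Group finding IDs by Stage A status, rejecting unknown statuses."""
    groups = defaultdict(list)
    for finding in findings:
        status = finding["status"]
        if status not in STAGE_A_STATUSES:
            raise ValueError(f"unknown status {status!r} in {finding['id']}")
        groups[status].append(finding["id"])
    return dict(groups)


findings = [
    {"id": "FINDING-0001", "status": "not_disproven"},
    {"id": "FINDING-0002", "status": "disproven"},
]
# Only not_disproven findings need the deeper Stage B analysis.
stage_b_queue = group_by_status(findings).get("not_disproven", [])
print(stage_b_queue)
```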

Stage B: Process

Purpose: Systematic analysis for “not_disproven” findings using attack trees and knowledge graphs.

Gates Applied

ALL gates (1-6):

GATE-1: Assume exploitable
GATE-2: Strictly follow instructions
GATE-3: Update checklist, collect evidence
GATE-4: No hedging
GATE-5: Full code coverage
GATE-6: Provide proof

Working Documents

Stage B creates 5 specialized documents:

1. attack-tree.json

Knowledge graph of attack paths:

{
  "root": "Exploit buffer overflow in parse_header",
  "updated_at": "2026-03-04T13:00:00Z",
  "nodes": [
    {
      "id": "node-001",
      "type": "goal",
      "description": "Control instruction pointer",
      "children": ["node-002", "node-003"],
      "status": "testing"
    },
    {
      "id": "node-002",
      "type": "method",
      "description": "Overwrite return address on stack",
      "prerequisites": ["Stack overflow possible", "No stack canary"],
      "status": "confirmed"
    }
  ]
}

2. hypotheses.json

Testable predictions:

[
  {
    "id": "hyp-001",
    "hypothesis": "Input length controls overflow distance",
    "status": "confirmed",
    "evidence": [
      "Input of 100 bytes overwrites RBP",
      "Input of 104 bytes overwrites return address"
    ],
    "tested_at": "2026-03-04T13:15:00Z"
  },
  {
    "id": "hyp-002",
    "hypothesis": "Stack canary blocks exploitation",
    "status": "disproven",
    "evidence": ["Binary compiled without -fstack-protector"],
    "tested_at": "2026-03-04T13:20:00Z"
  }
]

3. disproven.json

Failed approaches:

[
  {
    "approach": "ROP chain via libc gadgets",
    "why_failed": "ASLR randomizes libc base, no info leak available",
    "attempted_at": "2026-03-04T13:30:00Z",
    "learnings": "Need info leak primitive before ROP"
  }
]

4. attack-paths.json

Attempted exploitation paths with PROXIMITY scoring:

[
  {
    "path_id": "path-001",
    "description": "Direct return address overwrite",
    "steps": [
      "1. Send 104-byte input",
      "2. Overwrite return address with shellcode location",
      "3. Return from function to shellcode"
    ],
    "proximity": 8,
    "blockers": ["DEP prevents shellcode execution"],
    "status": "blocked"
  },
  {
    "path_id": "path-002",
    "description": "ROP chain to mprotect()",
    "steps": [
      "1. Leak stack address",
      "2. Build ROP chain calling mprotect()",
      "3. Make stack executable",
      "4. Jump to shellcode on stack"
    ],
    "proximity": 5,
    "blockers": ["No info leak primitive found"],
    "status": "investigating"
  }
]

PROXIMITY Scale:

10 - Working exploit
8-9 - Very close, minor obstacles
6-7 - Feasible path, some blockers
4-5 - Significant obstacles
1-3 - Far from exploitation
0 - Not viable
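
The scale can be expressed as a simple lookup. This is a sketch only: proximity_label and closest_path are hypothetical helpers, and the sample entries reuse the attack-paths.json fields shown above.

```python
from typing import Optional


def proximity_label(score: int) -> str:
    """Map a PROXIMITY score (0-10) to the band described in the scale."""
    if not 0 <= score <= 10:
        raise ValueError(f"PROXIMITY must be between 0 and 10, got {score}")
    if score == 10:
        return "Working exploit"
    if score >= 8:
        return "Very close, minor obstacles"
    if score >= 6:
        return "Feasible path, some blockers"
    if score >= 4:
        return "Significant obstacles"
    if score >= 1:
        return "Far from exploitation"
    return "Not viable"


def closest_path(paths: list[dict]) -> Optional[dict]:
    """Return the attack path with the highest proximity, or None if empty."""
    return max(paths, key=lambda p: p["proximity"], default=None)


paths = [
    {"path_id": "path-001", "proximity": 8},
    {"path_id": "path-002", "proximity": 5},
]
best = closest_path(paths)
print(best["path_id"], "-", proximity_label(best["proximity"]))
```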

5. attack-surface.json

Sources, sinks, and trust boundaries:

{
  "sources": [
    {
      "type": "user_input",
      "location": "src/parser.c:100",
      "function": "read_header",
      "description": "HTTP header from socket",
      "controllable": true
    }
  ],
  "sinks": [
    {
      "type": "memory_operation",
      "location": "src/parser.c:134",
      "function": "parse_header",
      "operation": "strcpy",
      "dangerous": true
    }
  ],
  "trust_boundaries": [
    {
      "location": "src/parser.c:105",
      "type": "validation",
      "description": "Header length check",
      "effective": false,
      "reason": "Check uses signed comparison, negative values bypass"
    }
  ]
}

Stage C: Sanity Check

Purpose: Verify findings against actual source code.

Gates Applied

GATE-3 [CHECKLIST]: Update checklist with verification
GATE-5 [FULL-COVERAGE]: Check all code, no sampling
GATE-6 [PROOF]: Show actual code verbatim

Verification Checks

File exists at stated path
Code matches VERBATIM at stated line (not paraphrased)
Source→sink flow is real (not hypothetical)
Code is reachable (function is actually called)
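
The first two checks are mechanical and can be sketched in a few lines. The helper below is hypothetical and is not the pipeline's implementation. It assumes that leading and trailing whitespace is ignored when comparing, and it leaves the flow and reachability checks out because those need real program analysis.

```python
import tempfile
from pathlib import Path


def sanity_check(root: str, rel_path: str, line: int, expected_code: str) -> dict:
    """Check that the file exists and the stated line contains the quoted code."""
    source = Path(root) / rel_path
    result = {"file_exists": source.is_file(), "code_matches": False, "code_verbatim": None}
    if not result["file_exists"]:
        return result
    lines = source.read_text(encoding="utf-8", errors="replace").splitlines()
    if 1 <= line <= len(lines):
        actual = lines[line - 1]
        result["code_verbatim"] = actual
        # Surrounding whitespace is ignored (assumption); the code itself must match exactly.
        result["code_matches"] = actual.strip() == expected_code.strip()
    return result


# Demo against a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "parser.c").write_text("int x;\n    strcpy(buf, header);\n", encoding="utf-8")
    ok = sanity_check(root, "parser.c", 2, "strcpy(buf, header);")
    wrong_line = sanity_check(root, "parser.c", 1, "strcpy(buf, header);")
    missing = sanity_check(root, "nope.c", 1, "strcpy(buf, header);")
print(ok)
```

A paraphrased snippet (for example, strncpy instead of strcpy) would leave code_matches false, which is the point of the VERBATIM requirement.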

Output

findings.json with sanity_check field added:

{
  "id": "FINDING-0001",
  "file": "src/parser.c",
  "line": 134,
  "sanity_check": {
    "passed": true,
    "file_exists": true,
    "code_matches": true,
    "code_verbatim": "    strcpy(buf, header);",
    "flow_real": true,
    "reachable": true,
    "verified_at": "2026-03-04T14:00:00Z"
  }
}

Stage D: Ruling

Purpose: Make final exploitability determination based on all evidence.

Gates Applied

GATE-3 [CHECKLIST]: Document ruling decisions
GATE-5 [FULL-COVERAGE]: Rule on all findings
GATE-6 [PROOF]: Justify ruling with evidence

Ruling Criteria

Findings are Ruled Out if:

Failed sanity check
Requires impossible preconditions
Protected by effective mitigations
Attack paths have PROXIMITY ≤ 2

Findings are Confirmed if:

Passed sanity check
Realistic exploitation path exists
No effective protections
Attack paths have PROXIMITY ≥ 6
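
One way to read these criteria as code is sketched below. It rests on several assumptions: any single ruled-out condition is treated as sufficient, all confirmed conditions must hold together, and a PROXIMITY of 6 or more stands in for "realistic exploitation path exists". Scores of 3 to 5 are not covered by either list, so the "Undetermined" label here is hypothetical.

```python
def rule_on(
    sanity_passed: bool,
    best_proximity: int,
    impossible_preconditions: bool = False,
    effective_mitigations: bool = False,
) -> str:
    """Apply the ruling criteria; returns 'Ruled Out', 'Confirmed', or 'Undetermined'."""
    # Any single ruled-out condition is treated as sufficient (assumption).
    if (
        not sanity_passed
        or impossible_preconditions
        or effective_mitigations
        or best_proximity <= 2
    ):
        return "Ruled Out"
    # At this point the sanity check passed and no protection applies.
    if best_proximity >= 6:
        return "Confirmed"
    # Scores 3-5 fall between the two lists; this label is hypothetical.
    return "Undetermined"


print(rule_on(sanity_passed=True, best_proximity=8))
```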

Output

findings.json with ruling field:

{
  "id": "FINDING-0001",
  "ruling": {
    "status": "Confirmed",
    "reason": "Passed sanity check, direct exploitation path with proximity 8",
    "attack_path": "path-001",
    "prerequisites": [],
    "ruled_at": "2026-03-04T14:30:00Z"
  }
}

Status Values

Confirmed - Exploitable; memory corruption findings proceed to Stage E
Ruled Out - Not exploitable, stop here

Stage E: Feasibility

Purpose: Binary constraint analysis for memory corruption vulnerabilities.

Scope: Stage E only applies to memory corruption types (buffer overflow, format string, UAF, etc.). Web/injection vulnerabilities stop at Stage D.

Memory Corruption Types

Stage E applies to:

buffer_overflow
heap_overflow
stack_overflow
format_string
use_after_free
double_free
integer_overflow
out_of_bounds_read
out_of_bounds_write

Binary Analysis

Integrates with packages/exploit_feasibility for:

Protection detection: ASLR, DEP, RELRO, stack canaries
Constraint analysis: Bad bytes, null terminators
Gadget availability: ROP gadgets, syscall availability
Verdict: Likely / Difficult / Unlikely

Execution

from packages.exploit_feasibility import analyze_binary

result = analyze_binary(
    binary_path="/path/to/binary",
    vuln_type="buffer_overflow"
)

print(f"Verdict: {result['verdict']}")
print(f"Blockers: {result['blockers']}")
print(f"Suggestions: {result['suggestions']}")

Output

findings.json with feasibility and final_status:

{
  "id": "FINDING-0001",
  "feasibility": {
    "status": "analyzed",
    "binary_path": "/path/to/binary",
    "verdict": "Difficult",
    "chain_breaks": [
      "ASLR randomizes code base",
      "DEP prevents shellcode execution"
    ],
    "what_would_help": [
      "Info leak to defeat ASLR",
      "ROP chain for code reuse"
    ]
  },
  "final_status": "Confirmed (constrained)"
}

Final Status Mapping

| Ruling Status | Feasibility Verdict | Final Status |
|---|---|---|
| Confirmed | Likely | Exploitable |
| Confirmed | Difficult | Confirmed (constrained) |
| Confirmed | Unlikely | Confirmed (blocked) |
| Confirmed | N/A (web vuln) | Confirmed |
| Ruled Out | - | Ruled Out |
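
The mapping is a plain lookup on the ruling status and feasibility verdict. The sketch below encodes it as a dictionary. The function final_status is hypothetical, and using None for the "N/A" case is an assumption about how a missing verdict would be represented.

```python
from typing import Optional

# (ruling status, feasibility verdict) -> final status, per the mapping above.
FINAL_STATUS = {
    ("Confirmed", "Likely"): "Exploitable",
    ("Confirmed", "Difficult"): "Confirmed (constrained)",
    ("Confirmed", "Unlikely"): "Confirmed (blocked)",
    ("Confirmed", None): "Confirmed",  # no feasibility verdict (e.g. web vuln)
}


def final_status(ruling: str, verdict: Optional[str] = None) -> str:
    """Combine the Stage D ruling and the Stage E verdict into a final status."""
    if ruling == "Ruled Out":
        # The verdict is irrelevant once a finding has been ruled out.
        return "Ruled Out"
    try:
        return FINAL_STATUS[(ruling, verdict)]
    except KeyError:
        raise ValueError(f"no mapping for ruling={ruling!r}, verdict={verdict!r}") from None


print(final_status("Confirmed", "Difficult"))
```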

CLI Usage

Full Pipeline

Run complete validation from scratch:

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --vuln-type buffer_overflow

With Pre-existing Findings

Validate findings from scanner output (skips Stages 0 and A):

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings scan_results.sarif

With Binary for Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings findings.json \
  --binary /path/to/compiled/binary

Skip Stage E

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --skip-feasibility

Custom Working Directory

python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --workdir /custom/output/path

Python API

Orchestrator

from packages.exploitability_validation import ValidationOrchestrator, PipelineConfig

config = PipelineConfig(
    target_path="/path/to/code",
    workdir=".out/validation-20260304/",
    vuln_type="command_injection",
    binary_path=None,
    findings_file=None,
    skip_feasibility=False
)

orchestrator = ValidationOrchestrator(config)
result = orchestrator.run()

print(f"Completed at: {result.state.completed_at}")
for stage, stage_result in result.state.stage_results.items():
    print(f"{stage.name}: {stage_result.status}")

Convenience Function

from packages.exploitability_validation import run_validation

result = run_validation(
    target_path="/path/to/code",
    vuln_type="sql_injection",
    findings_file="scanner_output.sarif"
)

SARIF Input Support

The validation pipeline automatically converts SARIF input into its internal findings format:

from packages.exploitability_validation import PipelineConfig

# Supported: SARIF 2.0 and 2.1.0
# From tools: Semgrep, CodeQL, others

config = PipelineConfig(
    target_path="/path/to/code",
    findings_file="semgrep_results.sarif"  # Auto-detected format
)

SARIF Conversion

Rule ID normalization: engine.semgrep.rules.crypto.weak-hash → weak_hash
CWE mapping: CWE-89 → sql_injection
Deduplication: By file:line:vuln_type
Logical locations: Extracts function names
Severity mapping: SARIF levels → internal severity
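
Two of these steps are easy to illustrate. The sketch below is not the pipeline's converter. It assumes rule ID normalization keeps the last dotted segment and swaps hyphens for underscores, which reproduces the single example given above, and it deduplicates on the file:line:vuln_type key. All names are hypothetical.

```python
# Only the CWE mapping shown above; a real table would be larger.
CWE_TO_VULN_TYPE = {"CWE-89": "sql_injection"}


def normalize_rule_id(rule_id: str) -> str:
    """Keep the last dotted segment and turn hyphens into underscores."""
    return rule_id.rsplit(".", 1)[-1].replace("-", "_")


def dedupe(findings: list[dict]) -> list[dict]:
    """Drop findings that repeat an earlier file:line:vuln_type key."""
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["file"], finding["line"], finding["vuln_type"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


raw = [
    {"file": "src/db.py", "line": 10, "vuln_type": CWE_TO_VULN_TYPE["CWE-89"]},
    {"file": "src/db.py", "line": 10, "vuln_type": "sql_injection"},  # duplicate
    {"file": "src/db.py", "line": 42, "vuln_type": "sql_injection"},
]
print(normalize_rule_id("engine.semgrep.rules.crypto.weak-hash"))
print(len(dedupe(raw)))
```

Keeping the first occurrence preserves input order, which makes the result stable when the same SARIF file is converted twice.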

Validation Report

Final output: validation-report.md

# Exploitability Validation Report

## Summary
- Target: /path/to/code
- Vulnerability Type: buffer_overflow
- Started: 2026-03-04 12:00:00
- Completed: 2026-03-04 14:45:00

## Stage Results
- Stage 0 (Inventory): [OK] (12.3s)
- Stage A (One-Shot): [OK] (45.7s)
- Stage B (Process): [OK] (123.4s)
- Stage C (Sanity): [OK] (23.1s)
- Stage D (Ruling): [OK] (8.9s)
- Stage E (Feasibility): [OK] (15.2s)

## Findings Summary
- Total: 15
- Exploitable: 2
- Confirmed (constrained): 3
- Confirmed (blocked): 1
- Ruled Out: 9

## Confirmed Findings

### FINDING-0001: buffer_overflow in src/parser.c:134
- Function: parse_header
- Final Status: Exploitable
- Feasibility: Likely
- Chain Breaks: None

### FINDING-0003: format_string in src/logger.c:89
- Function: log_message
- Final Status: Confirmed (constrained)
- Feasibility: Difficult
- Chain Breaks: RELRO blocks GOT overwrite, PIE randomizes addresses
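
The Findings Summary section is a count of findings per final status. A minimal sketch of how such a block could be produced is shown below, assuming each finding carries the final_status field from Stage E. The function findings_summary is hypothetical.

```python
from collections import Counter

# Order used in the "Findings Summary" section of the sample report.
SUMMARY_ORDER = ["Exploitable", "Confirmed (constrained)", "Confirmed (blocked)", "Ruled Out"]


def findings_summary(findings: list[dict]) -> str:
    """Render per-status counts as the bullet list used in the report."""
    counts = Counter(f["final_status"] for f in findings)
    # Statuses outside the fixed order (e.g. plain "Confirmed") are appended last.
    extras = sorted(set(counts) - set(SUMMARY_ORDER))
    lines = [f"- Total: {len(findings)}"]
    lines += [f"- {status}: {counts[status]}" for status in SUMMARY_ORDER + extras]
    return "\n".join(lines)


demo = [
    {"final_status": "Exploitable"},
    {"final_status": "Ruled Out"},
    {"final_status": "Ruled Out"},
]
print(findings_summary(demo))
```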

Output Style Guide

Per RAPTOR’s style conventions:

Human-Readable Status

✅ Exploitable (not EXPLOITABLE)
✅ Confirmed (not CONFIRMED)
✅ Ruled Out (not RULED_OUT)
✅ Proven / Disproven (not PROVEN / DISPROVEN)

No Colored Indicators

❌ Don’t use: 🔴/🟢 (perspective-dependent)
✅ Use: Plain text or ### Exploitable (7 findings)
✅ Other emojis OK: ⚠️, ✓, etc.

Best Practices

Start with SARIF input: Feed scanner output directly to validation to avoid manual finding transcription. The pipeline auto-converts and deduplicates.

Stage B is intensive: For large codebases with many “not_disproven” findings, Stage B can take hours. Consider filtering to high-severity findings first.

Stage E requires a binary: If no compiled binary is available, Stage E is skipped. Memory corruption findings will be marked Confirmed without feasibility analysis.

Troubleshooting

Stage A produces all “not_disproven”

This is normal for complex vulnerabilities. Stage B will analyze them systematically.

Stage C sanity checks fail

Common causes:

Scanner output has stale file paths
Code changed since scanning
Scanner hallucinated the finding

Fix: Re-run the scanner on the current codebase.

Stage E skipped unexpectedly

Check:

Binary path is correct: --binary /path/to/binary
Binary is executable: chmod +x /path/to/binary
Vulnerability type is memory corruption
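
These three checks can be run up front before starting the pipeline. The sketch below is a hypothetical pre-flight helper, not part of the package. The type set is copied from the Memory Corruption Types list in this document, and the messages are illustrative.

```python
import os
from typing import Optional

# Copied from the Memory Corruption Types list in this document.
MEMORY_CORRUPTION_TYPES = {
    "buffer_overflow", "heap_overflow", "stack_overflow", "format_string",
    "use_after_free", "double_free", "integer_overflow",
    "out_of_bounds_read", "out_of_bounds_write",
}


def stage_e_skip_reasons(binary_path: Optional[str], vuln_type: str) -> list[str]:
    """Return the reasons Stage E would likely be skipped (empty list if none)."""
    reasons = []
    if vuln_type not in MEMORY_CORRUPTION_TYPES:
        reasons.append(f"vuln_type {vuln_type!r} is not a memory corruption type")
    if not binary_path:
        reasons.append("no binary path given")
    elif not os.path.isfile(binary_path):
        reasons.append(f"binary not found: {binary_path}")
    elif not os.access(binary_path, os.X_OK):
        reasons.append(f"binary is not executable: {binary_path}")
    return reasons


print(stage_e_skip_reasons(None, "sql_injection"))
```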

Integration Examples

From Semgrep

# 1. Run Semgrep
python3 packages/static-analysis/scanner.py \
  --repo /path/to/code \
  --policy_groups all

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/scan_*/combined.sarif

From CodeQL

# 1. Run CodeQL
python3 raptor_codeql.py \
  --repo /path/to/code \
  --scan-only

# 2. Validate findings
python3 -m packages.exploitability_validation \
  --target /path/to/code \
  --findings out/codeql_*/java_results.sarif \
  --binary /path/to/binary.jar

From Autonomous Mode

Validation runs automatically in /agentic:

/agentic /path/to/code
# Automatically runs:
# 1. Static analysis (Semgrep/CodeQL)
# 2. Exploitability validation (this pipeline)
# 3. LLM analysis
# 4. Exploit generation
