Skip to content

Replay Validation and Calibration Engine

Phase 6 adds a replay validation and calibration engine for testing warning, surveillance, and risk models against known scenarios.

The replay engine is standalone. It does not require a MongoDB connection. Scenarios are replayed through the deterministic rule engines to validate outputs, compare against quiet-day baselines, analyze signal decay, decompose confidence, and generate heatmaps.

Architecture

app/replay/
  __init__.py         - Public API exports
  scenarios.py        - ReplayScenario dataclass, built-in scenarios
  runner.py           - ReplayRunner orchestrates scenario execution
  decay.py            - SignalDecayModel, type-specific decay parameters
  confidence.py       - ConfidenceDecomposition with source-weight breakdown
  quiet_day.py        - QuietDayBaseline for comparison
  heatmap.py          - HeatmapGenerator for coordinate-grid score maps

Components

Replay Scenarios

Each scenario captures a known location, timestamp, and environmental conditions. The runner feeds these through the existing calculate_warning, score_surveillance_zones, and risk engines.

Six built-in scenarios:

Scenario Region Context
florida_summer_heavy_rain Florida Heavy summer rainfall, swimming
hawaii_sharktober_quiet Hawaii October quiet conditions, surfing
wa_spearfishing_reef_white Western Australia Spearfishing on reef, white shark suitability
south_africa_white_shark_surf_seal_colony South Africa White shark / seal colony / surf context
red_sea_oceanic_whitetip_feeding Red Sea Whale carcass feeding event

Scenario packs are region-organized under app/replay/datasets/.

Signal Decay

SignalDecayModel applies exponential half-life decay to signal values:

  • weather_rainfall: 6h half-life, 24h expiry
  • sighting / shark_sighting: 12h half-life, 36h expiry
  • carcass / whale_carcass: 72h half-life, 144h expiry
  • ocean_sst: 24h half-life, 72h expiry
  • biological_event: 72h half-life, 144h expiry
  • sst_anomaly: 48h half-life, 120h expiry
  • vessel_activity: 12h half-life, 36h expiry

Decay weight = 2^(-age_hours / half_life_hours). Signals beyond half_life * expiry_multiplier hours receive weight 0.

Confidence Decomposition

Confidence is decomposed into three weighted components:

  • coverage_confidence (40%): what fraction of expected sources are present
  • freshness_confidence (30%): penalty for stale sources
  • completeness_confidence (30%): penalty for missing sources

Source weights are defined per data source (weather_observations, ocean_observations, vessel_activity, biological_events, etc.).

Quiet-Day Comparison

The QuietDayBaseline defines a standard quiet-day input set (no rainfall, far from river mouth, moderate SST, minimal vessel/biological signal) and compares current replay output against it. The comparison reports delta, percent change, band change, and a text interpretation.

Heatmap Generation

HeatmapGenerator produces coordinate grids centered on a location. Each grid cell is scored by the warning or risk engine, producing:

  • cells: list of {lat, lon, score, band}
  • statistics: min, max, avg, median score
  • config: center, radius, cell count

API Endpoints

Endpoint Description
GET /api/v1/replay/scenarios List built-in replay scenarios
GET /api/v1/replay/run Run a built-in scenario by scenario_id query parameter
GET /api/v1/replay/compare Compare a scenario to quiet-day baseline by scenario_id query parameter
POST /api/v1/replay/run Run a custom scenario
GET /api/v1/replay/run-all Run all built-in scenarios
GET /api/v1/replay/decay-analysis/{scenario_id} Signal decay analysis for a scenario
GET /api/v1/replay/heatmap Generate surveillance-priority heatmap
GET /api/v1/replay/run/{scenario_id} Compatibility helper for built-in scenario replay
GET /api/v1/replay/compare-quiet-day/{scenario_id} Compare scenario to quiet-day baseline

CLI Usage

Replay scenarios can also be run directly without the API:

from app.replay.scenarios import REPLAY_SCENARIOS
from app.replay.runner import ReplayRunner

runner = ReplayRunner()
result = runner.run_scenario(REPLAY_SCENARIOS["florida_summer_heavy_rain"])
print(result.warning["warning_score"], result.warning["warning_band"])

Testing

Run replay-specific tests:

python -m pytest tests/test_replay_engine.py -v

Limitations

  • Scenarios are deterministic and do not include real-time provider data
  • Heatmap generation recalculates scores per cell; large grids may be slow
  • Quiet-day baseline uses fixed moderate inputs, not historical averages
  • Signal decay is exponential; does not account for intermittent refresh patterns

Versioned Replay Explanations

Replay validation runs deterministic historical incident and quiet-day comparisons. Replay outputs now include model/replay version metadata so case-study artifacts can be traced to the scoring revision that generated them.

Replay explanation payloads include:

  • model_version
  • scoring_revision
  • provider_stack_version
  • generated_at
  • replay_asset_version

Replay explanations should preserve the AI1SAD split between:

  • environmental/live-condition warning_score
  • human-context activity_hazard_score
  • operational surveillance_priority_score

They must not describe replay output as attack probability.