Replay Validation and Calibration Engine¶
Phase 6 adds a replay validation and calibration engine for testing warning, surveillance, and risk models against known scenarios.
The replay engine is standalone. It does not require a MongoDB connection. Scenarios are replayed through the deterministic rule engines to validate outputs, compare against quiet-day baselines, analyze signal decay, decompose confidence, and generate heatmaps.
Architecture¶
app/replay/
__init__.py - Public API exports
scenarios.py - ReplayScenario dataclass, built-in scenarios
runner.py - ReplayRunner orchestrates scenario execution
decay.py - SignalDecayModel, type-specific decay parameters
confidence.py - ConfidenceDecomposition with source-weight breakdown
quiet_day.py - QuietDayBaseline for comparison
heatmap.py - HeatmapGenerator for coordinate-grid score maps
Components¶
Replay Scenarios¶
Each scenario captures a known location, timestamp, and environmental conditions. The runner feeds these through the existing calculate_warning, score_surveillance_zones, and risk engines.
Six built-in scenarios:
| Scenario | Region | Context |
|---|---|---|
| florida_summer_heavy_rain | Florida | Heavy summer rainfall, swimming |
| hawaii_sharktober_quiet | Hawaii | October quiet conditions, surfing |
| wa_spearfishing_reef_white | Western Australia | Spearfishing on reef, white shark suitability |
| south_africa_white_shark_surf_seal_colony | South Africa | White shark / seal colony / surf context |
| red_sea_oceanic_whitetip_feeding | Red Sea | Whale carcass feeding event |
Scenario packs are region-organized under app/replay/datasets/.
Signal Decay¶
SignalDecayModel applies exponential half-life decay to signal values:
weather_rainfall: 6h half-life, 24h expirysighting/shark_sighting: 12h half-life, 36h expirycarcass/whale_carcass: 72h half-life, 144h expiryocean_sst: 24h half-life, 72h expirybiological_event: 72h half-life, 144h expirysst_anomaly: 48h half-life, 120h expiryvessel_activity: 12h half-life, 36h expiry
Decay weight = 2^(-age_hours / half_life_hours). Signals beyond half_life * expiry_multiplier hours receive weight 0.
Confidence Decomposition¶
Confidence is decomposed into three weighted components:
- coverage_confidence (40%): what fraction of expected sources are present
- freshness_confidence (30%): penalty for stale sources
- completeness_confidence (30%): penalty for missing sources
Source weights are defined per data source (weather_observations, ocean_observations, vessel_activity, biological_events, etc.).
Quiet-Day Comparison¶
The QuietDayBaseline defines a standard quiet-day input set (no rainfall, far from river mouth, moderate SST, minimal vessel/biological signal) and compares current replay output against it. The comparison reports delta, percent change, band change, and a text interpretation.
Heatmap Generation¶
HeatmapGenerator produces coordinate grids centered on a location. Each grid cell is scored by the warning or risk engine, producing:
cells: list of {lat, lon, score, band}statistics: min, max, avg, median scoreconfig: center, radius, cell count
API Endpoints¶
| Endpoint | Description |
|---|---|
GET /api/v1/replay/scenarios |
List built-in replay scenarios |
GET /api/v1/replay/run |
Run a built-in scenario by scenario_id query parameter |
GET /api/v1/replay/compare |
Compare a scenario to quiet-day baseline by scenario_id query parameter |
POST /api/v1/replay/run |
Run a custom scenario |
GET /api/v1/replay/run-all |
Run all built-in scenarios |
GET /api/v1/replay/decay-analysis/{scenario_id} |
Signal decay analysis for a scenario |
GET /api/v1/replay/heatmap |
Generate surveillance-priority heatmap |
GET /api/v1/replay/run/{scenario_id} |
Compatibility helper for built-in scenario replay |
GET /api/v1/replay/compare-quiet-day/{scenario_id} |
Compare scenario to quiet-day baseline |
CLI Usage¶
Replay scenarios can also be run directly without the API:
from app.replay.scenarios import REPLAY_SCENARIOS
from app.replay.runner import ReplayRunner
runner = ReplayRunner()
result = runner.run_scenario(REPLAY_SCENARIOS["florida_summer_heavy_rain"])
print(result.warning["warning_score"], result.warning["warning_band"])
Testing¶
Run replay-specific tests:
Limitations¶
- Scenarios are deterministic and do not include real-time provider data
- Heatmap generation recalculates scores per cell; large grids may be slow
- Quiet-day baseline uses fixed moderate inputs, not historical averages
- Signal decay is exponential; does not account for intermittent refresh patterns
Versioned Replay Explanations¶
Replay validation runs deterministic historical incident and quiet-day comparisons. Replay outputs now include model/replay version metadata so case-study artifacts can be traced to the scoring revision that generated them.
Replay explanation payloads include:
model_versionscoring_revisionprovider_stack_versiongenerated_atreplay_asset_version
Replay explanations should preserve the AI1SAD split between:
- environmental/live-condition
warning_score - human-context
activity_hazard_score - operational
surveillance_priority_score
They must not describe replay output as attack probability.