PATCH/VERDICT
V26A2
WALK-FORWARD BACKTEST · HISTORICAL SCORECARD

Prediction Accuracy Report

For every act, the model is retrained from scratch using only data from earlier acts, then asked to predict that act. No future information leaks into the evaluation: only predictions the model could realistically have made in real time are scored.
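The expanding-window split can be sketched as follows. This assumes each prediction row carries an integer `act_idx` (the name used in the Methodology section) and that model fitting happens elsewhere. `walk_forward_folds` and the row layout are hypothetical, not the report's actual code.

```python
from __future__ import annotations

from typing import Iterator


def walk_forward_folds(
    rows: list[dict], first_eval_act: int
) -> Iterator[tuple[int, list[dict], list[dict]]]:
    """Yield (T, train_rows, test_rows) for every evaluated act T.

    Train rows satisfy act_idx < T and test rows satisfy act_idx == T, so
    nothing from act T or later is visible when the fold's model is fit.
    """
    last_act = max(r["act_idx"] for r in rows)
    for t in range(first_eval_act, last_act + 1):
        train = [r for r in rows if r["act_idx"] < t]
        test = [r for r in rows if r["act_idx"] == t]
        if test:  # skip acts with nothing to score
            yield t, train, test


# Hypothetical toy data: 4 acts x 2 agents.
rows = [{"act_idx": a, "agent": name} for a in range(4) for name in ("Fade", "Sova")]
folds = list(walk_forward_folds(rows, first_eval_act=2))
for t, train, test in folds:
    # A fresh model would be trained on `train` here, then scored on `test`.
    print(t, len(train), len(test))  # prints "2 4 2", then "3 6 2"
```

Each fold is fit from scratch on its own training slice, so no model state from a later act can carry back into an earlier score.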

TL;DR
Direction hit rate
61%
Share of the 453 predictions where the nerf/buff/stable direction was correct.
Random baseline 33% · always-stable baseline 55% · lift over each: +28pp / +6pp.
High-conf nerf precision
51%
Share of predictions made at p_nerf ≥ 0.60 that turned into actual nerfs.
Evaluation coverage
18 acts
E6A2 → V26A1 · 453 predictions total.
453 prediction samples · Range: E6A2 → V26A1 · 18 act folds · Method: walk-forward
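The two baselines can be reproduced from the class counts in the per-class table (249 stable, 101 buff, 103 nerf). This sketch assumes "random" means guessing the three directions uniformly, which scores 1/3 on average whatever the class mix. The function names are illustrative. If the confusion-matrix diagonal is used as the hit rate (184 + 54 + 36 = 274 of 453, about 60.5%), the lifts come out near +27pp and +6pp. The +28pp on the card matches the rounded 61% headline.

```python
def baselines(class_counts: dict[str, int]) -> dict[str, float]:
    """Accuracy of two naive predictors on the same label distribution."""
    total = sum(class_counts.values())
    return {
        # Uniform guessing over k classes is right 1/k of the time on average.
        "random": 1 / len(class_counts),
        # Always predicting the majority class scores that class's share.
        "always_stable": class_counts["stable"] / total,
    }


def lift_pp(hit_rate: float, baseline: float) -> float:
    """Improvement over a baseline, in percentage points."""
    return 100 * (hit_rate - baseline)


counts = {"stable": 249, "buff": 101, "nerf": 103}
base = baselines(counts)
hit_rate = (184 + 54 + 36) / 453  # diagonal of the confusion matrix

for name, value in base.items():
    print(f"{name}: {value:.0%}, lift {lift_pp(hit_rate, value):+.0f}pp")
```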
OPERATOR NOTE: Read this before staring at the numbers.
  • The model is strongest on Stable calls: F1 0.70, precision 67%, recall 74%.

  • Nerf calls are conservative: the model called 79 nerfs against 103 actual ones, caught only 35% of real nerfs in advance (the other 65% slipped through), and was right on 46% of the nerfs it did call (precision).

  • Confidence does carry signal: nerf predictions made at p_nerf ≥ 0.70 hit 60% of the time (n=15), versus 42% at p_nerf ≥ 0.30 (n=134).

Glossary: Precision = "of the predictions the model called nerf, the share that were actual nerfs". Recall = "of the actual nerfs, the share the model caught in advance". Raising one usually lowers the other, so the two have to be balanced.
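The trade-off shows up in the report's own nerf calibration rows (threshold, n, precision). This sketch estimates recall at each threshold as round(precision × n) / 103, because precision × n approximates the number of actual nerfs among the predictions above that threshold. The counts are approximate, since the reported precision is rounded to whole percent, and the code assumes all 103 actual nerfs are in the scored set.

```python
ACTUAL_NERFS = 103  # actual nerfs in the confusion matrix

# (p_nerf threshold, predictions at or above it, reported precision)
NERF_ROWS = [
    (0.30, 134, 0.42),
    (0.40, 95, 0.44),
    (0.50, 65, 0.48),
    (0.60, 43, 0.51),
    (0.70, 15, 0.60),
]


def precision_recall_curve(rows, actual_positives):
    """Approximate (threshold, precision, recall) from rounded precision."""
    curve = []
    for threshold, n, precision in rows:
        caught = round(precision * n)  # approximate true positives
        curve.append((threshold, precision, caught / actual_positives))
    return curve


curve = precision_recall_curve(NERF_ROWS, ACTUAL_NERFS)
for threshold, precision, recall in curve:
    print(f"p_nerf >= {threshold:.2f}: precision {precision:.0%}, recall ~{recall:.0%}")
```

Precision climbs from 42% to 60% while estimated recall falls from about 54% to about 9%. A stricter threshold makes nerf calls more reliable but catches far fewer nerfs.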
Overall metrics · Per-class performance
Direction hit rate
61%
Across 453 predictions.
Balanced accuracy
0.541
Corrected for class imbalance (mean of per-class recall).
5-class hit rate
45%
Direction and mild/strong intensity both correct.
Top-3 nerf / act
50%
Share of each act's top-3 nerf picks that were actual nerfs.
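Here is a sketch of the top-3 metric. It assumes the metric ranks each act's agents by p_nerf, keeps the three highest, and counts how many ended in a nerf of either intensity. The sample rows are the six highest-p_nerf E6A2 entries from the predictions table, and `top_k_nerf_precision` is a hypothetical name. Every act contributes exactly k picks, so the pooled ratio equals the average of the per-act ratios.

```python
from collections import defaultdict


def top_k_nerf_precision(rows, k=3):
    """Share of each act's k highest-p_nerf agents that were actually nerfed."""
    by_act = defaultdict(list)
    for act, agent, actual, p_nerf in rows:
        by_act[act].append((p_nerf, agent, actual))
    hits = picks = 0
    for preds in by_act.values():
        top = sorted(preds, reverse=True)[:k]  # highest p_nerf first
        hits += sum(actual.endswith("nerf") for _, _, actual in top)
        picks += len(top)
    return hits / picks


# E6A2 rows from the predictions table: (act, agent, actual, p_nerf in %).
rows = [
    ("E6A2", "Killjoy", "mild nerf", 66.2),
    ("E6A2", "Neon", "stable", 65.8),
    ("E6A2", "Raze", "stable", 50.0),
    ("E6A2", "KAYO", "mild nerf", 44.0),
    ("E6A2", "Omen", "stable", 41.3),
    ("E6A2", "Brimstone", "mild nerf", 36.5),
]
print(top_k_nerf_precision(rows))  # 1 of 3 picks (Killjoy) was nerfed
```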
| Class | n (actual) | Precision | Recall | F1 |
|---|---|---|---|---|
| stable | 249 | 0.67 | 0.74 | 0.70 |
| buff | 101 | 0.54 | 0.54 | 0.54 |
| nerf | 103 | 0.46 | 0.35 | 0.40 |
Per-agent scoreboard: best hits · biggest misses

Cumulative hit rate per agent across all evaluated acts. Only agents with at least 3 predictions are listed.

Top hits
1. Harbor: 18/18 (100%)
2. Phoenix: 17/18 (94%)
3. Jett: 15/18 (83%)
4. Reyna: 14/18 (78%)
5. Breach: 13/18 (72%)

Top misses
1. Omen: 5/18 (28%)
2. Astra: 6/18 (33%)
3. Miks: 1/3 (33%)
4. Veto: 6/14 (43%)
5. Neon: 8/18 (44%)
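Here is a sketch of how such a scoreboard could be built from per-prediction hit flags, with the minimum-sample filter from the text as a parameter. The agent names and outcomes are made-up placeholders, not report data, and the helper name is hypothetical. Sorting by hit rate gives the top-hits list from the front and the top-misses list from the back.

```python
from collections import Counter


def agent_scoreboard(results, min_predictions=3):
    """results: iterable of (agent, hit). Returns (agent, hits, n, rate), best first."""
    hits, totals = Counter(), Counter()
    for agent, hit in results:
        totals[agent] += 1
        hits[agent] += int(hit)
    board = [
        (agent, hits[agent], n, hits[agent] / n)
        for agent, n in totals.items()
        if n >= min_predictions  # drop agents with too few scored predictions
    ]
    return sorted(board, key=lambda row: row[3], reverse=True)


# Placeholder data: AgentC has only 2 predictions, so it is filtered out.
results = (
    [("AgentA", True)] * 3
    + [("AgentB", True), ("AgentB", False), ("AgentB", False)]
    + [("AgentC", True)] * 2
)
for agent, hit_count, n, rate in agent_scoreboard(results):
    print(f"{agent}: {hit_count}/{n} ({rate:.0%})")
```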
Confusion matrix: predicted vs. actual

| Actual \ Predicted | stable | buff | nerf |
|---|---|---|---|
| stable | 184 | 28 | 37 |
| buff | 41 | 54 | 6 |
| nerf | 49 | 18 | 36 |

Diagonal cells = exact direction matches.
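The per-class table and balanced accuracy follow directly from this matrix. In the sketch below (rows = actual, columns = predicted, in the order stable, buff, nerf), precision divides each diagonal cell by its column total and recall divides it by its row total. Balanced accuracy is the mean of the three recalls. The function name is illustrative. Computed by hand, buff recall is 54/101 ≈ 0.535 and the overall diagonal share is 274/453 ≈ 0.605.

```python
LABELS = ["stable", "buff", "nerf"]
# Rows = actual class, columns = predicted class, both in LABELS order.
MATRIX = [
    [184, 28, 37],
    [41, 54, 6],
    [49, 18, 36],
]


def per_class_metrics(matrix, labels):
    """Return ({label: metrics}, accuracy, balanced accuracy) for a square matrix."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        metrics[label] = {
            "n": actual,
            "precision": tp / predicted,
            "recall": tp / actual,
            "f1": 2 * tp / (predicted + actual),  # harmonic mean of P and R
        }
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    balanced = sum(m["recall"] for m in metrics.values()) / len(labels)
    return metrics, accuracy, balanced


metrics, accuracy, balanced = per_class_metrics(MATRIX, LABELS)
for label, m in metrics.items():
    print(f"{label} n={m['n']}: P {m['precision']:.2f} R {m['recall']:.2f} F1 {m['f1']:.2f}")
print(f"accuracy {accuracy:.3f}, balanced accuracy {balanced:.3f}")
```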
Confidence calibration: does higher probability mean a higher hit rate?

When the model fires a higher probability, the real-world hit rate should rise too. Each row counts the predictions at or above that threshold (n) and the share of them that matched reality (precision).

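Nerf predictions

| Threshold (p_nerf ≥) | n | Precision |
|---|---|---|
| 0.30 | 134 | 42% |
| 0.40 | 95 | 44% |
| 0.50 | 65 | 48% |
| 0.60 | 43 | 51% |
| 0.70 | 15 | 60% |

Buff predictions

| Threshold (p_buff ≥) | n | Precision |
|---|---|---|
| 0.15 | 260 | 34% |
| 0.20 | 226 | 37% |
| 0.25 | 193 | 39% |
| 0.35 | 144 | 46% |
| 0.50 | 86 | 56% |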
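Here is a sketch of how rows like these can be computed from raw predictions. It assumes each threshold row counts every prediction at or above that threshold, which is consistent with n shrinking as the threshold rises, and that mild and strong nerfs count alike. It runs on the first ten E6A2 rows of the predictions table, with probabilities in percent. `threshold_precision` is a hypothetical helper.

```python
def threshold_precision(rows, thresholds):
    """rows: (p_nerf in %, actual label). Returns (threshold, n, precision or None)."""
    table = []
    for t in thresholds:
        fired = [actual for p_nerf, actual in rows if p_nerf >= t]
        nerfs = sum(actual.endswith("nerf") for actual in fired)
        table.append((t, len(fired), nerfs / len(fired) if fired else None))
    return table


# First ten E6A2 rows from the predictions table.
E6A2 = [
    (66.2, "mild nerf"),  # Killjoy
    (65.8, "stable"),     # Neon
    (50.0, "stable"),     # Raze
    (44.0, "mild nerf"),  # KAYO
    (41.3, "stable"),     # Omen
    (36.5, "mild nerf"),  # Brimstone
    (33.5, "stable"),     # Breach
    (31.9, "stable"),     # Reyna
    (28.3, "mild nerf"),  # Fade
    (28.2, "mild nerf"),  # Gekko
]
table = threshold_precision(E6A2, thresholds=[30, 40, 50, 60, 70])
for t, n, precision in table:
    shown = "n/a" if precision is None else f"{precision:.0%}"
    print(f"p_nerf >= {t / 100:.2f}: n={n}, precision {shown}")
```

A single act gives tiny counts (n=2 at 0.60, n=0 at 0.70), so the table pooled across all folds is the more meaningful view.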
Lead predictions: caught one act ahead of the patch

The model raised a nerf signal before any nerf had landed, and the next act confirmed it.

Fade · p_nerf 75.0%
V25A6: stable → V26A1: mild nerf
At V25A6 the agent was still untouched, but the model already saw the nerf coming; it was confirmed one act later at V26A1.

Fade · p_nerf 72.6%
V25A4: stable → V25A5: mild nerf
At V25A4 the agent was still untouched, but the model already saw the nerf coming; it was confirmed one act later at V25A5.

Sova · p_nerf 68.3%
E7A2: stable → E7A3: strong nerf
At E7A2 the agent was still untouched, but the model already saw the nerf coming; it was confirmed one act later at E7A3.

Omen · p_nerf 68.0%
V25A6: mild buff → V26A1: mild nerf
At V25A6 the agent had just received a mild buff rather than a nerf, but the model already saw the nerf coming; it was confirmed one act later at V26A1.

Sova · p_nerf 67.5%
V25A4: stable → V25A5: strong nerf
At V25A4 the agent was still untouched, but the model already saw the nerf coming; it was confirmed one act later at V25A5.
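One way to flag such cases is sketched below. It assumes a lead means three things: the outcome at act T was not a nerf, p_nerf at T cleared a threshold, and the outcome at T+1 was a nerf. The report does not state its cutoff, so `threshold=0.65` below is a placeholder, as is the `AgentX` history. The Fade and Omen values come from the cards above, with Omen's V26A1 p_nerf taken from the notable hits. Fade's V26A1 p_nerf is not shown, so it is left as None.

```python
def lead_predictions(history, threshold):
    """history: {agent: [(act, actual, p_nerf), ...]} in chronological order."""
    leads = []
    for agent, acts in history.items():
        # Pair each act with the one that follows it.
        for (act, actual, p_nerf), (next_act, next_actual, _) in zip(acts, acts[1:]):
            if (
                not actual.endswith("nerf")       # no nerf had landed yet
                and p_nerf >= threshold           # model already signalled one
                and next_actual.endswith("nerf")  # confirmed the next act
            ):
                leads.append((agent, act, next_act, p_nerf))
    return sorted(leads, key=lambda lead: lead[3], reverse=True)


history = {
    "Fade": [("V25A6", "stable", 0.750), ("V26A1", "mild nerf", None)],
    "Omen": [("V25A6", "mild buff", 0.680), ("V26A1", "mild nerf", 0.80)],
    "AgentX": [("V25A5", "stable", 0.70), ("V25A6", "stable", 0.30)],  # placeholder
}
for agent, act, next_act, p in lead_predictions(history, threshold=0.65):
    print(f"{agent}: p_nerf {p:.1%} at {act}, nerfed at {next_act}")
```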
Notable hits: high-confidence predictions that landed

Cases where the model fired a strong probability and reality went the same direction.

Viper · V25A4: predicted strong nerf · actual mild nerf · p_nerf 84%
Omen · V26A1: predicted strong nerf · actual mild nerf · p_nerf 80%
Viper · E8A1: predicted strong nerf · actual mild nerf · p_nerf 80%
Tejo · V25A4: predicted strong buff · actual mild buff · p_buff 79%
Sova · V26A1: predicted strong nerf · actual mild nerf · p_nerf 78%
Yoru · E6A3: predicted strong buff · actual mild buff · p_buff 78%
Notable misses: high-confidence predictions that didn't land

Cases where the model fired a strong probability and reality went another way: stable, or the opposite direction. Buff calls are listed with their p_nerf rather than their p_buff.

Fade · E8A1: predicted strong buff · actual stable · p_nerf 3%
Astra · E7A3: predicted strong buff · actual strong nerf · p_nerf 2%
Astra · V26A1: predicted strong nerf · actual stable · p_nerf 83%
Viper · V26A1: predicted strong nerf · actual stable · p_nerf 82%
Astra · V25A4: predicted strong buff · actual stable · p_nerf 4%
Tejo · V25A3: predicted strong buff · actual mild nerf · p_nerf 6%
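The report does not spell out how these cards are chosen. Here is one plausible sketch. It assumes that only nerf and buff calls are considered, that confidence means the probability of the predicted direction, and that a hit means the actual direction matches regardless of intensity. The cards above show p_nerf even for buff calls, but this sketch ranks buff calls by p_buff instead. It runs on the E6A2 rows of the predictions table (probabilities in percent), and the helper names are hypothetical.

```python
def direction(label):
    """'strong nerf' -> 'nerf', 'mild buff' -> 'buff', 'stable' -> 'stable'."""
    return label.split()[-1]


def notable(rows, top=3):
    """rows: (agent, actual, predicted, p_buff, p_nerf). Returns (hits, misses)."""
    hits, misses = [], []
    for agent, actual, predicted, p_buff, p_nerf in rows:
        called = direction(predicted)
        if called == "stable":
            continue  # only nerf/buff calls are candidates
        confidence = p_nerf if called == "nerf" else p_buff
        bucket = hits if direction(actual) == called else misses
        bucket.append((confidence, agent, predicted, actual))
    # Highest confidence first.
    return sorted(hits, reverse=True)[:top], sorted(misses, reverse=True)[:top]


E6A2 = [
    ("Killjoy", "mild nerf", "strong nerf", 2.9, 66.2),
    ("Neon", "stable", "strong nerf", 9.3, 65.8),
    ("Raze", "stable", "strong nerf", 4.1, 50.0),
    ("KAYO", "mild nerf", "strong nerf", 15.1, 44.0),
    ("Omen", "stable", "strong nerf", 27.9, 41.3),
    ("Brimstone", "mild nerf", "stable", 7.4, 36.5),
    ("Astra", "mild buff", "strong buff", 56.7, 21.3),
    ("Jett", "mild buff", "strong buff", 55.4, 19.5),
    ("Viper", "mild nerf", "strong buff", 49.1, 18.5),
    ("Yoru", "mild buff", "strong buff", 77.2, 6.1),
]
hits, misses = notable(E6A2)
print("hits:", [(agent, conf) for conf, agent, _, _ in hits])
print("misses:", [(agent, conf) for conf, agent, _, _ in misses])
```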
Per-act trend: hit rate over time

Training data grows as acts accumulate. The per-act hit rates below check whether accuracy stabilizes over time, as a sanity check against early overfitting.

| Act | n | Direction hit rate | 5-class hit rate |
|---|---|---|---|
| E6A2 | 21 | 52% | 29% |
| E6A3 | 21 | 48% | 33% |
| E7A1 | 22 | 64% | 41% |
| E7A2 | 24 | 58% | 38% |
| E7A3 | 24 | 58% | 38% |
| E8A1 | 24 | 67% | 38% |
| E8A2 | 25 | 60% | 40% |
| E8A3 | 25 | 56% | 40% |
| E9A1 | 25 | 56% | 40% |
| E9A2 | 25 | 64% | 52% |
| E9A3 | 26 | 73% | 54% |
| V25A1 | 26 | 50% | 46% |
| V25A2 | 27 | 74% | 67% |
| V25A3 | 27 | 70% | 59% |
| V25A4 | 27 | 63% | 52% |
| V25A5 | 28 | 39% | 29% |
| V25A6 | 28 | 61% | 54% |
| V26A1 | 28 | 71% | 46% |

Overall average direction hit rate: 60% · 5-class = hit rate that also requires the mild/strong intensity to be correct.
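The two per-act rates differ only in what counts as a match. The direction rate compares nerf/buff/stable, while the 5-class rate also requires the mild/strong intensity to agree. The sketch below runs on the 20 E6A2 rows listed in the predictions table (helper names are hypothetical) and finds 11 direction hits and 6 exact hits. The reported E6A2 figures over 21 predictions, 52% and 29%, correspond to 11/21 and 6/21.

```python
def direction(label: str) -> str:
    """Collapse a 5-class label to its direction: 'mild nerf' -> 'nerf'."""
    return label.split()[-1]


def hit_rates(pairs):
    """pairs: (actual, predicted) labels. Returns (direction rate, 5-class rate)."""
    n = len(pairs)
    direction_hits = sum(direction(a) == direction(p) for a, p in pairs)
    exact_hits = sum(a == p for a, p in pairs)
    return direction_hits / n, exact_hits / n


# (actual, predicted) for the 20 E6A2 rows, in table order.
E6A2 = [
    ("mild nerf", "strong nerf"), ("stable", "strong nerf"),
    ("stable", "strong nerf"), ("mild nerf", "strong nerf"),
    ("stable", "strong nerf"), ("mild nerf", "stable"),
    ("stable", "stable"), ("stable", "stable"),
    ("mild nerf", "stable"), ("mild nerf", "stable"),
    ("stable", "stable"), ("mild buff", "strong buff"),
    ("stable", "stable"), ("mild buff", "strong buff"),
    ("mild nerf", "strong buff"), ("stable", "stable"),
    ("mild buff", "stable"), ("mild buff", "stable"),
    ("stable", "stable"), ("mild buff", "strong buff"),
]
direction_rate, five_class_rate = hit_rates(E6A2)
print(f"direction {direction_rate:.0%}, 5-class {five_class_rate:.0%}")
```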
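All predictions (453 raw rows)

| Act | Agent | Actual | Predicted | p_stable (%) | p_buff (%) | p_nerf (%) | Hit |
|---|---|---|---|---|---|---|---|
| E6A2 | Killjoy | mild nerf | strong nerf | 30.9 | 2.9 | 66.2 | ✓ |
| E6A2 | Neon | stable | strong nerf | 24.9 | 9.3 | 65.8 | ✗ |
| E6A2 | Raze | stable | strong nerf | 45.9 | 4.1 | 50.0 | ✗ |
| E6A2 | KAYO | mild nerf | strong nerf | 40.9 | 15.1 | 44.0 | ✓ |
| E6A2 | Omen | stable | strong nerf | 30.8 | 27.9 | 41.3 | ✗ |
| E6A2 | Brimstone | mild nerf | stable | 56.0 | 7.4 | 36.5 | ✗ |
| E6A2 | Breach | stable | stable | 55.2 | 11.3 | 33.5 | ✓ |
| E6A2 | Reyna | stable | stable | 62.7 | 5.4 | 31.9 | ✓ |
| E6A2 | Fade | mild nerf | stable | 43.9 | 27.8 | 28.3 | ✗ |
| E6A2 | Gekko | mild nerf | stable | 71.2 | 0.6 | 28.2 | ✗ |
| E6A2 | Harbor | stable | stable | 73.9 | 0.5 | 25.7 | ✓ |
| E6A2 | Astra | mild buff | strong buff | 22.1 | 56.7 | 21.3 | ✓ |
| E6A2 | Phoenix | stable | stable | 64.5 | 14.5 | 20.9 | ✓ |
| E6A2 | Jett | mild buff | strong buff | 25.1 | 55.4 | 19.5 | ✓ |
| E6A2 | Viper | mild nerf | strong buff | 32.4 | 49.1 | 18.5 | ✗ |
| E6A2 | Sage | stable | stable | 64.5 | 18.7 | 16.8 | ✓ |
| E6A2 | Skye | mild buff | stable | 61.8 | 21.4 | 16.8 | ✗ |
| E6A2 | Cypher | mild buff | stable | 57.9 | 26.0 | 16.1 | ✗ |
| E6A2 | Sova | stable | stable | 55.0 | 34.0 | 10.9 | ✓ |
| E6A2 | Yoru | mild buff | strong buff | 16.6 | 77.2 | 6.1 | ✓ |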
Methodology
Walk-forward: Each fold trains on act_idx < T and predicts act_idx == T. Future information never leaks into past evaluation.
Two-stage model: Stage A (XGBoost) classifies *touched next patch vs. stable*. Stage B (Logistic Regression) splits touched into nerf vs. buff. Final output: 5 classes (strong/mild nerf · stable · mild/strong buff).
Ground truth: Labels come from actual post-patch nerf/buff history, including mid-patch hotfixes and reworks.
Evaluation scope: Only acts with confirmed patch outcomes are evaluated (the current in-flight act, V26A2, is excluded).
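The Methodology does not say how the two stages' outputs are merged, or how mild vs. strong is assigned. This minimal sketch assumes that stage B's nerf probability is conditional on the agent being touched, so the chain rule gives the three direction probabilities. It also assumes the called direction is the most probable one. In every E6A2 row listed above, the predicted direction does carry the largest of p_stable, p_buff and p_nerf. The fitted XGBoost and logistic-regression models are not shown. The two input probabilities stand in for their outputs, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirectionProbs:
    stable: float
    buff: float
    nerf: float


def combine_stages(p_touched: float, p_nerf_given_touched: float) -> DirectionProbs:
    """Chain stage A (touched vs. stable) with stage B (nerf vs. buff | touched)."""
    return DirectionProbs(
        stable=1.0 - p_touched,
        buff=p_touched * (1.0 - p_nerf_given_touched),
        nerf=p_touched * p_nerf_given_touched,
    )


def called_direction(p: DirectionProbs) -> str:
    """Pick the direction with the highest combined probability."""
    options = {"stable": p.stable, "buff": p.buff, "nerf": p.nerf}
    return max(options, key=options.get)


# Hypothetical stage outputs: 70% chance of being touched, 90% nerf if touched.
probs = combine_stages(p_touched=0.70, p_nerf_given_touched=0.90)
print(probs, called_direction(probs))
```

Chaining the stages this way guarantees the three probabilities sum to 1, which matches the p_stable / p_buff / p_nerf columns in the predictions table.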
Generated at 2026-04-25 05:09:17 (UTC)