Prediction Accuracy Report
For every act, the model is retrained from scratch using only data available at that point, then asked to predict that act. No future information leaks into past evaluation — only predictions the model could realistically have made in real time are scored.
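The walk-forward scheme described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual pipeline: `act_idx` is the index name used in the report's closing note, while the row layout, `walk_forward`, and the majority-label stand-in model are hypothetical.

```python
from collections import Counter

def walk_forward(rows, fit, predict):
    """Score each act T with a model fitted only on rows where act_idx < T."""
    results = []
    for t in sorted({r["act_idx"] for r in rows}):
        train = [r for r in rows if r["act_idx"] < t]   # strictly earlier acts
        test = [r for r in rows if r["act_idx"] == t]   # the act being predicted
        if not train:
            continue  # the first act has no history, so it is not scored
        model = fit(train)  # retrained from scratch on every iteration
        for r in test:
            results.append((t, r["agent"], r["label"], predict(model, r)))
    return results

# Stand-in model (assumption): always predict the most common past label.
def fit_majority(train):
    return Counter(r["label"] for r in train).most_common(1)[0][0]

def predict_majority(model, row):
    return model

# Hypothetical toy data, not taken from the report.
rows = [
    {"act_idx": 0, "agent": "A", "label": "stable"},
    {"act_idx": 0, "agent": "B", "label": "stable"},
    {"act_idx": 1, "agent": "A", "label": "nerf"},
    {"act_idx": 1, "agent": "B", "label": "nerf"},
    {"act_idx": 1, "agent": "C", "label": "nerf"},
    {"act_idx": 2, "agent": "A", "label": "nerf"},
]
results = walk_forward(rows, fit_majority, predict_majority)
```

In this toy run the act-1 predictions are all "stable", because only act 0 was visible at that point; the nerfs from act 1 influence the model only from act 2 onward.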
The model is most reliable on Stable calls: F1 0.70, precision 67%, recall 74%.
Nerf calls are conservative: only 35% of real nerfs are caught in advance (recall), and the other 65% slip through. When the model does call a nerf, it is right 46% of the time (precision).
Confidence does carry signal — predictions made at p_nerf ≥ 0.70 hit 60% of the time (n=15).
Cumulative hit rate per agent across all evaluated acts. Only agents with at least 3 predictions are listed.
| | Predicted stable | Predicted buff | Predicted nerf |
|---|---|---|---|
| Actual stable | 184 | 28 | 37 |
| Actual buff | 41 | 54 | 6 |
| Actual nerf | 49 | 18 | 36 |
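The headline figures for Stable and Nerf calls can be recomputed from the confusion matrix. The sketch below assumes rows are actual outcomes and columns are predictions, in the order stable, buff, nerf; the function name is illustrative.

```python
LABELS = ["stable", "buff", "nerf"]

# Rows = actual outcome, columns = predicted outcome (same label order).
CM = [
    [184, 28, 37],  # actual stable
    [41, 54, 6],    # actual buff
    [49, 18, 36],   # actual nerf
]

def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)} for a square confusion matrix."""
    out = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column total: times this label was predicted
        actual = sum(cm[i])                    # row total: times this label actually occurred
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[name] = (precision, recall, f1)
    return out

metrics = per_class_metrics(CM, LABELS)
for name, (p, r, f1) in metrics.items():
    print(f"{name:>6}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

For Stable this gives precision 184/274 and recall 184/249; for Nerf, precision 36/79 and recall 36/103, which round to the percentages quoted in the summary.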
When the model fires a higher probability, the real-world hit rate should rise too. Each row: share of predictions at that threshold that matched reality.
| p_nerf threshold | n | Precision |
|---|---|---|
| ≥ 0.30 | 134 | 42% |
| ≥ 0.40 | 95 | 44% |
| ≥ 0.50 | 65 | 48% |
| ≥ 0.60 | 43 | 51% |
| ≥ 0.70 | 15 | 60% |

| Threshold | n | Precision |
|---|---|---|
| ≥ 0.15 | 260 | 34% |
| ≥ 0.20 | 226 | 37% |
| ≥ 0.25 | 193 | 39% |
| ≥ 0.35 | 144 | 46% |
| ≥ 0.50 | 86 | 56% |
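A threshold table of this kind can be built from a list of (probability, outcome) pairs. The sketch below is one plausible way to do it, using made-up data; `precision_by_threshold` and the sample values are hypothetical and do not reproduce the figures above.

```python
def precision_by_threshold(preds, thresholds):
    """For each threshold, count predictions with p >= threshold and the share that hit.

    preds: iterable of (probability, hit) pairs, where hit is True if the
    predicted change actually happened.
    """
    table = []
    for t in thresholds:
        hits = [hit for p, hit in preds if p >= t]
        n = len(hits)
        precision = sum(hits) / n if n else None  # None when nothing clears the threshold
        table.append((t, n, precision))
    return table

# Hypothetical sample, for illustration only.
sample = [(0.75, True), (0.65, False), (0.55, True), (0.45, False), (0.35, False)]
table = precision_by_threshold(sample, [0.3, 0.5, 0.7])
for t, n, prec in table:
    print(f">= {t:.2f}  n={n}  precision={prec:.0%}")
```

Because each row includes every prediction at or above its threshold, the rows are nested: n can only shrink as the threshold rises, and a well-calibrated signal should show precision climbing as it does.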
The model raised a nerf signal before any nerf had landed, and the next act confirmed it.
Cases where the model fired a strong probability and reality went the same direction.
Cases where the model fired a strong probability and reality went the opposite way.
As more acts accumulate, training data grows. The chart below checks whether hit rate stabilizes over time — a sanity check against early overfitting.
| Act | Agent | Actual | Predicted | p_stable (%) | p_buff (%) | p_nerf (%) | Hit |
|---|---|---|---|---|---|---|---|
| E6A2 | Killjoy | mild nerf | strong nerf | 30.9 | 2.9 | 66.2 | ✓ |
| E6A2 | Neon | stable | strong nerf | 24.9 | 9.3 | 65.8 | ✗ |
| E6A2 | Raze | stable | strong nerf | 45.9 | 4.1 | 50.0 | ✗ |
| E6A2 | KAYO | mild nerf | strong nerf | 40.9 | 15.1 | 44.0 | ✓ |
| E6A2 | Omen | stable | strong nerf | 30.8 | 27.9 | 41.3 | ✗ |
| E6A2 | Brimstone | mild nerf | stable | 56.0 | 7.4 | 36.5 | ✗ |
| E6A2 | Breach | stable | stable | 55.2 | 11.3 | 33.5 | ✓ |
| E6A2 | Reyna | stable | stable | 62.7 | 5.4 | 31.9 | ✓ |
| E6A2 | Fade | mild nerf | stable | 43.9 | 27.8 | 28.3 | ✗ |
| E6A2 | Gekko | mild nerf | stable | 71.2 | 0.6 | 28.2 | ✗ |
| E6A2 | Harbor | stable | stable | 73.9 | 0.5 | 25.7 | ✓ |
| E6A2 | Astra | mild buff | strong buff | 22.1 | 56.7 | 21.3 | ✓ |
| E6A2 | Phoenix | stable | stable | 64.5 | 14.5 | 20.9 | ✓ |
| E6A2 | Jett | mild buff | strong buff | 25.1 | 55.4 | 19.5 | ✓ |
| E6A2 | Viper | mild nerf | strong buff | 32.4 | 49.1 | 18.5 | ✗ |
| E6A2 | Sage | stable | stable | 64.5 | 18.7 | 16.8 | ✓ |
| E6A2 | Skye | mild buff | stable | 61.8 | 21.4 | 16.8 | ✗ |
| E6A2 | Cypher | mild buff | stable | 57.9 | 26.0 | 16.1 | ✗ |
| E6A2 | Sova | stable | stable | 55.0 | 34.0 | 10.9 | ✓ |
| E6A2 | Yoru | mild buff | strong buff | 16.6 | 77.2 | 6.1 | ✓ |
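The Hit column in the E6A2 table appears to compare direction only (stable, buff, or nerf), ignoring the mild/strong qualifier, with the predicted direction being the class with the highest probability. Both rules are inferred from the rows rather than stated in the report; the sketch below checks that reading against all 20 rows.

```python
# (agent, actual, p_stable, p_buff, p_nerf, hit_in_table), copied from the E6A2 table.
ROWS = [
    ("Killjoy", "mild nerf", 30.9, 2.9, 66.2, True),
    ("Neon", "stable", 24.9, 9.3, 65.8, False),
    ("Raze", "stable", 45.9, 4.1, 50.0, False),
    ("KAYO", "mild nerf", 40.9, 15.1, 44.0, True),
    ("Omen", "stable", 30.8, 27.9, 41.3, False),
    ("Brimstone", "mild nerf", 56.0, 7.4, 36.5, False),
    ("Breach", "stable", 55.2, 11.3, 33.5, True),
    ("Reyna", "stable", 62.7, 5.4, 31.9, True),
    ("Fade", "mild nerf", 43.9, 27.8, 28.3, False),
    ("Gekko", "mild nerf", 71.2, 0.6, 28.2, False),
    ("Harbor", "stable", 73.9, 0.5, 25.7, True),
    ("Astra", "mild buff", 22.1, 56.7, 21.3, True),
    ("Phoenix", "stable", 64.5, 14.5, 20.9, True),
    ("Jett", "mild buff", 25.1, 55.4, 19.5, True),
    ("Viper", "mild nerf", 32.4, 49.1, 18.5, False),
    ("Sage", "stable", 64.5, 18.7, 16.8, True),
    ("Skye", "mild buff", 61.8, 21.4, 16.8, False),
    ("Cypher", "mild buff", 57.9, 26.0, 16.1, False),
    ("Sova", "stable", 55.0, 34.0, 10.9, True),
    ("Yoru", "mild buff", 16.6, 77.2, 6.1, True),
]

def direction(label):
    """Drop the mild/strong qualifier: 'mild nerf' -> 'nerf', 'stable' -> 'stable'."""
    return label.split()[-1]

def predicted_direction(p_stable, p_buff, p_nerf):
    """Assumed rule: pick the class with the highest probability."""
    probs = {"stable": p_stable, "buff": p_buff, "nerf": p_nerf}
    return max(probs, key=probs.get)

computed = [direction(actual) == predicted_direction(ps, pb, pn)
            for _, actual, ps, pb, pn, _ in ROWS]
matches_table = computed == [hit for *_, hit in ROWS]
hit_count = sum(computed)
print(f"hits: {hit_count}/{len(ROWS)}, matches table: {matches_table}")
```

Under these two assumptions the computed hits agree with the table's Hit column on every row, giving 11 hits out of 20 for this act.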
For each act T, the model trains only on act_idx < T and predicts act_idx == T. Future information never leaks into past evaluation.