Decision Model Review — GW33

Retrospective audit of Layers 1–3 · Top-50 managers · 3-GW replacement window

Findings & Recommendations

Synthesis from Layers 1–3 below. Supported findings only — inconclusive signals deliberately omitted.

Layer 1 — Chip deployment

Herd TC timing was anti-information. GW26 drew 29 of 84 TCs, but those plays averaged 8.41 pts regret vs 4.04 pts off-modal. Across all TCs, only 41.7% landed on the optimal GW.
Herd BB timing was also anti-information. GW33 drew 34 of 92 BBs with 17.56 pts mean regret vs 5.59 pts off-modal, roughly 3× worse. Modal-GW BB on the obvious DGW underdelivered relative to quieter weeks.
Free Hit usage is broadly value-additive. 80.0% of FH plays delivered positive delta vs the pre-FH squad (mean +15.0 pts). No evidence of systematic mistiming.

Recommendation: Do not auto-follow top-50 modal GW for TC or BB — the herd signal was anti-information this season. Continue eager Free Hit deployment; don't hold for a "perfect" DGW.
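
The modal versus off-modal comparison behind these findings can be sketched as follows. This is a minimal illustration, assuming each chip event is available as a (deployed GW, regret pts) pair; the function name `modal_vs_off_modal` and the toy events are hypothetical and are not the audit's code or data.

```python
from collections import Counter
from statistics import mean


def modal_vs_off_modal(events):
    """Split one chip's events into modal-GW and off-modal plays.

    events: list of (deployed_gw, regret_pts) tuples (hypothetical shape).
    """
    counts = Counter(gw for gw, _ in events)
    modal_gw, modal_n = counts.most_common(1)[0]
    on_modal = [regret for gw, regret in events if gw == modal_gw]
    off_modal = [regret for gw, regret in events if gw != modal_gw]
    return {
        "modal_gw": modal_gw,
        "modal_n": modal_n,
        "mean_regret_modal": mean(on_modal),
        # None when every event landed on the modal GW
        "mean_regret_off_modal": mean(off_modal) if off_modal else None,
    }


# Toy events for illustration only, not the audit data
toy_events = [(26, 10.0), (26, 8.0), (26, 6.0), (24, 4.0), (29, 2.0)]
result = modal_vs_off_modal(toy_events)
print(result)
```

A modal mean above the off-modal mean, as reported for TC (8.41 vs 4.04) and BB (17.56 vs 5.59), is the pattern the report labels anti-information.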

Layer 2 — Triage calibration

compute_verdict discriminates strongly. HOLD (n=483) averaged 4.22 pts/GW vs SELL (n=21,014) at 0.81 pts, a 3.41-pt gap, with MONITOR (n=3,079) in between at 3.01 pts. The verdict ordering holds at current thresholds.
Suggested threshold shift is overfit — keep current values. Grid-sweep winner (HOLD≥0.7) lifts the gap by ~0.4 pts but shrinks the HOLD bucket from 483 to ~84 rows (0.3% of population). Marginal lift is inside sampling noise. Do not change verdict_weights.json.

Recommendation: Keep verdict_weights.json at HOLD≥0.6 / MONITOR≥0.45. Re-evaluate after a full season of coverage; current grid-sweep lift is overfit to a tiny tail.
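
A back-of-envelope check makes the "tiny tail" concern concrete. The sketch below uses only counts quoted in this report and one labelled assumption: that per-row variance of next-GW points is similar in both HOLD buckets, so the standard error of a bucket mean scales as 1/√n. It is not a significance test, since the report does not give per-row variance.

```python
import math

hold_n_current = 483     # HOLD rows at HOLD>=0.6 (from the report)
hold_n_suggested = 84    # approximate HOLD rows at HOLD>=0.7 (from the report)
population = 483 + 3_079 + 21_014  # HOLD + MONITOR + SELL rows at current thresholds

share_current = hold_n_current / population
share_suggested = hold_n_suggested / population

# Assumption: equal per-row variance, so SE of the mean is proportional to 1/sqrt(n)
se_inflation = math.sqrt(hold_n_current / hold_n_suggested)

print(f"HOLD share now: {share_current:.1%}, at HOLD>=0.7: {share_suggested:.1%}")
print(f"HOLD-mean standard error inflates by about {se_inflation:.2f}x")
```

Under that assumption the HOLD mean at the stricter threshold is roughly 2.4× noisier, which is why a lift of about 0.4 pts is hard to separate from noise.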

Layer 3 — Replacement quality

Premium IN-players dominate. ≥£10m tier wins 79.1% (mean +13.1 pts/3GW) vs <£6m at 54.6% (+3.5 pts) — 24.5pp gap. Bias trade targets toward premium when funds permit.
MID > FWD as trade targets. MID (n=1,379) wins 66.3% (+6.9 pts) vs FWD at 56.2% (+5.5 pts) — 10pp gap on substantive samples.
62.2% overall win rate on n=3,050 trades. Trades are net positive but ~38% are wash-or-loss; verdict-driven transfers are not free yield.

Recommendation: When triage flags multiple SELLs, prioritise replacements that are premium MIDs. Treat budget enabler trades as marginal: use them only when needed for chip funding or a short-term punt, not as default value plays.

Caveats on these findings

Hindsight: regret figures assume perfect information. The "anti-modal" finding does not mean the herd was irrational — it means the obvious play didn't pay this season.
Audit subjects are top-50 managers, not our 3 teams. Layers 1 and 3 reflect league-leader behaviour; per-team rows for SlowBuild / Yamal1 / MbappeSalah are pending data capture (TODO.md).
n=1 season. Findings are calibration evidence, not laws. Re-run end-of-season and after 2026/27 H1 before any structural model change.

Layer 1 — Chip Deployment Audit

Top-50 managers · GW1–33 · Hindsight regret: best alternative GW within same chip half

Chip	Events	Modal GW	Median Regret	P90 Regret	Hit Optimal GW
Triple Captain	84	GW26	6.0 pts	14 pts	41.7%
Bench Boost	92	GW33	9.0 pts	21 pts	25.0%
Free Hit	60	GW13	delta vs pre-FH squad: +15.0 pts avg | 80.0% positive
Wildcard	89	GW32	descriptive only — yield analysis out of scope

Approximations: regret uses squad-actual-per-GW (the manager's real captain and bench each week). Hindsight regret assumes perfect information; see "Caveats on these findings" above.
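
The regret definition in the subtitle (best alternative GW within the same chip half) can be sketched as below. This is a minimal illustration, assuming a per-manager mapping from GW to the yield the chip would have produced that week; `chip_half`, `hindsight_regret`, and the toy yields are hypothetical names and values, and the audit's squad-actual approximation is not reproduced here.

```python
def chip_half(gw):
    """H1 covers GW1-19, H2 covers GW20 onward (as defined in this report)."""
    return "H1" if gw <= 19 else "H2"


def hindsight_regret(yield_by_gw, deployed_gw):
    """Return (best_gw, regret_pts) for one chip play.

    yield_by_gw: {gw: pts the chip would have yielded that GW} (hypothetical input);
    it must contain deployed_gw.
    """
    half = chip_half(deployed_gw)
    # Only GWs in the same chip half are valid alternatives
    window = {gw: pts for gw, pts in yield_by_gw.items() if chip_half(gw) == half}
    best_gw = max(window, key=window.get)
    return best_gw, window[best_gw] - window[deployed_gw]


# Toy yields for illustration only; GW6 is in H1 and must be ignored for an H2 play
toy_yields = {6: 30, 24: 12, 26: 8, 28: 14, 31: 20}
best_gw, regret = hindsight_regret(toy_yields, deployed_gw=26)
print(best_gw, regret)
```

A play counts toward "Hit Optimal GW" when its regret is zero, meaning the deployed GW was the best available in its half.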

Chip Timing — Yield & Regret by Chip Half

Each team receives a full chip set at GW1 and a second set at GW20, so TC/BB/FH decisions live in two independent windows. The Paretos below are split by chip half: H1 (GW1–19) covers TC1/BB1/FH1; H2 (GW20+) covers TC2/BB2/FH2. Only GWs with n≥5 get their own bar; the tail is grouped into Other.

First half — GW1–19 (TC1, BB1, FH1)

TC ceiling — H1

Managers whose TC1 ceiling landed here (n=50)

BB ceiling — H1

Managers whose BB1 ceiling landed here (n=50)

FH net Δ — H1

Sum of fh_delta by deployed GW (n=48); green = paid off, red = trap

Second half — GW20–33 (TC2, BB2, FH2)

TC ceiling — H2

Managers whose TC2 ceiling landed here (n=34)

BB ceiling — H2

Managers whose BB2 ceiling landed here (n=42)

FH net Δ — H2

Sum of fh_delta by deployed GW (n=12, small sample); green = paid off, red = trap

Regret concentration by half

TC + BB regret — H1

Stacked regret pts at each deployed GW · 100 events · 438 pts total

TC + BB regret — H2

Stacked regret pts at each deployed GW · 76 events · 949 pts total

Legend: TC regret · BB regret · Other (n<5 per GW) · Cumulative % · 80% guideline

How to read these charts

Yield Paretos (rows 1 & 2)

Top bars = highest-yield GWs in that half. Each bar shows how many of the top-50 managers had their chip ceiling at that GW. The cumulative line crossing 80% marks the "vital few" deployment windows.
Free Hit bars are signed. Green = the FH paid off on net across the sample; red = a systematic trap. GW33 (H2) is the clearest negative-EV FH week.
Concentration shape tells you timing importance. A sharp Pareto (H1 TC) says a single GW dominated; a flat one (H2 BB) says timing mattered less than squad quality in that window.
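
The Pareto construction described above (bars sorted by size, n<5 grouped into Other, cumulative line read against 80%) can be sketched as follows. This is an illustrative sketch only; the function name `pareto`, its parameters, and the toy counts are hypothetical and do not come from the report's charting code.

```python
def pareto(counts_by_gw, min_n=5, cutoff=0.80):
    """Build Pareto rows and the 'vital few' labels.

    counts_by_gw: {gw_label: count}. GWs below min_n are pooled into "Other".
    Returns (rows, vital_few) where rows are (label, count, cumulative_share).
    """
    kept = {gw: n for gw, n in counts_by_gw.items() if n >= min_n}
    other = sum(n for n in counts_by_gw.values() if n < min_n)
    ordered = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    if other:
        ordered.append(("Other", other))
    total = sum(n for _, n in ordered)
    if total == 0:
        return [], []
    rows, vital_few, running = [], [], 0
    for label, n in ordered:
        # A bar is "vital" if the cumulative share had not yet reached the cutoff
        if running / total < cutoff and label != "Other":
            vital_few.append(label)
        running += n
        rows.append((label, n, running / total))
    return rows, vital_few


# Toy ceiling counts for illustration only, not the audit data
toy_counts = {"GW6": 30, "GW9": 8, "GW12": 6, "GW3": 3, "GW15": 3}
rows, vital_few = pareto(toy_counts)
for label, n, cum in rows:
    print(f"{label:>6} {n:>3} {cum:.0%}")
print("vital few:", vital_few)
```

A sharp Pareto shows up here as a short `vital_few` list dominated by its first entry; a flat one needs many bars to reach the cutoff.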

Regret Paretos (row 3)

Tall bars = where the herd deployed and paid for it. Top bars collect the most forgone yield within the half.
Stack colour identifies the misfired chip. Amber-dominated = TC mistiming; purple-dominated = BB mistiming.
A small "Other" bar means regret is concentrated, not diffuse — reinforces the "target specific GWs" lesson for that half.

H1 vs H2 — cross-half reading

H1 TC is dominated by a single GW (GW6 accounts for the majority of H1 TC ceilings); H2 TC splits across GW31 and GW28. Interpretation: "when to play TC?" has a clearer answer in H1.
H2 FH is a sparse sample — only GW33 meets the n≥5 threshold. Read it as "the actionable finding is DGW33," not as a general H2 FH distribution.
H2 regret concentrates in 2 GWs (≈93% cumulative); H1 regret is diffuse across ~8 GWs. Different mistiming patterns apply to each half.
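
The regret chart captions give enough to compare the halves on a per-event basis. The arithmetic below uses only the event counts and regret totals quoted in those captions; the variable names are illustrative.

```python
# Figures from the "TC + BB regret" chart captions in this report
halves = {
    "H1": {"events": 100, "regret_pts": 438},
    "H2": {"events": 76, "regret_pts": 949},
}

per_event = {h: v["regret_pts"] / v["events"] for h, v in halves.items()}
total_regret = sum(v["regret_pts"] for v in halves.values())
h2_share = halves["H2"]["regret_pts"] / total_regret
ratio = per_event["H2"] / per_event["H1"]

for half, pts in per_event.items():
    print(f"{half}: {pts:.2f} regret pts per chip event")
print(f"H2 share of total regret: {h2_share:.0%}; H2/H1 per-event ratio: {ratio:.2f}")
```

H2 carries about 68% of all TC and BB regret on fewer events, so the per-event cost of mistiming in H2 is close to three times that of H1.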

Combined interpretation — cross-reference yield and regret within each half:

Yield top bar?	Regret top bar?	Interpretation
Yes	Yes	Ceiling was real, herd missed it → strongest signal to target next year
No	Yes	Mediocre GW, slight ceiling miss → low-stakes mistake
Yes	No	Herd matched the ceiling (rare) → herd behaviour worked
No	No	Uneventful GW — no signal
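
The table above is a two-flag lookup, which can be written directly as code. This is a hedged sketch: `interpret_gw` and the example sets are hypothetical, and what qualifies a GW as a "top bar" (for example, membership of the bars needed to reach the 80% line) is an assumption left to the caller.

```python
# Keys are (is_yield_top_bar, is_regret_top_bar); values paraphrase the table above
INTERPRETATION = {
    (True, True): "Ceiling was real, herd missed it: strongest signal to target next year",
    (False, True): "Mediocre GW, slight ceiling miss: low-stakes mistake",
    (True, False): "Herd matched the ceiling (rare): herd behaviour worked",
    (False, False): "Uneventful GW: no signal",
}


def interpret_gw(gw, yield_top_gws, regret_top_gws):
    """Cross-reference one GW against the yield and regret top-bar sets."""
    return INTERPRETATION[(gw in yield_top_gws, gw in regret_top_gws)]


# Hypothetical top-bar sets for illustration only
yield_top = {6, 31}
regret_top = {31, 33}
for gw in (6, 31, 33, 10):
    print(f"GW{gw}: {interpret_gw(gw, yield_top, regret_top)}")
```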

Layer 2 — Triage Model Calibration

Reconstructed verdicts via compute_verdict · avg_fdr fixed at 3.0 (neutral) · chance_next/tsb_pct from current snapshot

Current Hold Threshold

0.6

Current Monitor Threshold

0.45

Threshold Signal

Suggested: HOLD≥0.7 / MONITOR≥0.35 (current: 0.6/0.45)

Per-bucket next-GW points (current thresholds)

Verdict	Count	Mean pts next GW	Median pts next GW
HOLD	483	4.22	2.0
MONITOR	3,079	3.01	2.0
SELL	21,014	0.81	0.0
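
The per-bucket table can be reproduced in outline as follows. This sketch assumes the verdict reduces to a single score compared against the two thresholds (HOLD≥0.6, MONITOR≥0.45, as quoted in this report); `classify` is a hypothetical stand-in and not the real compute_verdict, whose signature is not shown here. The toy rows are invented for illustration.

```python
from statistics import mean, median


def classify(score, hold=0.6, monitor=0.45):
    """Hypothetical stand-in for the verdict step: map a score to a bucket."""
    if score >= hold:
        return "HOLD"
    if score >= monitor:
        return "MONITOR"
    return "SELL"


def bucket_stats(rows, hold=0.6, monitor=0.45):
    """rows: iterable of (score, next_gw_pts). Returns count/mean/median per verdict."""
    buckets = {"HOLD": [], "MONITOR": [], "SELL": []}
    for score, pts in rows:
        buckets[classify(score, hold, monitor)].append(pts)
    return {
        verdict: {
            "count": len(pts),
            "mean": mean(pts) if pts else None,
            "median": median(pts) if pts else None,
        }
        for verdict, pts in buckets.items()
    }


# Toy (score, next-GW points) rows, not the audit data
toy_rows = [(0.72, 6), (0.61, 2), (0.50, 3), (0.46, 1), (0.30, 0), (0.10, 2)]
stats = bucket_stats(toy_rows)
print(stats)
```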

Grid sweep — top 10 threshold pairs by HOLD−SELL discrimination gap

Hold thresh	Monitor thresh	HOLD mean	SELL mean	Gap ↑
0.70	0.35	4.40	0.55	3.85
0.70	0.40	4.40	0.68	3.72
0.60	0.35	4.22	0.55	3.67
0.70	0.45	4.40	0.81	3.59
0.60	0.40	4.22	0.68	3.54
0.65	0.35	4.08	0.55	3.53
0.70	0.50	4.40	0.94	3.46
0.60	0.45	4.22	0.81	3.41
0.65	0.40	4.08	0.68	3.40
0.70	0.55	4.40	1.03	3.37
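
In the table above the HOLD mean depends only on the hold threshold and the SELL mean only on the monitor threshold, which is consistent with a sweep of the following shape. This is a sketch under assumptions, not the audit's sweep code: `sweep`, the `min_hold_n` guard, and the toy rows are hypothetical. The guard illustrates one way to stop a sweep from rewarding a threshold that leaves only a handful of HOLD rows.

```python
from itertools import product
from statistics import mean


def sweep(rows, hold_grid, monitor_grid, min_hold_n=1):
    """Rank (hold, monitor) pairs by HOLD-minus-SELL mean gap.

    rows: list of (score, next_gw_pts). Pairs whose HOLD bucket has fewer
    than min_hold_n rows are skipped.
    """
    results = []
    for hold, monitor in product(hold_grid, monitor_grid):
        if monitor >= hold:
            continue  # thresholds must be ordered
        hold_pts = [pts for score, pts in rows if score >= hold]
        sell_pts = [pts for score, pts in rows if score < monitor]
        if len(hold_pts) < min_hold_n or not sell_pts:
            continue
        gap = mean(hold_pts) - mean(sell_pts)
        results.append((hold, monitor, len(hold_pts), gap))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Toy rows for illustration only: one high-scoring outlier drives the unguarded winner
toy_rows = [(0.75, 8), (0.65, 4), (0.62, 3), (0.50, 3), (0.40, 1), (0.20, 0)]
unguarded = sweep(toy_rows, [0.6, 0.7], [0.35, 0.45])
guarded = sweep(toy_rows, [0.6, 0.7], [0.35, 0.45], min_hold_n=2)
print("unguarded winner:", unguarded[0])
print("guarded winner:  ", guarded[0])
```

On the toy data the unguarded winner rests on a single HOLD row, while the guarded sweep falls back to the pair with a larger HOLD bucket. The same logic underlies keeping HOLD≥0.6 when the stricter threshold leaves about 84 rows.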

Layer 3 — Transfer Quality Audit

Top-50 managers · 3-GW point window · Skips FH/WC weeks

Total Trades

3,050

Win Rate

62.2%

Mean Δ pts (3 GW)

5.9

Median Δ pts (3 GW)

6.0
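
The headline metrics above can be sketched as follows. This rests on assumptions the report does not spell out: that Δ is the IN-player's points minus the OUT-player's points over the 3 GWs starting at the transfer GW, and that a win is a strictly positive Δ (the report counts a wash as a non-win). `trade_delta`, `summarise`, and the toy numbers are hypothetical.

```python
from statistics import mean, median


def trade_delta(in_pts_by_gw, out_pts_by_gw, transfer_gw, window=3):
    """Points gained by the IN player over the OUT player across the window.

    Assumption: the window starts at transfer_gw; missing GWs count as 0 pts.
    """
    gws = range(transfer_gw, transfer_gw + window)
    in_pts = sum(in_pts_by_gw.get(gw, 0) for gw in gws)
    out_pts = sum(out_pts_by_gw.get(gw, 0) for gw in gws)
    return in_pts - out_pts


def summarise(deltas):
    """Win rate (delta > 0), mean and median delta over a list of trades."""
    wins = sum(1 for d in deltas if d > 0)
    return {
        "trades": len(deltas),
        "win_rate": wins / len(deltas),
        "mean_delta": mean(deltas),
        "median_delta": median(deltas),
    }


# Toy data for illustration only
delta = trade_delta({30: 6, 31: 9, 32: 2}, {30: 2, 31: 1, 32: 5}, transfer_gw=30)
summary = summarise([delta, 0, -3, 5])
print(delta, summary)
```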

By IN-player position

Position	Trades	Win Rate	Mean Δ	Median Δ
GKP	125	62.4%	4.6 pts	5.0 pts
DEF	912	60.1%	5.0 pts	5.0 pts
MID	1,379	66.3%	6.9 pts	7.0 pts
FWD	634	56.2%	5.5 pts	4.0 pts

By IN-player price tier

Price Tier	Trades	Win Rate	Mean Δ	Median Δ
premium (≥£10m)	139	79.1%	13.1 pts	13.0 pts
mid (£8–10m)	509	68.4%	8.4 pts	8.0 pts
budget-mid (£6–8m)	1,247	64.7%	6.4 pts	7.0 pts
budget (<£6m)	1,155	54.6%	3.5 pts	4.0 pts
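
As a consistency check, both breakdowns can be rolled back up to the headline figures using trade-weighted averages. The sketch below uses only numbers from the two tables above; because the inputs are rounded to one decimal, the roll-ups agree with the headline 62.2% and 5.9 pts only to within rounding. The `rollup` helper is illustrative.

```python
# (label, trades, win_rate, mean_delta_pts) taken from the tables in this report
by_position = [
    ("GKP", 125, 0.624, 4.6),
    ("DEF", 912, 0.601, 5.0),
    ("MID", 1379, 0.663, 6.9),
    ("FWD", 634, 0.562, 5.5),
]
by_tier = [
    ("premium", 139, 0.791, 13.1),
    ("mid", 509, 0.684, 8.4),
    ("budget-mid", 1247, 0.647, 6.4),
    ("budget", 1155, 0.546, 3.5),
]


def rollup(rows):
    """Trade-weighted win rate and mean delta across the rows of a breakdown."""
    n = sum(trades for _, trades, _, _ in rows)
    win_rate = sum(trades * win for _, trades, win, _ in rows) / n
    mean_delta = sum(trades * delta for _, trades, _, delta in rows) / n
    return n, win_rate, mean_delta


pos_n, pos_win, pos_delta = rollup(by_position)
tier_n, tier_win, tier_delta = rollup(by_tier)
print(f"by position: n={pos_n}, win={pos_win:.3f}, mean delta={pos_delta:.2f}")
print(f"by tier:     n={tier_n}, win={tier_win:.3f}, mean delta={tier_delta:.2f}")
```

Both breakdowns sum to the 3,050 total trades, which suggests every trade is assigned to exactly one position and one price tier.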