2026 · Causal inference · Decision API

Uplift Targeting Engine: Persuadables, Not Conversions

A causal uplift-modeling engine that predicts the incremental effect of an intervention for each user, so it targets the persuadables rather than the users who would convert anyway. Meta-learners are trained on experiment data, evaluated with Qini and policy value, and shipped as a treat/don't-treat decision API.

System architecture

Build spec

Learners: S / T / X / R, from scratch on XGBoost
Data: Hillstrom 64k · Criteo 25.3M rows
Eval: Qini · uplift@decile · policy value
Cross-check: econml NonParamDML, 0.96 Spearman
Result: Policy 0.127 vs random 0.121 at 30% budget

Problem

Most ML predicts outcomes (will this user convert?), which over-targets the sure things and the lost causes and wastes budget. The question that actually pays is causal: if I spend one unit of budget, on whom does it change the outcome? An outcome classifier alone cannot answer that.

Approach

S, T, X, and R meta-learners are implemented from scratch over XGBoost base learners, each estimating per-user uplift (CATE). A data layer loads real experiment data (Hillstrom, Criteo) or simulates an RCT with known ground-truth effect, and evaluation grades models with Qini curves, uplift-at-decile, and policy value versus random and treat-all. A cross-check validates the hand-rolled R-learner against econml's NonParamDML on the same split. A budget-aware FastAPI /score endpoint maps a budget fraction to a threshold drawn from a held-out uplift distribution, with a Streamlit UI for single-user scoring and budget sweeps.
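
The Qini curve used for grading can be sketched in a few lines. This is a minimal illustration and not the project's evaluator: the function name `qini_curve` and the plain-list inputs are assumptions, and it uses the common definition in which control conversions are rescaled to the treated group size before subtracting.

```python
def qini_curve(uplift_scores, treated, outcomes):
    """Cumulative Qini value after targeting the top-k users, for k = 1..n.

    Hypothetical helper: users are ranked by predicted uplift (highest first),
    and at each k the curve is treated conversions minus control conversions
    rescaled to the number of treated users seen so far.
    """
    order = sorted(range(len(uplift_scores)),
                   key=lambda i: uplift_scores[i], reverse=True)
    n_t = n_c = 0      # treated / control users seen so far
    y_t = y_c = 0.0    # treated / control conversions seen so far
    curve = []
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # Until a control user appears there is nothing to subtract.
        scaled_control = y_c * n_t / n_c if n_c else 0.0
        curve.append(y_t - scaled_control)
    return curve


# Toy data: the model ranks a converting treated user first
# and a converting control user last.
curve = qini_curve(
    uplift_scores=[0.9, 0.8, 0.2, 0.1],
    treated=[1, 0, 1, 0],
    outcomes=[1, 0, 0, 1],
)
```

A model with a good ranking pushes the curve up early, because treated converters cluster at the top of the list while control converters sit lower down.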

Impact

On the 64k-row Hillstrom email experiment, the uplift policy beats random targeting at a fixed 30% budget (0.127 vs 0.121), and the from-scratch R-learner matches econml at 0.96 rank correlation. On a simulated RCT with known truth, all learners recover the ranking (Spearman up to 0.951) at near-zero ATE error, which validates the evaluator before it is trusted on real data.
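
One way to score a targeting policy on randomized data is inverse-propensity weighting. The sketch below is an assumption, since the page does not say which policy-value estimator the project uses. The names `top_fraction_policy` and `policy_value_ipw` are hypothetical, and `p_treat` is the known probability of treatment in the experiment.

```python
def top_fraction_policy(uplift_scores, budget):
    """Treat the `budget` fraction of users with the highest predicted uplift."""
    n_treat = int(round(budget * len(uplift_scores)))
    ranked = sorted(range(len(uplift_scores)),
                    key=lambda i: uplift_scores[i], reverse=True)
    chosen = set(ranked[:n_treat])
    return [1 if i in chosen else 0 for i in range(len(uplift_scores))]


def policy_value_ipw(actions, treated, outcomes, p_treat):
    """Estimate the mean outcome if the policy's actions had been followed.

    Only users whose randomized assignment matches the policy's action
    contribute, reweighted by the probability of that assignment.
    """
    total = 0.0
    for action, t, y in zip(actions, treated, outcomes):
        if action == t:
            total += y / (p_treat if t == 1 else 1.0 - p_treat)
    return total / len(actions)


# Toy data with a 50/50 randomized assignment (assumed).
actions = top_fraction_policy([0.9, 0.1, 0.8, 0.2], budget=0.5)
value = policy_value_ipw(actions, treated=[1, 0, 0, 1],
                         outcomes=[1, 1, 0, 1], p_treat=0.5)
```

Running the same estimator on a random policy and on treat-all at the same budget gives the baselines that the uplift policy is compared against.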

Decisions & tradeoffs

Implement the learners from scratch

The meta-learner mechanics, which are what interviewers probe, are hand-rolled over XGBoost rather than imported wholesale. econml and causalml stay as optional cross-check dependencies, not the core.

Cross-check against econml

A per-user causal effect can never be observed, so per-user estimates cannot be graded directly. Instead, the scratch R-learner and econml's NonParamDML are trained on the same split; their agreement at 0.96 rank correlation is the evidence that the estimator is correct.
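
The agreement metric here is Spearman rank correlation between two sets of per-user uplift estimates. A dependency-free sketch follows; the function names are hypothetical, and in practice `scipy.stats.spearmanr` computes the same quantity.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(a, b):
    """Pearson correlation of the ranks of a and b.

    Assumes neither input is constant; a constant input has zero rank
    variance and would cause a division by zero.
    """
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


# Two estimators that disagree on the order of one adjacent pair of users.
rho = spearman([0.01, 0.02, 0.03, 0.04], [0.10, 0.30, 0.20, 0.40])
```

Rank correlation is a reasonable fit for this check because a targeting policy only depends on the ordering of users, not on the absolute scale of the estimated effects.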

Budget-derived decision threshold

Rather than a fixed cutoff, the API derives a threshold from a stashed held-out uplift distribution, so the treat/don't-treat decision reflects how many users the budget can afford.
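
The mapping from budget fraction to threshold can be sketched as a quantile lookup over the held-out scores. This is an illustrative assumption about the mechanics, with hypothetical names `budget_threshold` and `decide`; the FastAPI wiring and the model bundle format are omitted.

```python
def budget_threshold(holdout_uplift, budget):
    """Lowest held-out score that still falls inside the top `budget` fraction."""
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    ranked = sorted(holdout_uplift, reverse=True)
    # Number of held-out users the budget covers, rounded, at least one.
    n_afford = max(1, int(round(budget * len(ranked))))
    return ranked[n_afford - 1]


def decide(uplift, threshold):
    """Treat a user when their predicted uplift clears the budget threshold."""
    return uplift >= threshold


# Held-out uplift scores (toy values) and a 30% budget.
holdout = [0.05, 0.01, 0.10, -0.02, 0.03, 0.08, 0.00, 0.02, 0.04, 0.06]
threshold = budget_threshold(holdout, budget=0.3)
```

Because the threshold is read off a held-out distribution rather than the incoming request stream, a single user can be scored in isolation and still receive a decision that is consistent with the overall budget, as long as live traffic resembles the held-out set.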

System notes

S, T, X, and R meta-learners hand-rolled over XGBoost bases, the mechanics interviewers probe
The from-scratch R-learner is cross-checked against Microsoft econml at 0.96 Spearman, evidence of correctness, not just plausibility
The API derives its treat threshold from a held-out uplift distribution stashed in the model bundle, so the cutoff answers how many you can afford to treat
The evaluator is validated on a simulated RCT with known effect before it is trusted on hidden-truth real data
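
The simulated-RCT check in the last note can be illustrated with a small generator whose per-user effect is known by construction. The data-generating process below (uniform feature, 10% base rate, effect of 0.2 times the feature, 50/50 assignment) is an assumption chosen for illustration, not the project's simulator.

```python
import random


def simulate_rct(n, seed=0):
    """Rows of (feature, treated, outcome, true_effect) from a randomized trial."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        t = 1 if rng.random() < 0.5 else 0   # randomized assignment
        tau = 0.2 * x                        # known per-user effect
        p_convert = 0.1 + t * tau            # base rate plus effect if treated
        y = 1 if rng.random() < p_convert else 0
        rows.append((x, t, y, tau))
    return rows


def difference_in_means(rows):
    """ATE estimate: mean treated outcome minus mean control outcome."""
    y1 = [row[2] for row in rows if row[1] == 1]
    y0 = [row[2] for row in rows if row[1] == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


rows = simulate_rct(20000)
true_ate = sum(row[3] for row in rows) / len(rows)
ate_error = abs(difference_in_means(rows) - true_ate)
```

Because `tau` is stored on every row, the same simulated data can also grade a learner's ranking directly, for example by rank-correlating its predicted uplift against the true effect.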

Stack

Meta-learners · XGBoost · econml · Qini curves · FastAPI · Streamlit

