2026 · End-to-end MLOps · CI quality gate

Credit Default MLOps: Notebook to Operated System

An end-to-end MLOps pipeline around a credit-default classifier: versioned data, a tracked and registered model, an automated CI quality gate that blocks bad models, drift detection, and an observable serving endpoint.

System architecture

DVC: train (XGBoost + MLflow) → Gate (exit≠0 fails CI) → Drift (PSI + Evidently) → FastAPI (/predict, /metrics) → Prometheus (scrape) → Grafana (dashboard)

Build spec

Gate
ROC-AUC ≥ 0.70 · PR-AUC ≥ 0.45 · Brier ≤ 0.20
Model
XGBoost, registered in MLflow
Versioning
DVC pipeline, reproducible with dvc repro
Drift
Per-feature PSI (numpy) + Evidently
Serving
FastAPI, Prometheus + Grafana

Problem

The gap most ML portfolios never cross is notebook to operated system: models die in notebooks with no tracking, no gating, no monitoring, and no observability. A model needs an operations layer to actually live in production.

Approach

A DVC pipeline versions the data and reruns only the stages whose inputs changed: prepare, train, evaluate, then drift. Training fits an XGBoost pipeline and logs params, metrics, and the model to MLflow, registering it as credit-default-classifier. The evaluate stage enforces a quality gate (ROC-AUC ≥ 0.70, PR-AUC ≥ 0.45, Brier ≤ 0.20) and exits non-zero when a model misses any threshold, which fails CI. A drift stage computes per-feature PSI deterministically in numpy and also produces an Evidently report. A FastAPI service exposes /predict, /metrics, /drift, and /health, and is instrumented for Prometheus with a Grafana dashboard on top. The full stack runs via docker compose.

Impact

It demonstrates the complete operations layer: experiment tracking, a model registry, a CI gate that blocks regressions before they ship, drift detection with alerting, and a serving endpoint instrumented with metrics. The whole thing is reproducible via dvc repro and self-contained on synthetic data, so it runs offline with no dataset download or API keys.

Decisions & tradeoffs

The quality gate exits non-zero

Putting the metric thresholds in the evaluate stage as a hard non-zero exit makes both the DVC stage and CI fail on a bad model, so a regression is stopped before it can ship.
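A minimal sketch of such a gate, assuming the evaluate stage already has the three metrics in a dict. The function names, the metric keys, and the comparison directions (ROC-AUC and PR-AUC as minimums, Brier as a maximum) are illustrative assumptions, not the repository's actual code.

```python
import sys

# Threshold values come from the build spec; the min/max directions are assumed.
THRESHOLDS = {
    "roc_auc": ("min", 0.70),
    "pr_auc": ("min", 0.45),
    "brier": ("max", 0.20),
}


def gate_failures(metrics):
    """Return one message per threshold that the model misses."""
    failures = []
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics[name]
        passed = value >= bound if kind == "min" else value <= bound
        if not passed:
            op = ">=" if kind == "min" else "<="
            failures.append(f"{name}={value:.3f} violates {op} {bound:.2f}")
    return failures


def run_gate(metrics):
    """Print failures to stderr and return a process exit code (0 = pass)."""
    failures = gate_failures(metrics)
    for message in failures:
        print("GATE FAIL:", message, file=sys.stderr)
    # A non-zero code is what makes the DVC stage and the CI job fail.
    return 1 if failures else 0
```

An evaluate script would end with something like `sys.exit(run_gate(metrics))`, so the process exit code carries the verdict to both DVC and CI.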

PSI in numpy as source of truth, Evidently best-effort

Hand-rolled deterministic PSI feeds the reports and the Prometheus gauges, while Evidently provides only the rich visual report. This insulates the pipeline: a breaking change in Evidently's fast-moving API cannot fail a run.
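One way a deterministic numpy PSI could look is sketched below. The quantile binning on the reference sample, the bin count, and the epsilon clipping are assumptions chosen for illustration; the project's own implementation may differ. For each bin, PSI sums (current share minus reference share) times the log of their ratio.

```python
import numpy as np


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference` for one feature."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Edges come from reference quantiles, so each bin holds roughly equal reference mass.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    # Use interior edges only, so out-of-range values fall into the two end bins.
    interior = edges[1:-1]
    n_bins = len(interior) + 1
    ref_counts = np.bincount(
        np.searchsorted(interior, reference, side="right"), minlength=n_bins
    )
    cur_counts = np.bincount(
        np.searchsorted(interior, current, side="right"), minlength=n_bins
    )
    # Clip shares away from zero so the log ratio stays finite.
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def psi_per_feature(reference, current, bins=10):
    """Compute PSI for every feature; inputs map feature name to a 1-D array."""
    return {name: psi(reference[name], current[name], bins) for name in reference}
```

Because the function uses no random state and no external report library, the same inputs always give the same PSI, which is what makes it suitable as the value written to a report file or a gauge.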

Synthetic, self-contained data

A generated dataset with an injected macro shock removes any download or key, keeping the repo reproducible offline. It also gives the drift monitor genuine drift to detect.
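As a rough illustration of the idea, the sketch below generates a seeded dataset and optionally applies a shock to the feature distributions. The feature names (`income`, `utilization`), the distributions, the shock sizes, and the logistic coefficients are all hypothetical; the source does not specify how its generator works.

```python
import numpy as np


def make_credit_data(n, seed, shock=False):
    """Generate a synthetic credit dataset; `shock=True` shifts the feature distributions."""
    rng = np.random.default_rng(seed)
    income = rng.lognormal(mean=10.5, sigma=0.5, size=n)
    utilization = rng.beta(2.0, 5.0, size=n)
    if shock:
        # Hypothetical macro shock: incomes fall and credit utilization rises.
        income = income * 0.8
        utilization = np.clip(utilization + 0.15, 0.0, 1.0)
    # Default probability from a hand-picked logistic model (illustrative coefficients).
    logit = -2.0 + 3.0 * utilization - 0.5 * (np.log(income) - 10.5)
    p_default = 1.0 / (1.0 + np.exp(-logit))
    default = (rng.random(n) < p_default).astype(int)
    return {"income": income, "utilization": utilization, "default": default}
```

Calling it once with `shock=False` for the reference window and once with `shock=True` for the current window gives a drift monitor two samples whose feature distributions genuinely differ, while the fixed seed keeps every run identical.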

System notes

  • An automated CI quality gate exits non-zero and blocks any model that falls below the ROC-AUC or PR-AUC floor or exceeds the Brier ceiling
  • Per-feature PSI computed deterministically in numpy feeds both drift.json and Prometheus gauges; the Evidently report is best-effort so it can never break the pipeline
  • A synthetic generator injects a deliberate macro shock so the drift monitor has real drift to catch
  • docker compose brings up the API, Prometheus, and Grafana as one observable stack

Stack

DVC · MLflow · XGBoost · Evidently / PSI · Prometheus · Grafana

View source on GitHub