Lahore, Pakistan · 31.55°N 74.34°E · Data Scientist · AI Engineer · est. 2021

Taimour Abdul Karim · Data Scientist · AI Engineer

I build AI that ships.

LLM evaluation, agentic systems, and production GenAI for teams from London to California. I measure models before I trust them: red-teaming, hallucination benchmarks, judge calibration, regression gates. Currently building Bryge.io at Datality while pursuing an MSc in AI at LUMS.


Fig. 1 · Taimour Abdul Karim, photographed in Lahore, Pakistan

3+ years shipping AI · 2,500+ GitHub stars earned · 94% of Bryge's production codebase · +20% accuracy lift on Llama 3.1

Shipped for Datality / Bryge.io · Qult Technologies · BornGreat · LUMS

Who I am

I'm a data scientist and AI engineer. Three years in, I own 94% of the codebase behind Bryge.io, a multi-tenant industrial analytics platform where LLM agents answer plain-English questions over live sensor data. My specialty is the part most teams skip: evaluation. I red-team models, measure hallucination by topic, calibrate LLM judges against blind human labels, and put regression gates in CI so prompt changes can't silently break production. MSc in AI at LUMS; 2,500+ GitHub stars.

Now: Building Bryge.io at Datality (London, remote) · MSc AI at LUMS
Focus: LLM evaluation & safety, agentic systems, RAG, production GenAI
Base: Lahore, Pakistan · working across UK / US time zones

What I represent

Production over demos

If it can't survive real traffic, it isn't done. Everything I ship is dockerized, monitored, and built to be handed over.

Grounded over plausible

LLMs earn trust by citing sources. My RAG systems constrain every answer to retrieved context. No confident hallucinations.

Measured over claimed

“+20% accuracy” means a benchmark, not a feeling. Improvements get numbers, baselines, and reproducible runs.

Research is easy. Production is the test, and these systems passed it.

What I’ve built

15 case studies · 2023 to 2026

01 · Flagship · 2026 · LUMS graduate research · Productionized

Clinical LLM Bias Audit

The Geographic Disparity Index

A clinical assistant that tells a Boston patient to come in for a visit but tells an identical Lagos patient to manage it at home is an equity failure with real stakes. Standard accuracy benchmarks cannot see geography-driven or name-driven disparity, so it goes unmeasured.


System notes

Wilcoxon, BCa bootstrap, and Cohen's h implemented from scratch in stdlib, no scipy, fully unit-tested
Deterministic perturbation plus an idempotency cache keyed on a SHA-256 of model, prompt, seed, and temperature makes reruns byte-identical and free
Pre-registered Bonferroni-corrected alpha of 0.005 across 3 regions and 3 care axes
Ships name-only, geo-only, and combined ablations plus gender-by-geography intersectional panels

AWS Bedrock · OpenAI · Groq · FastAPI · Streamlit · Wilcoxon / BCa bootstrap
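
The system notes above mention an idempotency cache keyed on a SHA-256 of model, prompt, seed, and temperature, and a stdlib-only Cohen's h. Here is a minimal sketch of both ideas, not the repository's actual code. The names `cache_key`, `cached_call`, `cohens_h`, and the `call_model` callback are hypothetical, and the canonical-JSON serialization is an assumed way to make the hash stable.

```python
import hashlib
import json
import math
from typing import Callable, Dict


def cache_key(model: str, prompt: str, seed: int, temperature: float) -> str:
    """Hypothetical helper: SHA-256 over the four fields named in the notes."""
    # Canonical JSON (sorted keys, fixed separators) so identical inputs
    # always serialize to identical bytes before hashing.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "seed": seed, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache: Dict[str, str] = {}  # in-memory stand-in for a persistent store


def cached_call(
    model: str,
    prompt: str,
    seed: int,
    temperature: float,
    call_model: Callable[[str], str],  # assumed provider wrapper
) -> str:
    """Return the stored response if this exact request was seen before."""
    key = cache_key(model, prompt, seed, temperature)
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only hit the provider on a miss
    return _cache[key]


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2
```

With a cache like this, a rerun with the same four inputs never reaches the provider, which is what makes it free. A changed seed or temperature produces a different key and a fresh call.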

02 · 2026

SEC RAG Analyst · Hybrid Retrieval over 10-K Filings

10-K filings are the hard case for RAG: 100-plus pages, dense tables, repetitive boilerplate, and cross-references, where naive embed-everything plus top-k cosine retrieves plausible but wrong passages. Answers without provenance or measurement cannot be trusted in financial analysis.

BM25 · FAISS · BGE · Cross-encoder · Claude · FastAPI
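
The repository description for this project names RRF fusion as the step that merges the BM25 and dense rankings. Below is a minimal sketch of reciprocal rank fusion in general, not the project's implementation. The function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents absent from a list add nothing for it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ids: one list from a lexical retriever, one from a dense one.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never have to be put on a common scale. The fused list can then go to a cross-encoder for reranking.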

03 · 2026

Credit Default MLOps · Notebook to Operated System

The gap most ML portfolios never cross is notebook to operated system: models die in notebooks with no tracking, no gating, no monitoring, and no observability. A model needs an operations layer to actually live in production.

DVC · MLflow · XGBoost · Evidently / PSI · Prometheus · Grafana
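
The stack line lists PSI (population stability index) for drift monitoring. Here is a minimal stdlib sketch of the standard PSI formula over pre-binned counts, not how Evidently or this project computes it. The function name `psi`, the `eps` floor for empty bins, and the example counts are assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two binned distributions.

    `expected` and `actual` are counts per bin, using the same bin edges.
    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bin proportions.
    """
    if len(expected) != len(actual):
        raise ValueError("both inputs must use the same bins")
    e_total = sum(expected)
    a_total = sum(actual)
    if e_total <= 0 or a_total <= 0:
        raise ValueError("each input needs at least one observation")

    total = 0.0
    for e, a in zip(expected, actual):
        # Floor the proportions so an empty bin does not produce log(0).
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


# Hypothetical bin counts: training-time feature vs. the same feature in serving.
train_bins = [100, 300, 400, 200]
serve_bins = [150, 250, 350, 250]
drift = psi(train_bins, serve_bins)
```

A value of zero means the two binned distributions match, and larger values mean more shift. The alert threshold is a per-feature policy choice.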

+ 12 more

Don't take my word for it. The code is public, and it has earned 2,500+ GitHub stars.

Proof, in public

github.com/tkarim45 ↗

Most-starred repository · Jupyter Notebook

Beginner-Data-Science-Projects

A curated collection of hands-on data science projects. The most starred repo in my portfolio.

stars · 506 forks

clinical-llm-bias-audit · Python · new
Reproducible fairness-audit framework for clinical LLMs: the Geographic Disparity Index, with from-scratch statistics.

sec-rag-analyst · Python · new
Production-style RAG over SEC 10-K filings: hybrid BM25 plus dense retrieval, RRF fusion, cross-encoder rerank, cited answers.

credit-default-mlops · Python · new
End-to-end MLOps: DVC, MLflow, a CI quality gate that blocks bad models, drift detection, and Prometheus-instrumented serving.

everytongue · Python · new
Train a neural translator for any language from a spreadsheet. Low-resource NLLB-200 recipe, pip-installable.

site2bot · Python · new
Turn any website into a fully offline local chatbot. No API keys, no cloud, about 600 lines of Python. Published on PyPI.

441 followers · 30 public repositories · contributing since Jul 2021

Where I’ve shipped

4 roles · 2021–present

Jan 2024 – Present

London (Remote)

Datality · Bryge.io

AI/ML Engineer

Architected and built Bryge.io, a multi-tenant industrial IoT analytics platform: AI infers unknown sensor schemas at runtime and answers plain-English questions over live MQTT streams. 944 of the project's 1,001 commits are mine.
Engineered the streaming LLM agent on AWS Bedrock (Claude): 15 schema-agnostic tools, adaptive 5-to-25-step reasoning budgets, SSE streaming, serving 285 API endpoints.
Shipped the energy intelligence pipeline: real-time cost, carbon, and loss computation on a TimescaleDB gold layer, with tariff support and 13 background workers.
Migrated the platform off AWS ECS/EC2 onto Railway + Vercel with schema-per-tenant isolation, read-only DB roles, and a SQL sandbox for defense in depth.
FastAPI, React, TimescaleDB, AWS Bedrock, pgvector, MQTT, Railway, Vercel.

Aug 2023 – Dec 2024

Lahore

Qult Technologies

AI/ML Engineer

Fine-tuned Llama 3.1 8B for text classification: +20% accuracy over the base model on production NLU tasks.
Took models from notebook to production with zero manual steps: end-to-end pipelines on AWS SageMaker and Lambda.
APIs, NLP, LLMs, Docker, AWS, TensorFlow, GitHub.

Sep 2022 – Feb 2023

California (Remote)

BornGreat

Data Scientist

Built NLP + LLM sentiment analysis over social media streams, turning raw posts into supply, demand, and competitor signals the team acted on.
Shipped a FastAPI service for real-time model inference, wired to GitHub Actions so every merge deploys itself.
Selenium, APIs, NLP, LLMs, FastAPI, Docker, TensorFlow.

Jun 2021 – Aug 2021

Lahore

Spyresync

Python Developer (Intern)

Built interactive Django dashboards over datasets I cleaned and feature-engineered myself.
Integrated scikit-learn models into web apps to ship predictive features to real users.
Python, APIs, GitHub, Django, Scikit-learn.

Stack & credentials

Working stack

Data Science / ML: Python · SQL · Pandas · Scikit-Learn · PyTorch · TensorFlow · SciPy · Causal inference · Statistical analysis
GenAI / LLM Engineering: AWS Bedrock (Claude) · OpenAI APIs · LangChain · LangGraph · RAG architectures · Agent tool design · Fine-tuning (Llama, NLLB) · pgvector / FAISS / ChromaDB
LLM Evaluation / Safety: DeepEval · RAGAS · LLM-as-judge calibration · Red-teaming (OWASP LLM Top 10) · Hallucination benchmarks · Fairness auditing · CI regression gates
Infra / MLOps: FastAPI · Docker · AWS · DVC · MLflow · Prometheus · Grafana · TimescaleDB · GitHub Actions · Railway · Vercel

Education

MSc in Artificial Intelligence
Lahore University of Management Sciences (LUMS)
Sep 2024 – Present
BSc in Data Science
National University of Computing and Emerging Sciences
Sep 2020 – Jun 2024

Recognition

2nd Position, Genesis Hackathon Dubai
Exceptional Deputy Head, SOFTEC
Data Manipulation, DataCamp
Joining Data with Pandas, DataCamp

Open to AI/ML engineering roles, remote or hybrid

Let’s build something that ships.

taimour.a.karim@gmail.com

or +92 326 1127700 · usually replies within a day