Lahore, Pakistan · 31.55°N 74.34°E

Taimour Abdul Karim · Data Scientist · AI Engineer

I build AI that ships.

LLM evaluation, agentic systems, and production GenAI for teams from London to California. I measure models before I trust them: red-teaming, hallucination benchmarks, judge calibration, regression gates. Currently building Bryge.io at Datality while pursuing an MSc in AI at LUMS.

Fig. 1 · Taimour Abdul Karim, photographed in Lahore, Pakistan
3+ years shipping AI · 2,500+ GitHub stars earned · 94% of Bryge's production codebase · +20% accuracy lift on Llama 3.1

Shipped for Datality / Bryge.io · Qult Technologies · BornGreat · LUMS

Who I am

I'm a data scientist and AI engineer. Three years in, I own 94% of the codebase behind Bryge.io, a multi-tenant industrial analytics platform where LLM agents answer plain-English questions over live sensor data. My specialty is the part most teams skip: evaluation. I red-team models, measure hallucination by topic, calibrate LLM judges against blind human labels, and put regression gates in CI so prompt changes can't silently break production. MSc in AI at LUMS; 2,500+ GitHub stars.

Now
Building Bryge.io at Datality (London, remote) · MSc AI at LUMS
Focus
LLM evaluation & safety, agentic systems, RAG, production GenAI
Base
Lahore, Pakistan · working across UK / US time zones

What I represent

Production over demos

If it can't survive real traffic, it isn't done. Everything I ship is dockerized, monitored, and built to be handed over.

Grounded over plausible

LLMs earn trust by citing sources. My RAG systems constrain every answer to retrieved context. No confident hallucinations.

Measured over claimed

“+20% accuracy” means a benchmark, not a feeling. Improvements get numbers, baselines, and reproducible runs.

Research is easy. Production is the test, and these systems passed it.

What I’ve built

15 case studies · 2023 to 2026

01 · Flagship · 2026 · LUMS graduate research · Productionized

Clinical LLM Bias Audit

The Geographic Disparity Index

A clinical assistant that tells a Boston patient to come in for a visit but tells an otherwise identical patient in Lagos to manage it at home is an equity failure with real stakes. Standard accuracy benchmarks cannot see geography-driven or name-driven disparity, so it goes unmeasured.

System notes

  • Wilcoxon, BCa bootstrap, and Cohen's h implemented from scratch in the Python standard library (no SciPy) and fully unit-tested
  • Deterministic perturbation plus an idempotency cache keyed on a SHA-256 of model, prompt, seed, and temperature makes reruns byte-identical and free
  • Pre-registered Bonferroni-corrected alpha of 0.005 across 3 regions and 3 care axes
  • Ships name-only, geo-only, and combined ablations plus gender-by-geography intersectional panels
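Two of these pieces, Cohen's h and the SHA-256 idempotency cache, fit in a few lines of standard-library Python. What follows is a minimal sketch, not code from the clinical-llm-bias-audit repository. The names `cohens_h`, `cache_key`, and `ResponseCache` are hypothetical. The sketch also assumes the cache hashes a canonical JSON encoding of the four fields.

```python
import hashlib
import json
import math
from typing import Callable


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions (arcsine transform)."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def cache_key(model: str, prompt: str, seed: int, temperature: float) -> str:
    """Deterministic SHA-256 hex digest over the fields that define one model call."""
    # Canonical JSON (sorted keys, fixed separators) so equal inputs always
    # serialize to the same bytes. Pass temperature consistently as a float:
    # json encodes 0 and 0.0 differently, which would yield different keys.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "seed": seed, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory idempotency cache: a repeated call returns the stored response."""

    def __init__(self, call_model: Callable[[str, str, int, float], str]):
        # Hypothetical model client: any function(model, prompt, seed, temperature) -> str.
        self._call_model = call_model
        self._store: dict[str, str] = {}

    def get(self, model: str, prompt: str, seed: int, temperature: float) -> str:
        key = cache_key(model, prompt, seed, temperature)
        if key not in self._store:
            # Only a cache miss reaches the (paid) model API.
            self._store[key] = self._call_model(model, prompt, seed, temperature)
        return self._store[key]
```

The key covers every input that changes the output. A rerun with the same model, prompt, seed, and temperature is therefore served from the store instead of calling the model again. A persistent store (a file or database) would be needed for the savings to carry across separate runs.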

AWS Bedrock · OpenAI · Groq · FastAPI · Streamlit · Wilcoxon / BCa bootstrap

Don't take my word for it. The code is public, and 2,400+ developers starred it.

Proof, in public

github.com/tkarim45
clinical-llm-bias-audit · Reproducible fairness-audit framework for clinical LLMs: the Geographic Disparity Index, with from-scratch statistics.
sec-rag-analyst · Production-style RAG over SEC 10-K filings: hybrid BM25 plus dense retrieval, reciprocal rank fusion (RRF), cross-encoder rerank, cited answers.
credit-default-mlops · End-to-end MLOps: DVC, MLflow, a CI quality gate that blocks bad models, drift detection, and Prometheus-instrumented serving.
everytongue · Train a neural translator for any language from a spreadsheet. Low-resource NLLB-200 recipe, pip-installable.
site2bot · Turn any website into a fully offline local chatbot. No API keys, no cloud, about 600 lines of Python. Published on PyPI.
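The fusion step in sec-rag-analyst can be illustrated with a minimal sketch of reciprocal rank fusion. It assumes the conventional constant k = 60 from the original RRF formulation. The repository's actual constant and document ids are not given here, so `reciprocal_rank_fusion`, `bm25_hits`, and `dense_hits` are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Returns ids sorted by fused score, best first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_risk", "doc_mda", "doc_notes"]      # hypothetical sparse (BM25) results
dense_hits = ["doc_mda", "doc_segments", "doc_risk"]  # hypothetical dense-retrieval results
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc_mda', 'doc_risk', 'doc_segments', 'doc_notes']
```

RRF uses only ranks, not raw scores. BM25 scores and dense similarities therefore never have to be put on a common scale before the fused list goes to the cross-encoder reranker.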

441 followers · 30 public repositories · contributing since Jul 2021

Where I’ve shipped

4 roles · 2021–present

Jan 2024 – Present

London (Remote)

Datality · Bryge.io

AI/ML Engineer

  • Architected and built Bryge.io, a multi-tenant industrial IoT analytics platform: AI infers unknown sensor schemas at runtime and answers plain-English questions over live MQTT streams. 944 of the project's 1,001 commits are mine.
  • Engineered the streaming LLM agent on AWS Bedrock (Claude): 15 schema-agnostic tools, adaptive 5-to-25-step reasoning budgets, SSE streaming, serving 285 API endpoints.
  • Shipped the energy intelligence pipeline: real-time cost, carbon, and loss computation on a TimescaleDB gold layer, with tariff support and 13 background workers.
  • Migrated the platform off AWS ECS/EC2 onto Railway + Vercel with schema-per-tenant isolation, read-only DB roles, and a SQL sandbox for defense in depth.
  • FastAPI, React, TimescaleDB, AWS Bedrock, pgvector, MQTT, Railway, Vercel.

Aug 2023 – Dec 2024

Lahore

Qult Technologies

AI/ML Engineer

  • Fine-tuned Llama 3.1 8B for text classification: +20% accuracy over the base model on production NLU tasks.
  • Took models from notebook to production with zero manual steps: end-to-end pipelines on AWS SageMaker and Lambda.
  • APIs, NLP, LLMs, Docker, AWS, TensorFlow, GitHub.

Sep 2022 – Feb 2023

California (Remote)

BornGreat

Data Scientist

  • Built NLP + LLM sentiment analysis over social media streams, turning raw posts into supply, demand, and competitor signals the team acted on.
  • Shipped a FastAPI service for real-time model inference, wired to GitHub Actions so every merge deploys itself.
  • Selenium, APIs, NLP, LLMs, FastAPI, Docker, TensorFlow.

Jun 2021 – Aug 2021

Lahore

Spyresync

Python Developer (Intern)

  • Built interactive Django dashboards over datasets I cleaned and feature-engineered myself.
  • Integrated scikit-learn models into web apps to ship predictive features to real users.
  • Python, APIs, GitHub, Django, scikit-learn.

Stack & credentials

Working stack

Data Science / ML
Python · SQL · Pandas · scikit-learn · PyTorch · TensorFlow · SciPy · Causal inference · Statistical analysis
GenAI / LLM Engineering
AWS Bedrock (Claude) · OpenAI APIs · LangChain · LangGraph · RAG architectures · Agent tool design · Fine-tuning (Llama, NLLB) · pgvector / FAISS / ChromaDB
LLM Evaluation / Safety
DeepEval · RAGAS · LLM-as-judge calibration · Red-teaming (OWASP LLM Top 10) · Hallucination benchmarks · Fairness auditing · CI regression gates
Infra / MLOps
FastAPI · Docker · AWS · DVC · MLflow · Prometheus · Grafana · TimescaleDB · GitHub Actions · Railway · Vercel

Education

  • MSc. in Artificial Intelligence

    Lahore University of Management Sciences (LUMS)

    Sep 2024 – Present

  • BSc. in Data Science

    National University of Computing and Emerging Sciences

    Sep 2020 – Jun 2024

Recognition

  • 2nd Position, Genesis Hackathon Dubai
  • Exceptional Deputy Head, SOFTEC
  • Data Manipulation, DataCamp
  • Joining Data with Pandas, DataCamp

Open to AI/ML engineering roles, remote or hybrid

Let’s build something that ships.

taimour.a.karim@gmail.com

or +92 326 1127700 · usually replies within a day

© 2026 Taimour Abdul Karim · Designed & built in Lahore