TANVEER HUSSAIN / AI ENGINEER
Case Study · AI Content Platform

AI Content Generation & Evaluation Platform

A full content platform that parses blog content, generates AI variations across text, voice, and images, then runs blind A/B voting battles to measure which variations actually perform, all ranked on a live, trust-weighted leaderboard. Built solo, end to end.

Role: Sole Engineer & Architect · Domain: AI Content & Media · Status: Live in production

[ 01 ] The business problem

A content lab wanted to systematically improve blog content with AI and prove which variations were actually better, but had no way to generate text, voice, and image variants at scale, and no objective method to compare them beyond gut feel.

[ 02 ] The technical solution

One platform that parses content, generates AI variations across three media types, runs blind A/B battles so humans vote without bias, and aggregates everything into a trust-weighted leaderboard, turning "which version is better?" into measured data.

// architecture

The content pipeline

1

Parse: pure regex, no LLM

Markdown blogs are parsed into structured blocks using pure Python regular expressions. Unlike LLM-based parsing, this is deterministic, near-instant, and has zero API cost.

Python Regex · Flask · Structured Parsing
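
A minimal sketch of what regex-only block parsing can look like, not the production parser. The `Block` type, the `parse_blocks` name, and the three block kinds (heading, code, paragraph) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical block model; the platform's real schema is not shown in this case study.
@dataclass
class Block:
    kind: str       # "heading", "code", or "paragraph"
    text: str
    level: int = 0  # heading depth, 0 for non-headings

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = re.compile(r"^`{3}")  # a line that opens or closes a fenced code block

def parse_blocks(markdown: str) -> list[Block]:
    blocks: list[Block] = []
    buffer: list[str] = []  # lines of the paragraph or code block in progress
    in_code = False

    def flush(kind: str) -> None:
        if buffer:
            joiner = "\n" if kind == "code" else " "
            blocks.append(Block(kind, joiner.join(buffer)))
            buffer.clear()

    for line in markdown.splitlines():
        if FENCE.match(line):
            # A fence ends the current run and toggles code mode.
            flush("code" if in_code else "paragraph")
            in_code = not in_code
        elif in_code:
            buffer.append(line)
        elif (m := HEADING.match(line)):
            flush("paragraph")
            blocks.append(Block("heading", m.group(2).strip(), len(m.group(1))))
        elif line.strip() == "":
            flush("paragraph")
        else:
            buffer.append(line.strip())
    flush("code" if in_code else "paragraph")
    return blocks
```

The same input always yields the same blocks, which is the property that makes this step cheap to test and free to run.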
2

Match traits to a taxonomy

Messy extracted terms are mapped to a 1,000+ term official taxonomy using three string-matching algorithms plus LLM semantic matching for the hard cases.

Levenshtein · RapidFuzz · Jaro-Winkler · LLM Matching
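
The cascade can be sketched roughly as follows: try an exact match, then a fuzzy match, and hand only the leftovers to an LLM. This is an illustration under assumptions: a pure-Python Levenshtein distance stands in for the full set of matchers so the example has no dependencies, and the `match_term` name and the 0.8 threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Edit distance normalized to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def match_term(raw: str, taxonomy: list[str], threshold: float = 0.8):
    """Return (term, score, method); term is None when the LLM should decide."""
    cleaned = raw.strip().lower()
    by_key = {t.lower(): t for t in taxonomy}
    if cleaned in by_key:
        return by_key[cleaned], 1.0, "exact"
    best = max(taxonomy, key=lambda t: similarity(cleaned, t.lower()))
    score = similarity(cleaned, best.lower())
    if score >= threshold:  # assumed cut-off; tune against real data
        return best, score, "fuzzy"
    return None, score, "needs_llm"
```

Ordering the checks from cheapest to most expensive means the LLM only sees terms that string matching could not resolve.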
3

Gap analysis

A position-based power score surfaces which themes are over- or under-represented, exposing content coverage gaps.

Power Scoring · Coverage Analysis · CSV Export
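
The exact scoring formula is not given here, so the following is only one plausible shape for a position-based score. It assumes that each post yields an ordered list of themes, that a theme at rank r contributes 1/r, and that a "gap" is a theme whose share falls below an even split. All names are hypothetical.

```python
from collections import defaultdict

def power_scores(posts: list[list[str]]) -> dict[str, float]:
    """Sum a position weight per theme; earlier mentions count more (assumed 1/rank)."""
    scores: dict[str, float] = defaultdict(float)
    for themes in posts:
        for rank, theme in enumerate(themes, start=1):
            scores[theme] += 1.0 / rank
    return dict(scores)

def coverage_ratios(scores: dict[str, float], taxonomy: list[str]) -> dict[str, float]:
    """Each theme's share of total score divided by an even share.

    Above 1.0 suggests over-representation, below 1.0 under-representation,
    and 0.0 means the theme never appeared.
    """
    total = sum(scores.values())
    even_share = 1.0 / len(taxonomy)
    return {
        t: (scores.get(t, 0.0) / total) / even_share if total else 0.0
        for t in taxonomy
    }
```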
4

Multi-modal generation

The generation hub runs a two-phase batch: text first (improvements, scripts, captions, scores), then media. Work runs concurrently, tuned per service to respect rate limits.

Concurrent LLM Calls · Text-to-Speech · Image Generation · Retry Logic
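
A rough sketch of the two-phase shape using `asyncio`, with one semaphore per service to cap concurrency. The caps in `LIMITS`, the `fake_call` stand-in for a model API, and the idea that text output feeds the media phase are assumptions for illustration.

```python
import asyncio

# Assumed per-service concurrency caps; the real limits are not given in this case study.
LIMITS = {"text": 4, "voice": 2, "image": 1}

async def run_limited(service: str, jobs, sems):
    """Run coroutine factories for one service, never exceeding its cap."""
    async def guarded(job):
        async with sems[service]:
            return await job()
    return await asyncio.gather(*(guarded(j) for j in jobs))

async def fake_call(kind: str, item: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call to a model API
    return f"{kind}:{item}"

async def generate_batch(items: list[str]) -> dict[str, list[str]]:
    sems = {name: asyncio.Semaphore(n) for name, n in LIMITS.items()}
    # Phase 1: text first; here its output feeds phase 2 (an assumption).
    texts = await run_limited(
        "text", [lambda i=i: fake_call("text", i) for i in items], sems
    )
    # Phase 2: voice and image run side by side, each under its own cap.
    voice, image = await asyncio.gather(
        run_limited("voice", [lambda t=t: fake_call("voice", t) for t in texts], sems),
        run_limited("image", [lambda t=t: fake_call("image", t) for t in texts], sems),
    )
    return {"text": texts, "voice": voice, "image": image}
```

Calling `asyncio.run(generate_batch(["post-1", "post-2"]))` returns the three result lists in input order, since `asyncio.gather` preserves ordering.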
5

Battle: blind A/B voting

Four battle types pit variations against each other with randomized placement so voters can't tell which model produced what. Shareable voter links extend it to outsiders.

Blind Voting · 4 Battle Types · External Voter Pages
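
Randomized placement can be sketched like this: the server shuffles which variation appears on which side, keeps the mapping private, and translates the vote back afterwards. The `Battle` type and both function names are hypothetical, and only a two-way battle is shown.

```python
import random
from dataclasses import dataclass

@dataclass
class Battle:
    left: str            # content shown on the left
    right: str           # content shown on the right
    left_is_first: bool  # hidden key: kept server-side, never sent to the voter

def make_battle(variation_a: str, variation_b: str, rng: random.Random) -> Battle:
    """Randomize placement so position carries no hint about the source model."""
    if rng.random() < 0.5:
        return Battle(variation_a, variation_b, True)
    return Battle(variation_b, variation_a, False)

def resolve_vote(battle: Battle, choice: str) -> str:
    """Map a 'left'/'right' vote back to 'a' or 'b' using the hidden key."""
    if choice not in ("left", "right"):
        raise ValueError("choice must be 'left' or 'right'")
    picked_left = choice == "left"
    return "a" if picked_left == battle.left_is_first else "b"
```

Only `left` and `right` would be rendered on the voter page; model names never leave the server.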
6

Leaderboard: ranked results

Votes aggregate into a leaderboard across all media types, with a 50-vote trust threshold so rankings only "count" once they're statistically meaningful.

Aggregation · Trust System · In-Memory Cache
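
A minimal sketch of trust-gated ranking. The 50-vote threshold comes from the description above; counting it per variation, ranking by win rate, and the `leaderboard` function itself are assumptions.

```python
from collections import Counter

TRUST_THRESHOLD = 50  # from the case study; applied per variation here (an assumption)

def leaderboard(votes: list[tuple[str, str]]) -> list[dict]:
    """votes are (winner, loser) pairs; rows come back trusted-first, then by win rate."""
    wins, appearances = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    rows = [
        {
            "name": name,
            "votes": n,
            "win_rate": wins[name] / n,
            "trusted": n >= TRUST_THRESHOLD,
        }
        for name, n in appearances.items()
    ]
    # Untrusted rows sort last regardless of win rate; name breaks ties.
    return sorted(rows, key=lambda r: (not r["trusted"], -r["win_rate"], r["name"]))
```

A variation with three wins from three votes has a perfect win rate but still ranks below every trusted entry, which is the point of the threshold.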
parse → match → gap → generate → battle → rank

One continuous pipeline. Raw content goes in; measured, ranked, AI-improved variations come out, with every step backed by triple-layer storage.

// engineering depth

Hard problems solved

~ storage

Triple-layer architecture

Dual-writing to IndexedDB and Firestore, plus Firebase Storage for media, keeps the app fast and offline-capable while staying cloud-synced.

~ concurrency

Tuned parallel pipelines

Text, voice, and image jobs run in parallel, each with its own rate limit, delay, and retry/backoff so large batches finish without tripping API limits.
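
The retry side of this can be sketched with the standard library alone. The stack lists tenacity, which provides this pattern as decorators; the version below is a dependency-free illustration, and `RateLimitError`, `with_backoff`, and the default delays are all hypothetical.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Hypothetical error type standing in for an HTTP 429 from a model API."""

async def with_backoff(call, *, attempts: int = 4, base_delay: float = 0.5,
                       sleep=asyncio.sleep):
    """Retry `call` on RateLimitError, doubling the wait each time plus a little jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter keeps parallel jobs from retrying in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
```

Each service can pass its own `attempts` and `base_delay`, which is one way to express per-service tuning.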

~ performance

50K-row virtualization

AG-Grid renders tens of thousands of parsed blocks smoothly, with filtering, pagination, and a dark theme, and without lag on large imports.

~ cost

Deterministic parsing

Choosing pure regex over an LLM for parsing made the core fast, predictable, and free, so LLM budget is spent only where it adds real value.

~ integrity

Bias-free evaluation

Randomized placement, hidden model names, and a vote-count trust threshold keep A/B results honest and statistically grounded.

~ orchestration

Multi-model routing

Multiple language and image models are orchestrated behind a single interface, with per-task model selection and clean fallbacks.
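
One way to picture the single interface is a routing table plus an ordered fallback loop. The model names, task keys, and `generate` signature below are hypothetical; the case study does not name its models.

```python
from typing import Callable

# Hypothetical routing table: preferred model first, fallbacks after.
ROUTES: dict[str, list[str]] = {
    "caption": ["fast-model", "strong-model"],
    "script": ["strong-model", "fast-model"],
}

def generate(task: str, prompt: str,
             providers: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try each model configured for the task in order; return (model, output)."""
    errors = []
    for model in ROUTES[task]:
        try:
            return model, providers[model](prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Callers only ever name a task, so swapping or reordering models is a change to `ROUTES` rather than to calling code.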

// stack

Built with

Backend
Python 3.11 · Flask · Jinja2 · httpx (async) · tenacity · Gunicorn
AI Services
LLM Integration · Text-to-Speech · Image Generation · Multi-Model Routing · SSE Streaming
Frontend
Vanilla JS ES6+ · AG-Grid v31 · Tailwind CSS · Marked.js · Firebase JS SDK
Storage
IndexedDB · Firestore · Firebase Storage · Dual-Write · Caching
Algorithms
Levenshtein · RapidFuzz · Jaro-Winkler · LLM Semantic Match · Power Scoring
Reliability / Ops
Retry / Backoff · Rate Limiting · Async Concurrency · Render Deploy · System Architecture
// outcome

The result

Live: running in production
3: media types generated & tested
50K+: rows virtualized in-grid
100%: designed & built solo, end to end
// let's build

Have a problem that needs a system?

I turn messy business problems into reliable AI systems: content platforms, agents, RAG, and automation, all designed and shipped solo.

Tanveer Hussain · AI Engineer · Building systems that never sleep.