Case Study · AI Content Platform

AI Content Generation & Evaluation Platform

A full content platform that parses blog content, generates AI variations across text, voice, and images, then runs blind A/B voting battles to measure which variations actually perform. Results are ranked on a live, trust-weighted leaderboard. Built solo, end to end.

Role: Sole Engineer & Architect
Domain: AI Content & Media
Status: Live in production

[ 01 ] The business problem

A content lab wanted to systematically improve blog content with AI and prove which variations were actually better, but had no way to generate text, voice, and image variants at scale, and no objective method to compare them beyond gut feel.

[ 02 ] The technical solution

One platform that parses content, generates AI variations across three media types, runs blind A/B battles so humans vote without bias, and aggregates everything into a trust-weighted leaderboard, turning "which version is better?" into measured data.

// architecture

The content pipeline

Parse: pure regex, no LLM

Markdown blogs are parsed into structured blocks using pure Python regular expressions. Unlike LLM-based parsing, this is deterministic, near-instant, and incurs zero API cost.

Python Regex · Flask · Structured Parsing
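A minimal sketch of what regex-only block parsing can look like. The patterns, the block types, and the `parse_markdown` name are illustrative assumptions; the platform's actual rules are not shown here.

```python
import re

# Hypothetical block patterns; the real parser's rules are not documented.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*\S)\s*$")


def parse_markdown(text: str) -> list[dict]:
    """Split Markdown into heading, list_item, and paragraph blocks."""
    blocks: list[dict] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Emit any buffered paragraph lines as a single block.
        if paragraph:
            blocks.append({"type": "paragraph", "text": " ".join(paragraph)})
            paragraph.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the current paragraph
        elif m := HEADING.match(line):
            flush()
            blocks.append({"type": "heading", "level": len(m.group(1)), "text": m.group(2)})
        elif m := LIST_ITEM.match(line):
            flush()
            blocks.append({"type": "list_item", "text": m.group(1)})
        else:
            paragraph.append(line.strip())
    flush()
    return blocks
```

The same input always yields the same blocks, which is the property the regex approach is chosen for.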

Match traits to a taxonomy

Messy extracted terms are mapped to an official taxonomy of 1,000+ terms using three string-matching approaches (Levenshtein, RapidFuzz, and Jaro-Winkler), with LLM semantic matching reserved for the hard cases.

Levenshtein · RapidFuzz · Jaro-Winkler · LLM Matching
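A rough sketch of the string-matching step, using a hand-written Levenshtein distance so it runs without dependencies. The 0.8 threshold, the `match_term` helper, and the rule "no confident match means escalate to the LLM" are assumptions for illustration; in practice a library such as RapidFuzz would likely replace the distance function.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_term(raw: str, taxonomy: list[str], threshold: float = 0.8):
    """Return (best_term, score), or (None, score) when the match is too weak.

    A None result marks a hard case that a semantic (LLM) matcher could take over.
    """
    cleaned = raw.strip().lower()
    best = max(taxonomy, key=lambda term: similarity(cleaned, term.lower()))
    score = similarity(cleaned, best.lower())
    return (best, score) if score >= threshold else (None, score)
```

Only the terms that fall below the threshold would need an LLM call, which keeps the expensive path small.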

Gap analysis

A position-based power score surfaces which themes are over- or under-represented, exposing content coverage gaps.

Power Scoring · Coverage Analysis · CSV Export
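The scoring formula is not spelled out above, so the following is only one plausible shape for a position-based score. It assumes each theme mention is weighted by 1/(1 + position), so earlier blocks count more, and that a theme is flagged when its score falls well below or above the mean. Both rules, and the names `power_scores` and `coverage_gaps`, are hypothetical.

```python
from collections import defaultdict


def power_scores(tagged_blocks: list[tuple[int, str]]) -> dict[str, float]:
    """Sum a position weight per theme; earlier blocks weigh more (assumed rule)."""
    scores: dict[str, float] = defaultdict(float)
    for position, theme in tagged_blocks:
        scores[theme] += 1.0 / (1 + position)  # hypothetical weights: 1, 1/2, 1/3, ...
    return dict(scores)


def coverage_gaps(scores: dict[str, float], taxonomy: list[str],
                  ratio: float = 0.5) -> dict[str, str]:
    """Label each taxonomy theme as under, over, or balanced relative to the mean."""
    if not taxonomy:
        return {}
    # Themes that never appear still count, with a score of zero.
    full = {theme: scores.get(theme, 0.0) for theme in taxonomy}
    mean = sum(full.values()) / len(full)
    labels = {}
    for theme, score in full.items():
        if score < mean * ratio:
            labels[theme] = "under"
        elif score > mean / ratio:
            labels[theme] = "over"
        else:
            labels[theme] = "balanced"
    return labels
```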

Multi-modal generation

The generation hub runs a two-phase batch: text first (improvements, scripts, captions, scores), then media. Work runs concurrently, tuned per service to respect rate limits.

Concurrent LLM Calls · Text-to-Speech · Image Generation · Retry Logic
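The two-phase batch could be sketched with `asyncio` along these lines. The concurrency caps in `LIMITS` and the `call_service` stand-in are assumptions; real limits depend on each provider, and the real calls would go through an HTTP client.

```python
import asyncio

# Hypothetical per-service concurrency caps.
LIMITS = {"text": 5, "voice": 2, "image": 2}


async def call_service(kind: str, item: str, gate: asyncio.Semaphore) -> str:
    """Stand-in for a real API call, throttled by that service's semaphore."""
    async with gate:
        await asyncio.sleep(0.01)  # simulated network latency
        return f"{kind}:{item}"


async def run_batch(items: list[str]) -> dict[str, list[str]]:
    gates = {kind: asyncio.Semaphore(n) for kind, n in LIMITS.items()}

    # Phase 1: text outputs for every item, run concurrently.
    texts = await asyncio.gather(
        *(call_service("text", item, gates["text"]) for item in items)
    )

    # Phase 2: media jobs that depend on the phase-1 text.
    voice_jobs = (call_service("voice", t, gates["voice"]) for t in texts)
    image_jobs = (call_service("image", t, gates["image"]) for t in texts)
    voices, images = await asyncio.gather(
        asyncio.gather(*voice_jobs), asyncio.gather(*image_jobs)
    )
    return {"text": list(texts), "voice": list(voices), "image": list(images)}
```

Each service gets its own semaphore, so a slow image provider cannot starve the text calls, and `asyncio.gather` keeps results in input order.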

Battle: blind A/B voting

Four battle types pit variations against each other with randomized placement so voters can't tell which model produced what. Shareable voter links extend it to outsiders.

Blind Voting · 4 Battle Types · External Voter Pages
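One way to keep placement blind is to shuffle sides on the server and hand the voter page content only. The `Battle` structure and the helper names below are illustrative assumptions, not the platform's actual data model.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Battle:
    battle_id: str
    left: str    # content shown on the left
    right: str   # content shown on the right
    key: dict    # server-side only: maps "left"/"right" to the model name


def make_battle(variants: dict[str, str], rng: Optional[random.Random] = None) -> Battle:
    """Pair two model outputs and randomize which side each one appears on."""
    if len(variants) != 2:
        raise ValueError("a battle needs exactly two variants")
    rng = rng or random.Random()
    models = list(variants)
    rng.shuffle(models)
    left_model, right_model = models
    return Battle(
        battle_id=uuid.uuid4().hex,
        left=variants[left_model],
        right=variants[right_model],
        key={"left": left_model, "right": right_model},
    )


def public_view(battle: Battle) -> dict:
    """What the voter page receives: content only, no model names."""
    return {"battle_id": battle.battle_id, "left": battle.left, "right": battle.right}


def resolve_vote(battle: Battle, side: str) -> str:
    """Translate a vote for "left" or "right" back to the winning model."""
    return battle.key[side]
```

Because the model names never leave the server, a shareable voter link exposes nothing that could bias the vote.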

Leaderboard: ranked results

Votes aggregate into a leaderboard across all media types, with a 50-vote trust threshold so rankings only "count" once they're statistically meaningful.

Aggregation · Trust System · In-Memory Cache
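A simplified sketch of vote aggregation with the 50-vote threshold. The (winner, loser) vote format, the win-rate metric, and the choice to sort trusted rows first are assumptions; only the threshold value comes from the description above.

```python
from collections import Counter

TRUST_THRESHOLD = 50  # from the write-up: rankings "count" at 50 votes


def leaderboard(votes: list[tuple[str, str]]) -> list[dict]:
    """Rank variants from (winner, loser) vote pairs.

    Rows with at least TRUST_THRESHOLD votes are marked trusted and sorted first.
    """
    wins, appearances = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1

    rows = [
        {
            "variant": name,
            "votes": total,
            "win_rate": wins[name] / total,
            "trusted": total >= TRUST_THRESHOLD,
        }
        for name, total in appearances.items()
    ]
    # Trusted rows first, then by win rate, both descending.
    rows.sort(key=lambda r: (r["trusted"], r["win_rate"]), reverse=True)
    return rows
```

A variant with a perfect record over five votes still ranks below any trusted variant, which is the point of the threshold.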

parse → match → gap → generate → battle → rank

One continuous pipeline. Raw content goes in; measured, ranked, AI-improved variations come out, with every step backed by triple-layer storage.

// engineering depth

Hard problems solved

~ storage

Triple-layer architecture

Dual writes to IndexedDB and Firestore, plus Firebase Storage for media, keep the app fast and offline-capable while staying cloud-synced.

~ concurrency

Tuned parallel pipelines

Text, voice, and image jobs run in parallel, each with its own rate limit, delay, and retry/backoff so large batches finish without tripping API limits.
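The retry/backoff part can be illustrated with a small standard-library helper. The stack lists tenacity for this job; the version below is a dependency-free stand-in, and `RateLimitError`, the delay values, and the jitter range are assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error a service client raises on HTTP 429."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5,
                 max_delay: float = 8.0, sleep=time.sleep):
    """Call `call()` and retry on RateLimitError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the delay on each failed attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.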

~ performance

50K-row virtualization

AG-Grid renders tens of thousands of parsed blocks smoothly, with filtering, pagination, and a dark theme, and without lag on large imports.

~ cost

Deterministic parsing

Choosing pure regex over an LLM for parsing made the core fast, predictable, and free. LLM budget is spent only where it adds real value.

~ integrity

Bias-free evaluation

Randomized placement, hidden model names, and a vote-count trust threshold keep A/B results honest and statistically grounded.

~ orchestration

Multi-model routing

Multiple language and image models are orchestrated behind a single interface, with per-task model selection and clean fallbacks.
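A single-interface router with fallbacks might look roughly like this. `ModelRouter`, the task names, and treating each model as a plain callable are illustrative assumptions rather than the platform's actual interface.

```python
from typing import Callable

Model = Callable[[str], str]


class ModelRouter:
    """Route each task to an ordered list of models, falling back on failure."""

    def __init__(self) -> None:
        self._routes: dict[str, list[tuple[str, Model]]] = {}

    def register(self, task: str, name: str, model: Model) -> None:
        # Registration order is the preference order for that task.
        self._routes.setdefault(task, []).append((name, model))

    def run(self, task: str, prompt: str) -> tuple[str, str]:
        """Return (model_name, output) from the first model that succeeds."""
        errors = []
        for name, model in self._routes.get(task, []):
            try:
                return name, model(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"no model succeeded for {task!r}: {errors}")
```

Callers only ever see `run(task, prompt)`, so swapping or reordering models does not touch the rest of the pipeline.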

// stack

Built with

Backend

Python 3.11 · Flask · Jinja2 · httpx (async) · tenacity · Gunicorn

AI Services

LLM Integration · Text-to-Speech · Image Generation · Multi-Model Routing · SSE Streaming

Frontend

Vanilla JS ES6+ · AG-Grid v31 · Tailwind CSS · Marked.js · Firebase JS SDK

Storage

IndexedDB · Firestore · Firebase Storage · Dual-Write · Caching

Algorithms

Levenshtein · RapidFuzz · Jaro-Winkler · LLM Semantic Match · Power Scoring

Reliability / Ops

Retry / Backoff · Rate Limiting · Async Concurrency · Render Deploy · System Architecture
