A full content platform that parses blog content, generates AI variations across text, voice, and images, then runs blind A/B voting battles to measure which variations actually perform, all ranked on a live, trust-weighted leaderboard. Built solo, end to end.
A content lab wanted to systematically improve blog content with AI and prove which variations were actually better, but had no way to generate text, voice, and image variants at scale, and no objective method to compare them beyond gut feel.
One platform that parses content, generates AI variations across three media types, runs blind A/B battles so humans vote without bias, and aggregates everything into a trust-weighted leaderboard, turning "which version is better?" into measured data.
Markdown blogs are parsed into structured blocks using pure Python regular expressions: deterministic, instant, and free of API cost, unlike LLM-based parsing.
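As a rough illustration of the regex-only approach, the sketch below splits Markdown into typed blocks. The `Block` class, the four block kinds, and the patterns themselves are assumptions made for this example, not the platform's actual parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "heading", "code", "list_item", or "paragraph"
    text: str
    level: int = 0  # heading depth; 0 for every other kind


# Illustrative patterns only; a production parser would cover more syntax.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*)$")
FENCE = re.compile(r"^`{3}")  # a line that opens or closes a fenced code block


def parse_markdown(source: str) -> list[Block]:
    blocks: list[Block] = []
    paragraph: list[str] = []
    code: list[str] = []
    in_code = False

    def flush_paragraph() -> None:
        # Consecutive text lines are joined into a single paragraph block.
        if paragraph:
            blocks.append(Block("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in source.splitlines():
        if FENCE.match(line):
            if in_code:
                blocks.append(Block("code", "\n".join(code)))
                code.clear()
            else:
                flush_paragraph()
            in_code = not in_code
            continue
        if in_code:
            code.append(line)
            continue
        if not line.strip():
            flush_paragraph()
            continue
        heading = HEADING.match(line)
        if heading:
            flush_paragraph()
            blocks.append(Block("heading", heading.group(2).strip(), len(heading.group(1))))
            continue
        item = LIST_ITEM.match(line)
        if item:
            flush_paragraph()
            blocks.append(Block("list_item", item.group(1).strip()))
            continue
        paragraph.append(line.strip())

    flush_paragraph()
    return blocks
```

The same input always yields the same blocks, and no network call is involved, which is the property the regex choice is meant to buy.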
Messy extracted terms are mapped to a 1,000+ term official taxonomy using three string-matching algorithms plus LLM semantic matching for the hard cases.
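The matching cascade might look something like the sketch below. The three algorithms are not named above, so the choices here (normalized exact match, token-overlap Jaccard, and `difflib` character similarity) are assumptions, as are the `match_term` name and the 0.8 threshold. Terms that fail all three are only flagged for the LLM step, which is not implemented here.

```python
import re
from difflib import SequenceMatcher


def normalize(term: str) -> str:
    # Lowercase and collapse punctuation/whitespace runs into single spaces.
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def jaccard(a: str, b: str) -> float:
    # Token-set overlap: shared words divided by all distinct words.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def match_term(raw: str, taxonomy: list[str], threshold: float = 0.8):
    """Return (official_term, method), or (None, "needs_llm") for hard cases."""
    norm = normalize(raw)
    official = {normalize(t): t for t in taxonomy}

    # 1. Exact match after normalization.
    if norm in official:
        return official[norm], "exact"

    # 2. Token overlap catches reordered words.
    best = max(official, key=lambda t: jaccard(norm, t), default=None)
    if best is not None and jaccard(norm, best) >= threshold:
        return official[best], "token_overlap"

    # 3. Character similarity catches typos and small spelling variants.
    best = max(official, key=lambda t: similarity(norm, t), default=None)
    if best is not None and similarity(norm, best) >= threshold:
        return official[best], "fuzzy"

    # Nothing was confident enough: hand the term to semantic matching.
    return None, "needs_llm"
```

Ordering the checks from cheapest to most expensive keeps the LLM call reserved for terms that string matching cannot resolve.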
A position-based power score surfaces which themes are over- or under-represented, exposing content coverage gaps.
The generation hub runs a two-phase batch: text first (improvements, scripts, captions, scores), then media. Work runs concurrently, tuned per service to respect rate limits.
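A minimal sketch of the two-phase idea, assuming `asyncio` and one semaphore per service. The concurrency caps, the `run_batch` name, and the `worker` callback are hypothetical stand-ins for the real service calls.

```python
import asyncio

# Hypothetical per-service concurrency caps; real values depend on each API.
LIMITS = {"text": 4, "voice": 2, "image": 1}


async def run_job(service, item, semaphores, worker):
    # The semaphore caps how many jobs hit a given service at once.
    async with semaphores[service]:
        return await worker(service, item)


async def run_batch(items, worker):
    semaphores = {name: asyncio.Semaphore(n) for name, n in LIMITS.items()}

    # Phase 1: all text jobs run concurrently and finish first.
    texts = await asyncio.gather(
        *(run_job("text", item, semaphores, worker) for item in items)
    )

    # Phase 2: media jobs start only once their text inputs exist.
    media = await asyncio.gather(
        *(
            run_job(service, text, semaphores, worker)
            for text in texts
            for service in ("voice", "image")
        )
    )
    return list(texts), list(media)
```

Splitting the batch this way means media generation can consume the improved text and scripts from phase one rather than the raw input.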
Four battle types pit variations against each other with randomized placement so voters can't tell which model produced what. Shareable voter links extend it to outsiders.
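One way to keep a battle blind is to shuffle placement and send the voter only the content, holding the model mapping back on the server. The sketch below assumes that design; `Variation`, `build_battle`, and `resolve_vote` are illustrative names, not the platform's API.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Variation:
    model: str    # never sent to the voter
    content: str


def build_battle(a: Variation, b: Variation, rng=None):
    """Return (ballot, key): the ballot goes to the voter, the key stays server-side."""
    if rng is None:
        rng = random.Random()
    pair = [a, b]
    rng.shuffle(pair)  # randomized left/right placement

    ballot = {
        "battle_id": uuid.uuid4().hex,
        "left": pair[0].content,
        "right": pair[1].content,
    }
    key = {"left": pair[0].model, "right": pair[1].model}
    return ballot, key


def resolve_vote(key: dict, choice: str) -> str:
    # Map the voter's "left"/"right" choice back to the model that produced it.
    return key[choice]
```

Because the ballot carries no model names, a shareable voter link can expose it to outsiders without leaking which model produced which side.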
Votes aggregate into a leaderboard across all media types, with a 50-vote trust threshold so rankings only "count" once they're statistically meaningful.
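The trust threshold can be sketched as a simple gate on vote count. Only the 50-vote figure comes from the description above; ranking by win rate, listing untrusted entries below trusted ones, and the `Entry` and `rank` names are assumptions for illustration.

```python
from dataclasses import dataclass

TRUST_THRESHOLD = 50  # votes required before a ranking "counts"


@dataclass
class Entry:
    name: str
    wins: int
    votes: int

    @property
    def win_rate(self) -> float:
        return self.wins / self.votes if self.votes else 0.0

    @property
    def trusted(self) -> bool:
        return self.votes >= TRUST_THRESHOLD


def rank(entries: list[Entry]) -> list[Entry]:
    # Trusted entries first, then higher win rate, then more votes as tiebreak.
    return sorted(entries, key=lambda e: (not e.trusted, -e.win_rate, -e.votes))
```

With this gate, a variation that wins 9 of 10 votes still sits below one that wins 40 of 60, because ten votes are not yet enough to trust.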
One continuous pipeline. Raw content goes in; measured, ranked, AI-improved variations come out, with every step backed by triple-layer storage.
Dual-writing to IndexedDB and Firestore, plus Firebase Storage for media, keeps the app fast and offline-capable while staying cloud-synced.
Text, voice, and image jobs run in parallel, each with its own rate limit, delay, and retry/backoff so large batches finish without tripping API limits.
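Retry with backoff could be sketched as below, assuming exponential delays with a small random jitter. `RateLimitError`, `with_backoff`, and the default retry counts and delays are hypothetical; each real service would get its own tuned values.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical error raised when a service reports a rate limit."""


async def with_backoff(call, *, retries=4, base_delay=0.5, max_delay=8.0,
                       sleep=asyncio.sleep):
    """Await call(); on RateLimitError wait, double the delay, and retry."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps parallel jobs from retrying in lockstep.
            await sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter exists only so the delay can be swapped out in tests; in normal use the default `asyncio.sleep` applies.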
AG-Grid renders tens of thousands of parsed blocks smoothly with filtering, pagination, and a dark theme, with no lag on large imports.
Choosing pure regex over an LLM for parsing made the core fast, predictable, and free; LLM budget is spent only where it adds real value.
Randomized placement, hidden model names, and a vote-count trust threshold keep A/B results honest and statistically grounded.
Multiple language and image models are orchestrated behind a single interface, with per-task model selection and clean fallbacks.
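A single interface with per-task selection and fallbacks might be shaped like the sketch below. `ModelRouter`, its methods, and the notion of a provider as a plain callable are assumptions for illustration; real providers would wrap the actual model SDK calls.

```python
from typing import Callable

Provider = Callable[[str], str]  # takes a prompt, returns generated output


class ModelRouter:
    """Routes each task to an ordered list of providers, falling back on failure."""

    def __init__(self) -> None:
        self._routes: dict[str, list[tuple[str, Provider]]] = {}

    def register(self, task: str, name: str, provider: Provider) -> None:
        # Registration order is fallback order: first registered is tried first.
        self._routes.setdefault(task, []).append((name, provider))

    def generate(self, task: str, prompt: str) -> tuple[str, str]:
        errors = []
        for name, provider in self._routes.get(task, []):
            try:
                return name, provider(prompt)
            except Exception as exc:  # any provider failure triggers the fallback
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"no provider succeeded for {task!r}: {errors}")
```

Returning the provider name alongside the output lets the caller record which model actually produced each variation, which the battles and leaderboard depend on.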
I turn messy business problems into reliable AI systems: content platforms, agents, RAG, and automation, all designed and shipped solo.