Back to case study

System Architecture

Visual breakdown of the automated EdTech content generation factory. These pipelines ingest unstructured PDFs and output fully structured, bilingual, and validated micro-learning courses.

0-to-1 Content Factory

Automated Course Generation & Translation

A multi-agent, adversarial LLM architecture designed to solve the "where-to-start" paralysis for new users by converting dense scholarship into structured micro-learning.

< $7
Cost per Book
30–90m
End-to-End
1. Book Classification

Gemini 2.5 Flash ingests the entire raw PDF (utilizing the 1M context window) to categorize source tiers, define tags, and establish the structural chunking strategy.

2. Adversarial Generation

Claude Sonnet 4.5 generates course sections, which are immediately validated by an adversarial Sonnet 4 prompt. Sections scoring < 4/5 trigger targeted retroactive repairs.

3. Data Structuring

Deterministic regex strips inconsistent culturally specific titles (saving tokens), while Claude dynamically merges and reorganizes the content into a strict 4-table relational schema.

4. Multi-Agent Translation

Cost arbitrage routing: Haiku performs bulk translation, Sonnet 4 validates 4 dimensions of quality, and Sonnet 4.5 repairs only the specific rows that fail exact terminology standards.

Under the Hood

Orchestration Logic

Node-level execution mapped directly from the n8n orchestration layers. Highlights adversarial quality gates, deterministic fallbacks, and cost-optimized model routing.

Pipeline 1: Micro-Learning Generation (v6.2)

111 Nodes · 4 Sequential Event-Linked Pipelines
Stage 1: Content Generation
Section Iteration

Iterates through chunked PDF sections. Passes raw text into the generation prompt.

Claude Sonnet 4.5

Generates structured lesson parts and multiple-choice quiz questions based on the source text.

Stage 2: Adversarial Validation
Fidelity Validator (Sonnet 4)

An independent model critiques the generated content against the source PDF. Separation of models prevents shared blind spots. Prompt Caching to cut input costs.

Repair Routing Logic
  • IF score = 4 or 5 → PASS TO NEXT STAGE
  • IF score < 4 → TRIGGER REPAIR PROMPT

Educational content cannot “plausibly sound right”. ~20% of sections require automated repair.

Stage 3: Deterministic Polish
Structural Auto-Fix

Regex nodes strip culturally specific titles and honorifics to standardize academic tone. Decision: Done via code rather than LLM to save tokens and eliminate latency.

Stage 4: Relational Output
Schema Generation

Outputs 4 distinct CSV files directly mirroring the Supabase relational schema (Units → Lessons → Lesson Parts → Questions).

HITL Buffer

Writes to Google Drive/Sheets. Mandatory 20-40 min human review gate prior to production database import.

Pipeline 2: Multi-Agent Translation (v1.4)

Cost Optimization · EN to ES (LATAM)
Cost Arbitrage Architecture

Producing culturally resonant, terminologically accurate LATAM Spanish requires extreme precision, but passing 400 rows to Sonnet 4.5 costs ~$0.50+/book.

Solution: Use cheaper models for bulk work, and reserve expensive models purely for targeted repairs. Drops cost to $0.14/book.

Stage 1: Bulk Translation
Claude 3 Haiku

High-speed, low-cost baseline translation of English CSV rows. Handles 80-90% of standard instructional text perfectly.

Stage 2: 4-Dimensional Gate
Sonnet 4 Quality Gate

Scores translation from 1-5 across four independent dimensions: Tone, Accuracy, Cultural Resonance, and Theological Terminology.

Strict Repair Trigger

IF ANY_DIMENSION < 3 → REPAIR

Product Decision: We do not average the scores. Averaging a 5/5 for grammar with a 1/5 for theology hides critical terminology failures.

Stage 3: Precision Repair
Claude Sonnet 4.5

Triggers only for the 10-20% of rows that fail the gate. Instructed specifically on which dimension failed to execute a highly targeted structural repair without rewriting the entire row.