AI-Powered EdTech · B2C · 0-to-1 Build

Casa Islamica: AI-Powered Learning Platform

Engineered an automated course creation pipeline and a source-verified RAG chatbot—backed by a custom observability framework—to scale structured education for Latin American converts.


Context & Role

The Problem Space

The Latin American Muslim convert community faces a massive 60% abandonment rate within the first 90 days, driven by information overwhelm and cultural friction.

The Users

Spanish and Portuguese-speaking converts who need deeply nuanced, culturally contextualized theological education on mobile devices.

Business Context

A dawah-focused (non-profit) initiative built with a $0 operational budget.

My Role

Solo builder. This was my first hands-on exploration of building an AI product from scratch. I handled product strategy and full-stack frontend development, and designed the complete n8n AI orchestration and telemetry databases. I led market research and user studies through surveys and well-structured AI prompts, and created the core product artefacts: the product vision, PRD, AI product strategy and framework, success metrics, and North Star.

Problem & Hypothesis

User Pain

Converts face crippling "where-to-start paralysis" and misinformation anxiety. General-purpose AI hallucinates religious rulings, and YouTube lacks structure, leaving users to navigate a massive trust gap.

Product Pain

Manual course creation is too slow and expensive to sustain on a $0 budget, and conversational AI is notoriously difficult to measure for accuracy in sensitive domains.

Hypothesis

We believed that pairing a structured micro-learning frontend with a fully automated course generation pipeline, a source-verified RAG chatbot (monitored by deep telemetry), and clear how-to guidance would reduce overwhelm, build trust, and significantly improve 30-day retention.

My Approach & Process

User Research & Problem Definition

  • Research revealed that churn was not driven by a lack of interest, but by intense misinformation anxiety and cultural friction.
  • We scoped the MVP specifically around these retention levers, focusing exclusively on solving the "trust gap" and "where-to-start" paralysis for new users.

AI Decision Frameworks & Trade-offs

  • I explicitly prioritized a Source-Verified RAG architecture over fine-tuning to guarantee source attribution. I configured the AI to refuse out-of-scope questions, preferring an accurate "I don't know" over an improvised guess.
  • Operating with a $0 budget required treating AI spend like any other infrastructure cost. I designed multi-model pipelines that map the cheapest capable model to each specific cognitive task, reducing generation costs by up to 95% without sacrificing fidelity.
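The cost-aware model routing can be sketched as a simple task-to-model table. This is an illustrative Python sketch, not the actual n8n configuration; the model names come from the case study, while the task names and the `pick_model` helper are assumptions.

```python
# Hypothetical sketch of cost-aware model routing: map each cognitive task
# to the cheapest model that is still capable of it.

MODEL_ROUTES = {
    # task: (model, rationale)
    "classification": ("gemini-2.5-flash", "1M-token context fits whole books cheaply"),
    "lesson_generation": ("claude-haiku-4.5", "high-volume structured output"),
    "fidelity_validation": ("claude-sonnet-4.5", "independent quality gate"),
}

def pick_model(task: str) -> str:
    """Return the cheapest capable model configured for a cognitive task."""
    if task not in MODEL_ROUTES:
        raise ValueError(f"No route configured for task: {task}")
    return MODEL_ROUTES[task][0]
```

Keeping the routing in one table makes the cost trade-offs auditable: swapping a model for one task never touches the others.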

Goal Setting & Feature Validation

  • I drove growth via our North Star (Total Learning Interactions), but strictly paired it with counter-metrics like Scholar Accuracy and Lesson Completion.
  • I established hard analytical kill-switches for all secondary features. If a community or support feature failed to drive a predefined click-through rate to educational content, it was actively deprioritized.

Scalable Collaboration & Stakeholder Management

  • I engaged scholars and designed a collaboration model where AI pipelines handle 100% of the content generation and extraction, while scholars act purely as a scalable Quality Assurance layer.
  • Instead of reviewing every interaction, domain experts review a structured monthly sample of AI responses against a strict 4-dimensional evaluation rubric (Correctness, Completeness, Clarity, Efficiency).
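The monthly sampling step above can be sketched in a few lines. This is an illustrative Python sketch: the rubric dimensions match the case study, but the sample size, seeding, and function names are assumptions.

```python
# Hypothetical sketch of the monthly scholar-review sample: a reproducible
# random draw of AI responses, scored against the 4-dimension rubric.
import random

RUBRIC = ("correctness", "completeness", "clarity", "efficiency")

def monthly_sample(response_ids: list[int], k: int = 50, seed: int = 0) -> list[int]:
    """Draw a seeded (hence reproducible) sample of responses for QA review."""
    rng = random.Random(seed)
    return rng.sample(response_ids, min(k, len(response_ids)))
```

Seeding the sample makes each month's review set reproducible, so a scholar's scores can always be traced back to the exact responses they saw.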

Solution & AI Design

The Frontend (The Delivery)

A mobile-first React/Vite PWA optimized for Latin American device constraints. It drives habit formation via gamification (XP, streaks) and delivers bite-sized, culturally resonant micro-lessons tailored to the user’s lifecycle stage.

The Course Creation Pipeline (The Engine)

A fully automated n8n workflow that ingests PDFs and outputs bilingual LMS CSVs. I mapped specific models to tasks to balance cost and capability: Google Gemini 2.5 Flash for full-book classification (leveraging its 1M context), Claude Haiku 4.5 for high-volume structured generation, and Claude Sonnet 4.5 strictly for independent fidelity validation.

The Chatbot & Observability Platform (The Safety Net)

A RAG architecture strictly grounded in curated Islamic texts, using semantic embeddings and a reranker. I enforced a deliberate OLTP/OLAP database separation: user-facing chat lives in one lean table, while deep RAG telemetry (retrieved chunks, embedding dimensions, search times) streams asynchronously to an analytics warehouse to enable safe RLHF tracking and admin evals. I also connected the whole stack to Amplitude, Fivetran, and Databricks.
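The OLTP/OLAP split can be illustrated with two deliberately different schemas. This is a minimal sketch using SQLite as a stand-in for the real databases; the column names are assumptions based on the telemetry described above.

```python
# Hypothetical sketch of the OLTP/OLAP separation: a lean chat table for the
# UI, and a wide, loosely-coupled telemetry table that is free to evolve.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OLTP: lean table backing the user-facing chat UI
CREATE TABLE chat_messages (
    id         INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    role       TEXT CHECK (role IN ('user', 'assistant')),
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- OLAP: wide RAG telemetry, written asynchronously; no foreign key, so
-- schema changes here never risk the stability of the chat UI
CREATE TABLE rag_telemetry (
    message_id       INTEGER,
    retrieved_chunks TEXT,    -- JSON array of chunk ids + scores
    embedding_dim    INTEGER,
    search_ms        REAL,
    model            TEXT
);
""")
```

Because the telemetry table has no hard coupling to the chat table, chunk sizes and search parameters can change without a product-side migration.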

Delivery & Experimentation

Self-Healing Generation Loop

AI frequently hallucinates citations. I built a two-pass "fidelity gate" where Claude Sonnet checks the generated lesson against the source text. If it spots fabricated quotes, it triggers Claude Haiku to execute a targeted field repair (costing ~$0.001 per repair) rather than reprocessing the whole book.
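The two-pass fidelity gate can be sketched as validate-then-repair. This Python sketch is illustrative: the real pipeline runs the validator and repairer as LLM calls (Sonnet, then Haiku) inside n8n, whereas here quote checking is a literal substring match and "repair" simply drops the fabricated quotes.

```python
# Hypothetical sketch of the two-pass fidelity gate: flag fabricated quotes,
# then repair only the affected field instead of reprocessing the whole book.

def find_fabricated_quotes(lesson: dict, source_text: str) -> list[str]:
    """Pass 1 (validator role): flag quotes absent from the source text."""
    return [q for q in lesson.get("quotes", []) if q not in source_text]

def repair_lesson(lesson: dict, bad_quotes: list[str]) -> dict:
    """Pass 2 (repair role): targeted field repair; the real pipeline
    regenerates just this field rather than the entire lesson."""
    fixed = dict(lesson)
    fixed["quotes"] = [q for q in lesson["quotes"] if q not in bad_quotes]
    return fixed

def fidelity_gate(lesson: dict, source_text: str) -> dict:
    bad = find_fabricated_quotes(lesson, source_text)
    return repair_lesson(lesson, bad) if bad else lesson
```

The key design choice is scoping the repair to the single failing field, which is why each fix costs a fraction of a cent instead of a full regeneration pass.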

Fixing Context Truncation

Early on, GPT-4o misclassified books because its 128K context window forced us to sample only 75% of the text. I migrated to Gemini 2.5 Flash to pass 100% of the book, which eliminated the failure mode and cut classification costs by 22x (from $0.22 to $0.01 per book).

Custom Admin Evals & RLHF

I built an admin interface where domain experts score chatbot responses across 4 dimensions (Correctness, Completeness, Clarity, Efficiency). To build a high-quality RLHF dataset, I tracked revision_count on user feedback to isolate high-confidence labels for future reward model training.
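The revision-count filter can be sketched as a one-pass selection over the feedback log. This is an illustrative Python sketch: the `revision_count` field matches the case study, while the record shape and threshold are assumptions.

```python
# Hypothetical sketch of isolating high-confidence RLHF labels: a label the
# reviewer never revised is treated as a higher-confidence training signal.

def high_confidence_labels(feedback: list[dict], max_revisions: int = 0) -> list[dict]:
    """Keep only feedback records at or below the revision threshold."""
    return [f for f in feedback if f["revision_count"] <= max_revisions]
```

Filtering on revisions before training means the reward-model dataset is built from labels the reviewer committed to on the first pass, not ones they second-guessed.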

Cost Observability

Because variable API token usage is a silent budget killer, I implemented a _tkn logging pattern across all n8n nodes. This pushes token counts and specific model pricing directly to Google Sheets per run, enforcing strict cost discipline.
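The _tkn pattern amounts to logging one cost row per run per node. This Python sketch is illustrative: the real pipeline pushes rows from n8n to Google Sheets, a CSV writer stands in for the sheet here, and the per-1K-token prices are placeholder numbers, not real rates.

```python
# Hypothetical sketch of the `_tkn` logging convention: every node reports
# its token counts, priced per model, to a shared cost ledger.
import csv
import io

PRICE_PER_1K = {  # illustrative placeholder prices, not real rates
    "gemini-2.5-flash": 0.0003,
    "claude-haiku-4.5": 0.001,
}

def log_tkn(writer, node: str, model: str, in_tkn: int, out_tkn: int) -> float:
    """Append one cost row (node, model, tokens, dollars) and return the cost."""
    cost = (in_tkn + out_tkn) / 1000 * PRICE_PER_1K[model]
    writer.writerow([node, model, in_tkn, out_tkn, round(cost, 6)])
    return cost
```

Because every node writes through the same helper, a single spreadsheet sum exposes exactly which workflow step is burning the budget.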

Outcomes

20+ books

processed end-to-end, generating thousands of lessons without manual editing

Under $5

total cost per book for the full automated pipeline

99%+

scholar accuracy rating maintained by the source-verified RAG system

22x

reduction in classification costs by migrating to Gemini 2.5 Flash

95%

reduction in generation costs via multi-model pipeline optimization

Key Learnings

Analytics schemas evolve faster than product schemas. Separating the OLTP chat tables from the OLAP RAG logs was the best architectural decision. It allowed rapid experimentation with chunk sizes and search parameters without risking the stability of the user’s chat history UI.

Feedback is a training asset, not a support ticket. Building the revision_count tracker taught me how messy human feedback is. Identifying un-revised, high-confidence labels completely changes the quality of your RLHF dataset.

AI is a product, not magic. You have to build in-loop validation, automated structural repairs, and explicit OLTP/OLAP database separation to handle user feedback (RLHF) if you want the product to actually scale and improve over time.

Deep understanding of the workflow helps spot silent killers. If you don’t fully understand the workflow and carefully monitor each output, the AI will silently pass hallucinated content.