System Architecture

Visual breakdown of the technical infrastructure powering the Casa Islamica ecosystem, bridging complex data engineering with end-user applications.

Production System

Conversational AI Engine (v3.1)

Multi-step Retrieval-Augmented Generation (RAG) pipeline with semantic routing, a primary-source precision fallback, and Cohere reranking.

1. Ingestion & Routing
  • Receive user query via Webhook & initiate session management.
  • LLM Query Rewriter translates and expands search terms.
  • Regex-based exact-match detector triggers specialized routing.
2. Vector Retrieval
  • Apply dynamic Dimension Pre-Filters (e.g., Legal framework, Theology, Philosophy).
  • Parallel search via Pinecone Vector Store & explicit primary manuscript lookup.
  • Cohere Reranker optimizes top K results for context precision.
3. Synthesis
  • Format retrieved context applying tier-based source hierarchy.
  • Inject LangChain memory buffer for multi-turn conversation context.
  • GPT-4o generates response adhering to strict subject-matter guardrails.
4. Telemetry & Output
  • Return payload to user interface.
  • Log retrieval metrics (F1, Precision) to Google Sheets.
  • Push interaction data to Supabase and trigger Slack notifications.
Data Engineering Pipeline

Automated Corpus Classification (v3)

Batch processing system to ingest unstructured PDFs, extract text, and apply LLM-driven metadata tagging for vectorization.

1. Extraction

Iterate through raw PDFs in Google Drive. Download files and execute OCR/text extraction to prepare raw string payloads.

2. AI Analysis

Gemini 2.5 Flash analyzes full texts to determine chunking strategy, source tier, category tags, and structural regex patterns.

3. Structured Formatting

Parse JSON output, format into standardized schema, append structured metadata to central repository, and send completion alerts.
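A minimal sketch of the parse-and-format step, assuming the model returns a single JSON object that may be wrapped in a Markdown code fence. The field names (`source_tier`, `category_tags`, `chunking_strategy`, `structural_regex`) and the row layout are hypothetical stand-ins for the real schema, which is not shown here.

```python
import json
import re

# Hypothetical subset of the schema; the real field names are assumptions.
REQUIRED_FIELDS = ("source_tier", "category_tags", "chunking_strategy", "structural_regex")

# Matches an opening code fence (optionally tagged "json") or a closing fence.
_FENCE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def parse_classification(raw: str) -> dict:
    """Strip an optional code fence, parse the JSON, and check required keys."""
    cleaned = _FENCE.sub("", raw.strip())
    data = json.loads(cleaned)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"classification is missing fields: {missing}")
    return data


def to_sheet_row(filename: str, data: dict) -> list:
    """Flatten the parsed object into one spreadsheet row (hypothetical column order)."""
    return [
        filename,
        data["source_tier"],
        ", ".join(data["category_tags"]),
        data["chunking_strategy"],
        data["structural_regex"],
    ]
```

Failing loudly on a missing field keeps malformed rows out of the repository instead of leaving blank cells for a reviewer to notice later.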

Quality Assurance & ML Pipeline

Observability & Evaluation System

A 4-layer architecture built for accuracy-critical AI. Separates transactional chat infrastructure from analytical telemetry, converting human-expert reviews into an RLHF training dataset.

1. Automated Telemetry

n8n writes the complete retrieval context (18+ fields, including rerank scores and snippets) to an OLAP table, enabling root-cause diagnosis of generation vs. retrieval failures.

2. User Signal

React frontend captures structured feedback. Hard distinction between 'unhelpful' (UX failure) and 'error' (theological misinformation) routes high-severity issues to a priority queue.

3. Expert Review

Admin panel with isolated per-admin scoring. Utilizes a 4-category rubric where a ternary 'Correctness' score automatically gates the final verdict, preventing contradictory training labels.

4. RLHF Export

Aggregates reviewed logs into a 30-field dataset. Preserves retrieval metadata, annotator identity, and full rubric breakdowns to train future reward models.

Under the Hood

Orchestration Logic

Node-level execution mapped directly from the n8n orchestration layers. Organized by architectural stage to highlight decision gates, constraint management, and data mutation.

Pipeline 1: RAG Chatbot (v3.1)

33 Nodes · Sync + Async
Stage 1: Preprocessing
Webhook Trigger

Receives the Spanish query and session UUID. Decouples n8n from the frontend edge functions.

Query Rewriter (GPT-4o-mini)

Translates Spanish to English and semantically expands query to improve vector recall.

Decision: GPT-4o-mini is used over GPT-4o here. Translation is a low-stakes task, so the smaller model saves cost and latency ahead of the heavier retrieval stages.
Stage 2: Retrieval
Dimension Filter & Router

Extracts topic tags (e.g., Legal rulings, Theology) to constrain search space. Detects explicit primary text requests to route to a dedicated authoritative index branch.
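The routing decision could look roughly like the sketch below. It is illustrative only: the regex, the keyword lists, and the branch names (`primary_index`, `vector_search`) are hypothetical, and keyword matching stands in for however the production node actually extracts topic tags.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detector: an explicit "chapter N" / "verse N" / "section N"
# reference is treated as a request for a primary text.
PRIMARY_TEXT_PATTERN = re.compile(r"\b(?:chapter|verse|section)\s+\d+\b", re.IGNORECASE)

# Hypothetical keyword sets per dimension tag.
DIMENSION_KEYWORDS = {
    "legal": {"ruling", "permissible", "obligatory"},
    "theology": {"creed", "belief", "attributes"},
    "philosophy": {"metaphysics", "ethics", "logic"},
}


@dataclass
class Route:
    branch: str
    dimensions: list = field(default_factory=list)


def route_query(query: str) -> Route:
    """Pick a retrieval branch and the dimension tags used as pre-filters."""
    # Match on whole words so "theological" does not trigger the "logic" keyword.
    words = set(re.findall(r"[a-z]+", query.lower()))
    dimensions = [tag for tag, keywords in DIMENSION_KEYWORDS.items() if words & keywords]
    branch = "primary_index" if PRIMARY_TEXT_PATTERN.search(query) else "vector_search"
    return Route(branch=branch, dimensions=dimensions)
```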

Pinecone Vector Search

Queries 64,800 vectors. Fetches Top-K=30. Strictly limited to Tier 1 (Practical) and Tier 2 (Commentary) sources.
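As a sketch of how the tier restriction and dimension pre-filter might be expressed, the helper below builds a Pinecone metadata filter and passes it to the index's `query` method. The metadata keys `tier` and `dimension` are assumptions about how the vectors were tagged; `index` is expected to be a Pinecone index object created elsewhere.

```python
def build_pinecone_filter(dimensions, allowed_tiers=(1, 2)):
    """Build a metadata filter limiting results to the allowed tiers and topic tags."""
    # "tier" and "dimension" are hypothetical metadata keys.
    metadata_filter = {"tier": {"$in": list(allowed_tiers)}}
    if dimensions:
        metadata_filter["dimension"] = {"$in": list(dimensions)}
    return metadata_filter


def search(index, query_vector, dimensions, top_k=30):
    """Query the index for the top_k nearest vectors that pass the filter."""
    return index.query(
        vector=query_vector,
        top_k=top_k,
        filter=build_pinecone_filter(dimensions),
        include_metadata=True,
    )
```

Filtering at query time means excluded tiers never occupy any of the 30 candidate slots handed to the reranker.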

Stage 3: Reranking
Cohere Rerank API v2

Direct HTTP request bypassing the SDK wrapper for full parameter control. Uses a cross-encoder to rescore the Top 30 and keep the Top 7. Discards results scoring below 0.3.
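A hedged sketch of what a direct call to the rerank endpoint could look like using only the standard library. The model name `rerank-v3.5` is an assumption (the model in use is not stated here), and the response is assumed to contain a `results` list whose items carry `index` and `relevance_score`.

```python
import json
import urllib.request

COHERE_RERANK_URL = "https://api.cohere.com/v2/rerank"


def build_rerank_request(query, documents, api_key, top_n=7, model="rerank-v3.5"):
    """Build the POST request; the model name is an assumption, not from the source."""
    body = json.dumps(
        {"model": model, "query": query, "documents": documents, "top_n": top_n}
    ).encode("utf-8")
    return urllib.request.Request(
        COHERE_RERANK_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def filter_by_score(results, min_score=0.3):
    """Drop reranked results whose relevance score falls below the threshold."""
    return [r for r in results if r["relevance_score"] >= min_score]


def rerank(query, documents, api_key):
    """Send the request and return the surviving results (performs a network call)."""
    request = build_rerank_request(query, documents, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return filter_by_score(payload["results"])
```

Keeping the score filter separate from the HTTP call makes the 0.3 threshold testable without network access.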

Post-Rerank Processing

Applies Tier 1 boost to prioritize practical guides. Caps source diversity (max 2 chunks per book). Merges adjacent chunks to restore context.
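The three post-rerank steps can be sketched as below. The boost value, the chunk fields (`book_id`, `chunk_index`, `tier`, `score`, `text`), and the order of operations (boost, then cap, then merge) are all assumptions made for illustration.

```python
from collections import defaultdict

TIER1_BOOST = 0.05  # hypothetical boost; the production value is not stated
MAX_CHUNKS_PER_BOOK = 2


def post_rerank(chunks):
    """Boost Tier 1 chunks, cap chunks per book, then merge adjacent chunks."""
    # 1. Boost Tier 1 scores and rank by the boosted score.
    boosted = [
        dict(c, score=c["score"] + (TIER1_BOOST if c["tier"] == 1 else 0.0))
        for c in chunks
    ]
    boosted.sort(key=lambda c: c["score"], reverse=True)

    # 2. Keep at most MAX_CHUNKS_PER_BOOK of the best chunks from each book.
    per_book = defaultdict(int)
    kept = []
    for chunk in boosted:
        if per_book[chunk["book_id"]] < MAX_CHUNKS_PER_BOOK:
            per_book[chunk["book_id"]] += 1
            kept.append(chunk)

    # 3. Merge chunks that sit next to each other in the same book.
    kept.sort(key=lambda c: (c["book_id"], c["chunk_index"]))
    merged = []
    for chunk in kept:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["book_id"] == chunk["book_id"]
            and prev["last_index"] + 1 == chunk["chunk_index"]
        ):
            prev["text"] += "\n" + chunk["text"]
            prev["last_index"] = chunk["chunk_index"]
            prev["score"] = max(prev["score"], chunk["score"])
        else:
            merged.append(dict(chunk, last_index=chunk["chunk_index"]))

    merged.sort(key=lambda c: c["score"], reverse=True)
    return merged
```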

Stage 4: Generation
Context Assembly

Formats chunks by tier. Injects Supabase-backed persistent memory buffer for multi-turn conversation context.

GPT-4o Synthesis

Reads English sources to generate localized output. Enforces strict domain-specific guardrails and inline citations. Instructed to admit uncertainty if sources lack context.

Stage 5: Output & Observability
Parallel Execution

Synchronously returns JSON to Webhook for immediate UI rendering while asynchronously logging telemetry.
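In n8n this split is expressed as workflow branches rather than code. As a rough analogy only, the same pattern in Python's `asyncio` returns the response immediately and schedules logging as a background task. The function names and the in-memory `sink` are hypothetical stand-ins for the real telemetry writes.

```python
import asyncio


async def log_telemetry(record, sink):
    """Stand-in for the slow telemetry writes (database, spreadsheet, notifications)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    sink.append(record)


async def handle_query(query, sink, background_tasks):
    """Return the response right away; telemetry is written in the background."""
    response = {"answer": f"(generated answer for: {query})"}
    task = asyncio.create_task(
        log_telemetry({"query": query, "answer": response["answer"]}, sink)
    )
    # Hold a reference so the task is not garbage-collected before it finishes.
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return response
```

The caller receives `response` before any telemetry has been written, so a slow or failing log sink does not add to user-facing latency.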

Metrics Tracked
  • Reranker F1 & Precision
  • E2E Latency (ms)
  • Source Tier Spread
  • Context Diversity
Data Separation: Writes session history to Supabase (OLAP analytical aggregations) and Google Sheets (immediate human QA), keeping analytical queries off the production database.

Pipeline 2: Corpus Batch Classification (v3)

10 Nodes · Batch Processing
Stage 1: Batch Control
Drive Discovery

Scans ingestion folder for raw PDFs. Feeds file array to iteration controller.

Loop Controller

Processes exactly one book at a time to prevent API rate limit exhaustion and memory overflow.

Stage 2: Ingestion
Binary Extraction

Downloads the PDF and extracts the raw string payload. Low-fidelity text extraction is acceptable here because the LLM only needs thematic understanding.

Stage 3: AI Analysis
Gemini 2.5 Flash

Leverages 1M token context window to ingest entire book. Classifies source tier, generates 22 distinct JSON fields, and determines vector chunking strategy (e.g., thematic vs. episodic).

Stage 4: Data Handoff
Append to Sheets

Writes parsed output to Google Sheets, which acts as a mandatory human review gate before vectors are generated.

Rate Limit Buffer

Enforces a strict 10-second wait before looping to the next book to protect Gemini API quotas.
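The batch loop amounts to sequential processing with a fixed pause between items. A minimal sketch, assuming a hypothetical `classify` callable that wraps the download, extraction, and Gemini call for one book:

```python
import time


def process_books(files, classify, wait_seconds=10.0, sleep=time.sleep):
    """Classify one book at a time, pausing between books to protect API quotas."""
    results = []
    for position, file in enumerate(files):
        results.append(classify(file))
        # Wait before looping to the next book; no wait is needed after the last one.
        if position < len(files) - 1:
            sleep(wait_seconds)
    return results
```

Passing `sleep` as a parameter lets the pause be swapped for a no-op when testing.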

Pipeline 3: Observability & Evals Architecture

4 Layers · Parallel DB Writes
Layer 1: Telemetry (n8n)
chatbot_queries_logs

Captures full retrieval context—not just the answer. Enables diagnosing whether a failure stems from poor chunk retrieval or LLM generation.

OLAP Table
Layer 2: User Signal (React)
chatbot_message_feedback

Structured feedback paths. Severity distinction enforced: 'error' ≠ 'unhelpful'. Errors map to potential theological misinformation requiring immediate review.

OLTP Table
Intentional Schema Separation
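The severity distinction in Layer 2 can be sketched as a small routing function. The queue names and severity labels are hypothetical; only the two feedback kinds come from the description above.

```python
from enum import Enum


class FeedbackKind(str, Enum):
    UNHELPFUL = "unhelpful"  # UX failure
    ERROR = "error"          # potential theological misinformation


def route_feedback(kind: str) -> dict:
    """Map a feedback kind to a review queue; unknown kinds raise ValueError."""
    parsed = FeedbackKind(kind)
    if parsed is FeedbackKind.ERROR:
        return {"queue": "priority", "severity": "high"}
    return {"queue": "standard", "severity": "low"}
```

Using an enum rather than free text rejects any value outside the two defined paths, so the two categories cannot blur together in storage.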
Layer 3: Admin Review
Ternary Rubric Engine

Scores across 4 dimensions. Correctness allows partial states (0, 0.5, 1) because theological accuracy has meaningful gradations. Clarity/Efficiency are binary.

Auto-Computed Verdict
  • IF correctness = 0 → BAD
  • IF correctness = 0.5 OR completeness = 0 → BAD
  • IF correctness = 1 AND completeness = 1 → GOOD

The verdict is computed, not manually set, which prevents rubric/verdict contradictions in the training data.
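The three rules translate directly into a small function. This sketch applies them in the order listed and raises on any score combination the rules do not cover. The function name is hypothetical.

```python
def compute_verdict(correctness: float, completeness: float) -> str:
    """Derive the final verdict from rubric scores using the listed rules."""
    if correctness == 0:
        return "BAD"
    if correctness == 0.5 or completeness == 0:
        return "BAD"
    if correctness == 1 and completeness == 1:
        return "GOOD"
    raise ValueError(
        f"no rule covers correctness={correctness}, completeness={completeness}"
    )
```

Under the rules as written, the only combination that produces GOOD is full correctness together with full completeness; partial correctness is always gated to BAD.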

Layer 4: Super Admin Export
RLHF-Ready Dataset

Bulk CSV/XLSX export merging retrieval metadata, annotator identity, user ratings, and full rubric breakdowns.

Why 30 fields?
Reward models need richer signals than just “good/bad”. Exporting the full context enables training models to improve retrieval and generation simultaneously.
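A minimal sketch of the export merge, assuming telemetry logs and expert reviews share a `query_id` key (a hypothetical name) and that the column list is supplied by the caller. Only CSV is shown; the XLSX path is omitted.

```python
import csv
import io


def export_reviews(logs, reviews, fieldnames):
    """Join each review with its telemetry log and return the result as CSV text."""
    logs_by_id = {log["query_id"]: log for log in logs}
    buffer = io.StringIO()
    # Columns outside `fieldnames` are ignored; missing values are left empty.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for review in reviews:
        log = logs_by_id.get(review["query_id"])
        if log is None:
            continue  # skip reviews with no matching telemetry record
        writer.writerow({**log, **review})
    return buffer.getvalue()
```

Each exported row carries the retrieval metadata alongside the rubric scores, which is what allows a downstream model to separate retrieval failures from generation failures.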