System Architecture
Visual breakdown of the technical infrastructure powering the Casa Islamica ecosystem, bridging complex data engineering with end-user applications.
Conversational AI Engine (v3.1)
Multi-step Retrieval-Augmented Generation (RAG) pipeline with semantic routing, a precision fallback to primary sources, and Cohere reranking.
- ◆ Receive user query via Webhook & initiate session management.
- ◆ LLM Query Rewriter translates and expands search terms.
- ◆ Regex-based exact-match detector triggers specialized routing.
- ◆ Apply dynamic Dimension Pre-Filters (e.g., Legal framework, Theology, Philosophy).
- ◆ Parallel search via Pinecone Vector Store & explicit primary manuscript lookup.
- ◆ Cohere Reranker reorders the top-k results to improve context precision.
- ◆ Format retrieved context according to a tier-based source hierarchy.
- ◆ Inject LangChain memory buffer for multi-turn conversation context.
- ◆ GPT-4o generates response adhering to strict subject-matter guardrails.
- ◆ Return payload to user interface.
- ◆ Log retrieval metrics (F1, Precision) to Google Sheets.
- ◆ Push interaction data to Supabase and trigger Slack notifications.
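The exact-match routing and tier-based formatting steps above can be sketched roughly as follows. This is a minimal illustration, not the production workflow: the regex patterns, the route names (`primary_lookup`, `vector_search`), and the chunk fields (`tier`, `score`, `text`) are all assumptions, since the actual patterns and schema are not described here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: the real detector's expressions are not documented here.
EXACT_MATCH_PATTERNS = {
    "numbered_reference": re.compile(r"\b\d{1,3}:\d{1,3}\b"),  # e.g. "2:255"
    "quoted_phrase": re.compile(r'"([^"]{3,})"'),              # e.g. "some exact wording"
}


@dataclass
class Route:
    name: str                  # "primary_lookup" or "vector_search" (assumed names)
    matched_by: Optional[str]  # label of the pattern that fired, if any


def route_query(query: str) -> Route:
    """Send queries that look like exact citations to the primary-source lookup."""
    for label, pattern in EXACT_MATCH_PATTERNS.items():
        if pattern.search(query):
            return Route("primary_lookup", label)
    return Route("vector_search", None)


def format_context(chunks: list) -> str:
    """Order chunks by source tier (lower number first), then by rerank score descending."""
    ordered = sorted(chunks, key=lambda c: (c["tier"], -c["score"]))
    return "\n\n".join(f"[Tier {c['tier']}] {c['text']}" for c in ordered)
```

In this sketch a lower tier number is assumed to mean a more authoritative source, so primary texts appear in the prompt before secondary commentary regardless of rerank score.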
Automated Corpus Classification (v3)
Batch processing system to ingest unstructured PDFs, extract text, and apply LLM-driven metadata tagging for vectorization.
Iterate through raw PDFs in Google Drive. Download files and execute OCR/text extraction to prepare raw string payloads.
Gemini 2.5 Flash analyzes full texts to determine chunking strategy, source tier, category tags, and structural regex patterns.
Parse JSON output, format into standardized schema, append structured metadata to central repository, and send completion alerts.
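As an illustration of the parsing step, the sketch below turns a model's JSON reply into a fixed record. The field names (`chunking_strategy`, `source_tier`, `category_tags`, `structural_patterns`) and the `DocumentMetadata` type are hypothetical, chosen to mirror the attributes listed above; the actual schema may differ.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class DocumentMetadata:
    # Hypothetical schema mirroring the attributes the classifier is said to produce.
    filename: str
    chunking_strategy: str
    source_tier: int
    category_tags: list
    structural_patterns: list


def parse_classification(filename: str, raw: str) -> DocumentMetadata:
    """Parse the model's JSON reply and normalize it into DocumentMetadata."""
    # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    cleaned = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", raw.strip())
    data = json.loads(cleaned)  # raises json.JSONDecodeError on malformed output

    patterns = list(data.get("structural_patterns", []))
    for pattern in patterns:
        re.compile(pattern)  # raises re.error if the model produced an invalid regex

    return DocumentMetadata(
        filename=filename,
        chunking_strategy=str(data["chunking_strategy"]),
        source_tier=int(data["source_tier"]),
        category_tags=list(data.get("category_tags", [])),
        structural_patterns=patterns,
    )
```

Validating the generated regex patterns at parse time means a bad pattern fails during classification rather than later, when the document is chunked.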
Observability & Evaluation System
A 4-layer architecture built for accuracy-critical AI. Separates transactional chat infrastructure from analytical telemetry, converting human-expert reviews into an RLHF training dataset.
n8n writes the complete retrieval context (18+ fields including rerank scores and snippets) to an OLAP table, enabling root-cause diagnosis of generation vs. retrieval failures.
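One way to separate retrieval failures from generation failures is to score each logged query's retrieved set against expert-marked relevant sources. The sketch below computes the standard precision, recall, and F1 for a single query; the function name and the use of plain ID lists are assumptions, not the logged schema.

```python
def retrieval_metrics(retrieved_ids: list, relevant_ids: list) -> dict:
    """Compute set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)

    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}
```

A low score on a query whose answer was also wrong points toward retrieval; a high score paired with a wrong answer points toward generation.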
React frontend captures structured feedback and draws a hard distinction between 'unhelpful' (UX failure) and 'error' (theological misinformation), so that high-severity issues are routed to a priority queue.
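The distinction between the two feedback types could be enforced on the backend along these lines. The queue names and severity labels here are hypothetical; only the two feedback values come from the description above.

```python
from enum import Enum


class FeedbackType(Enum):
    UNHELPFUL = "unhelpful"  # UX failure
    ERROR = "error"          # theological misinformation


def route_feedback(feedback_type: str) -> dict:
    """Map a feedback type to a review queue (queue names are assumed)."""
    kind = FeedbackType(feedback_type)  # raises ValueError for unknown types
    if kind is FeedbackType.ERROR:
        return {"queue": "priority", "severity": "high"}
    return {"queue": "standard", "severity": "normal"}
```

Rejecting unknown values outright, rather than defaulting them to the standard queue, keeps a mistyped 'error' report from being silently downgraded.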
Admin panel with isolated per-admin scoring. Uses a 4-category rubric in which a ternary 'Correctness' score automatically gates the final verdict, preventing contradictory training labels.
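A gate of this kind might look like the sketch below. Only the ternary 'Correctness' score and the four-category count come from the description; the other three category names, the 0/1/2 encoding, and the verdict labels are invented for illustration.

```python
from dataclasses import dataclass

# Assumed encoding: 0 = incorrect, 1 = partially correct, 2 = correct.
CORRECTNESS_VALUES = (0, 1, 2)


@dataclass(frozen=True)
class RubricScores:
    correctness: int
    sourcing: int      # hypothetical category
    completeness: int  # hypothetical category
    tone: int          # hypothetical category


def final_verdict(scores: RubricScores) -> str:
    """Derive the verdict so that it can never contradict the correctness score."""
    if scores.correctness not in CORRECTNESS_VALUES:
        raise ValueError("correctness must be 0, 1, or 2")
    if scores.correctness == 0:
        return "rejected"        # an incorrect answer is never approved
    if scores.correctness == 1:
        return "needs_revision"  # partial correctness caps the verdict
    # Fully correct: the remaining categories decide between the top two verdicts.
    others = (scores.sourcing, scores.completeness, scores.tone)
    return "approved" if min(others) >= 1 else "needs_revision"
```

Because the verdict is computed rather than entered by the reviewer, a record can never pair an "incorrect" score with an "approved" label.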
Aggregates reviewed logs into a 30-field dataset. Preserves retrieval metadata, annotator identity, and full rubric breakdowns to train future reward models.
Orchestration Logic
Node-level execution mapped directly from the n8n orchestration layers. Organized by architectural stage to highlight decision gates, constraint management, and data mutation.