What We Built — Lamblight
Lamblight is a scripture-based AI journaling application. The concept is simple: users write a journal entry, and the app generates a reflective prompt grounded in relevant scripture passages, helping them connect their daily experience to their faith in a way that feels personal rather than generic. The category — spiritual wellness technology — had a real audience, a clear unmet need (existing journaling apps offered no AI-driven reflection layer), and a founder with deep domain knowledge. It was a strong product-market fit hypothesis waiting to be tested with an actual product.
Lamblight came to Automely as a client brief. By the time we delivered, it had cost $95,000 to build, reached 20,000+ users, and was generating $312,000 in ARR. This case study documents the architecture decisions, development process, technical challenges, and lessons from building a production AI SaaS product — not a prototype, not a demo, but a product real users pay for and return to daily.
What AI-First Development Actually Means
There is a meaningful distinction between “SaaS with AI features” and “AI-native SaaS.” In the first, AI is a bolt-on — a chatbot in the corner, a summarisation button, maybe some autocomplete added after the product was already designed. In AI-native SaaS, AI is the product. The entire user experience, data pipeline, and business model are designed around AI capabilities from day one. Lamblight is the second kind. The journaling engine is not a feature of the app. It is the reason the app exists.
AI-first development at Automely specifically means the following. First, Amir and the engineering team use AI tools to generate 70-80% of implementation code — boilerplate, repetitive integration code, test suites, and infrastructure configuration. This is not the same as vibe coding; the senior engineers review, refactor, and govern every output. But the velocity is dramatically higher than traditional development. Second, the AI capability — in Lamblight's case, the journaling engine and RAG pipeline — is architected and built before the product shell. The product wraps around the AI, not the reverse. Third, the evaluation framework is in place before launch. We do not ship without knowing what “good” looks like for the AI output and having automated tests to catch regressions when models update.
Lushbinary's 2026 guide states it precisely: “The biggest mistake founders make: building AI-native SaaS with traditional SaaS architecture. You’ll hit cost, latency, and quality walls within months. Design for AI from the start.” This is the reason AI-first development is not just a delivery process — it is an architectural stance. The decisions made in the first two weeks of an AI SaaS build determine whether you have a sustainable product or an expensive prototype that breaks when usage scales.
6 Architecture Decisions That Determined the Outcome
Every AI SaaS product involves a set of foundational decisions that are difficult to reverse once the build is underway. These are the six decisions that shaped Lamblight's architecture — and the reasoning behind each one.
Decision 1: RAG strategy
Option considered (fine-tuning): Adjust GPT-4 weights on a scripture corpus. More “native” feeling. Harder to update, expensive to retrain, difficult to debug when retrieval fails. Months of training pipeline work before first user interaction.
What we chose (RAG): Index scripture passages and theological commentary into pgvector. At query time, retrieve the most contextually relevant passages and inject them into the LLM prompt. Updatable without retraining, fully debuggable, ships in weeks not months.
Decision 2: Model routing
Option considered (one model for every task): Simple to implement. Maximum quality on all tasks. At 20K users with daily journaling sessions, AI cost became unsustainable on full GPT-4o pricing for classification and routing tasks that don’t need it.
What we chose (tiered model routing): Tier 1 (fast, cheap): intent classification, tone detection, brief responses → GPT-4o Mini. Tier 2 (balanced): most user-facing journaling prompts → GPT-4o. Tier 3 (capable): complex multi-session reflection, edge cases → GPT-4o Full. Router logic starts simple and routes by request type.
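A router that "starts simple and routes by request type" can be little more than a lookup table. The sketch below is one possible shape, not Lamblight's actual router: the request-type keys and the Tier 2 fallback are assumptions made for illustration, and the tier names are the labels used in this article rather than API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    tier: int
    label: str  # tier label as used in the article, not an API model id


TIER_1 = ModelTier(1, "GPT-4o Mini")
TIER_2 = ModelTier(2, "GPT-4o")
TIER_3 = ModelTier(3, "GPT-4o Full")

# Hypothetical request-type names; the article only states that
# routing is done by request type.
ROUTES = {
    "intent_classification": TIER_1,
    "tone_detection": TIER_1,
    "brief_response": TIER_1,
    "journaling_prompt": TIER_2,
    "multi_session_reflection": TIER_3,
    "edge_case": TIER_3,
}


def route(request_type: str) -> ModelTier:
    """Return the tier for a request type.

    Unknown types fall back to Tier 2 (an assumption of this sketch).
    """
    return ROUTES.get(request_type, TIER_2)
```

Keeping the mapping in data rather than in branching code means a tier change is a one-line edit that the evaluation suite can then re-check.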
Decision 3: Streaming
Option considered (wait for the full response): Simpler implementation. Users see nothing for 3-8 seconds while the LLM generates the full reflection prompt, then the complete response appears. Early user testing showed 40%+ of users abandoning before the response appeared.
What we chose (token streaming): Tokens delivered to the mobile client as they are generated. Users see the response appearing word by word within 400ms of submission. Same underlying AI latency; dramatically different perceived experience. Abandonment rate dropped to under 5%.
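The streaming approach can be sketched as an async generator that wraps each token in a Server-Sent Events frame as soon as it arrives. This is a minimal illustration under stated assumptions: `fake_llm_tokens` stands in for a real streaming model call, and the `[DONE]` end marker is a convention chosen for this sketch, not something the article specifies.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_llm_tokens(text: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word + " "


def sse_frame(data: str) -> str:
    """Format one SSE message; multi-line data needs one 'data:' line per line."""
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"


async def stream_as_sse(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Emit each token as its own frame, then a hypothetical end marker."""
    async for token in tokens:
        yield sse_frame(token)
    yield sse_frame("[DONE]")


async def collect(text: str) -> List[str]:
    """Helper for demonstration: gather every frame into a list."""
    return [frame async for frame in stream_as_sse(fake_llm_tokens(text))]
```

The first frame leaves the server as soon as the first token exists, which is why perceived latency drops even though total generation time is unchanged.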
Decision 4: Vector database choice
Option considered (dedicated managed vector database): Fully managed, scales automatically, purpose-built for vector operations. Additional service to manage, vendor to pay, and operational dependency. For Lamblight’s scripture corpus size, adds cost and complexity without clear benefit at launch volume.
What we chose (pgvector): Extends PostgreSQL, which Lamblight already uses for all user data, with vector embedding operations. Co-located with application data. No additional vendor. Simpler infrastructure. Hybrid search capability (vector + keyword). Sufficient for Lamblight’s corpus scale and query volume.
Decision 5: Tenant isolation
The risk (no explicit tenant scoping): In RAG retrieval without explicit tenant scoping, a similarity search could theoretically surface embeddings from another user’s journal entries if they are semantically similar. A single missing WHERE clause in the retrieval query can expose another user’s private content through the AI response.
What we chose (per-user scoping at the database level): Every RAG retrieval query is scoped to the authenticated user’s ID at the database level. User journal embeddings and personal context are partitioned at the data layer. The scripture corpus is shared (no sensitivity), but personal context is never cross-tenant. Tested explicitly in the evaluation suite.
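To make the isolation rule concrete, here is a minimal in-memory sketch in which the tenant filter runs before any similarity ranking. It is an illustration only: a plain list stands in for the pgvector table, `JournalChunk` is a hypothetical record type, and the function reuses the name `retrieve_user_context` from the pipeline purely for readability. In SQL the equivalent of the list filter is the WHERE clause on the user ID.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JournalChunk:
    user_id: str
    text: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_user_context(
    chunks: List[JournalChunk],
    query_embedding: List[float],
    user_id: str,
    top_k: int = 5,
) -> List[JournalChunk]:
    """Scope to the authenticated user first, rank by similarity second."""
    own = [c for c in chunks if c.user_id == user_id]
    ranked = sorted(
        own, key=lambda c: cosine(c.embedding, query_embedding), reverse=True
    )
    return ranked[:top_k]
```

An isolation test then seeds two users with identical embeddings and asserts that a query for one never returns a chunk owned by the other.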
Decision 6: Usage tier design
Option considered (flat-rate unlimited AI): Simpler subscription model. More compelling marketing. In practice: heaviest 10% of users consume 60%+ of total AI costs (Lushbinary 2026). A flat-rate unlimited model at $9.99/month with high-usage outliers makes unit economics unsustainable at scale.
What we chose (usage-based tiers): Free tier: 3 AI journal entries/week. Standard ($9.99/month): 30 AI sessions/month. Premium ($19.99/month): Unlimited AI journaling + additional features. The AI cost of heavy users is priced in at the subscription tier level.
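The tier limits above translate directly into an entitlement check. The following is a sketch under assumptions: the plan keys are hypothetical, and counting how many sessions a user has consumed in the current week or month is assumed to happen elsewhere (for example in the usage metering layer).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    session_limit: Optional[int]  # None means unlimited
    period: str                   # window the limit applies to


# Limits taken from the tier description; keys are hypothetical.
PLANS = {
    "free": Plan("Free", 3, "week"),
    "standard": Plan("Standard", 30, "month"),
    "premium": Plan("Premium", None, "month"),
}


def can_start_session(plan_key: str, sessions_used_this_period: int) -> bool:
    """True if the user may start another AI session in the current period."""
    plan = PLANS[plan_key]
    if plan.session_limit is None:
        return True
    return sessions_used_this_period < plan.session_limit
```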
4 Build Phases — From Brief to Production
Lamblight was delivered across four overlapping build phases — architecture and AI core first, backend and AI engine next, product layer wrapping a proven AI core, and ongoing scale and iteration. The sequencing matters as much as the scope: shipping mobile and web before the AI engine is production-tested is the most common pattern that produces expensive prototypes that break under real usage.
Phase 1: Architecture and AI Core
- Product spec written in markdown: machine-readable and directly consumable by AI coding tools (Claude, Cursor). The spec covers the problem and user, the core workflow (5 steps), and the technical architecture with specific model and retrieval strategy decisions
- LLM selection and evaluation framework setup: define what “good” looks like for AI journaling output before writing a line of product code. Evaluation criteria: scripture relevance, theological accuracy, personal resonance, hallucination rate
- Scripture corpus preparation: cleaning, structuring, and semantically chunking theological content. Naive chunking (token limits) produces retrieval degradation under real queries — semantic chunking by concept boundaries preserves meaning
- RAG pipeline proof of concept: verify that retrieval precision is above 90% on test queries before building the product layer on top of it
- Prompt design v1: temperature 0.1 for classification and structured output; 0.7 for reflective journaling generation. “Temperature is the single most consequential configuration decision for production AI reliability” (RankSquire 2026)
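The chunking point in the list above can be illustrated with a small comparison. This is a simplified sketch, not the production chunker: whole units (for example a verse or verse pair that forms one thought) stand in for "concept boundaries", which in practice would need curated or model-derived boundaries, and the sample sentences are paraphrased placeholder text.

```python
from typing import List


def naive_chunks(text: str, max_words: int) -> List[str]:
    """Split on a fixed word budget, ignoring where thoughts begin and end."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def boundary_chunks(units: List[str], max_words: int) -> List[str]:
    """Pack whole units into chunks; a unit is never split, even if oversized."""
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for unit in units:
        n = len(unit.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(unit)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Paraphrased sample text for illustration only.
UNITS = [
    "Be anxious for nothing, but in everything with prayer and petition "
    "make your requests known.",
    "And peace will guard your hearts and minds.",
]
```

With an 8-word budget, `naive_chunks(" ".join(UNITS), 8)` cuts the first thought between "with" and "prayer", while `boundary_chunks(UNITS, 8)` keeps it in a single chunk.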
Phase 2: Backend and AI Engine
- Scripture RAG pipeline production build: pgvector embedding pipeline, semantic retrieval endpoint, tenant-isolated personal context retrieval, response caching for frequently requested passages
- AI journaling engine: the full prompt loop — user journal text → intent classification → scripture retrieval → personal context injection → reflective prompt generation → streaming response delivery
- User authentication and tenant isolation architecture: Supabase Auth, strict per-user data scoping at the database level, AI request attribution for cost monitoring per user
- Subscription tier management: Stripe integration, entitlement enforcement, usage metering for AI session limits per plan tier
- LLM observability setup: Helicone for prompt logging, cost tracking, latency monitoring. “You have to track model performance, inference costs, latency, accuracy metrics, and user satisfaction all at the same time” (Unified AI Hub)
- Evaluation suite implementation: 50 golden examples with expected retrieval and generation outputs. Automated runs against every prompt change. Regression gate: no change to prompt or model version ships without passing the evaluation suite
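A regression gate of the kind described in the last bullet can be reduced to a precision calculation over golden examples plus a threshold check. The sketch below covers the retrieval side only and rests on assumptions: `GoldenExample` is a hypothetical record type, `retrieve` is any callable that maps a query to passage references, and the 0.90 default mirrors the proof-of-concept target mentioned earlier rather than a documented production setting.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class GoldenExample:
    query: str
    relevant: Set[str]  # passage references judged relevant for this query


def retrieval_precision(
    examples: List[GoldenExample],
    retrieve: Callable[[str], List[str]],
) -> float:
    """Fraction of all retrieved passages that were judged relevant."""
    retrieved_total = 0
    relevant_total = 0
    for example in examples:
        results = retrieve(example.query)
        retrieved_total += len(results)
        relevant_total += sum(1 for ref in results if ref in example.relevant)
    return relevant_total / retrieved_total if retrieved_total else 0.0


def passes_gate(
    examples: List[GoldenExample],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.90,
) -> bool:
    """Block the change when precision falls below the threshold."""
    return retrieval_precision(examples, retrieve) >= threshold
```

Running `passes_gate` in CI against every prompt or model-version change is one way to enforce the rule that nothing ships without passing the suite.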
Phase 3: Product Layer
- iOS and Android apps in React Native: daily journaling is a mobile-first habit. The AI journaling engine was already production-tested before mobile development began, so the mobile layer wraps around a proven AI core
- Web app in Next.js 14 App Router: consistent experience across platforms, shared authentication state, progressive feature parity with mobile
- Onboarding flow: the critical moment for a journaling app — the user must write their first journal entry and receive their first AI reflection within the first session. Onboarding designed to reach this moment in under 3 minutes
- Daily habit mechanics: streak tracking, notification system, weekly reflection summaries generated from the user’s journal history
- Admin dashboard: content moderation tools, user analytics, AI quality monitoring dashboards, cost-per-user reporting
Phase 4: Scale and Iteration
- AI cost monitoring per user cohort: identify the top 10% heaviest users, monitor per-session AI cost trends, trigger model tier routing adjustments when usage patterns shift
- Automated evaluation on model updates: when OpenAI or Anthropic releases a model update, the evaluation suite runs automatically. If precision or quality metrics drop below threshold, the update is blocked from production until prompt adjustments restore performance
- User feedback signals: thumbs up/down on AI reflections, regeneration rate (how often users ask for a new response), edit tracking. These signals feed directly into the evaluation suite and prompt refinement process
- Infrastructure scaling: Redis caching layer for frequent passage queries, pgvector index optimisation as corpus and user count grow, CDN for static assets
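The cohort monitoring described in this phase starts from one number: what share of total AI cost the heaviest 10% of users account for. A minimal sketch, assuming per-user cost totals are already available as a dictionary (in practice they would come from the cost attribution data; the function name is hypothetical):

```python
from typing import Dict


def top_decile_cost_share(cost_by_user: Dict[str, float]) -> float:
    """Share of total AI cost generated by the heaviest 10% of users."""
    costs = sorted(cost_by_user.values(), reverse=True)
    total = sum(costs)
    if not costs or total == 0:
        return 0.0
    n_top = max(1, len(costs) // 10)  # at least one user in the top cohort
    return sum(costs[:n_top]) / total
```

Tracking this value over time shows whether usage is concentrating further, which is the signal for revisiting tier limits or routing rules.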
Inside the AI Journaling Engine
The core AI capability in Lamblight is the journaling engine — the system that takes a user's journal entry and generates a scripture-grounded reflective prompt. Here is how the pipeline works in production:
# Pipeline sketch. The helpers (classify_intent, retrieve_scripture, and so on)
# are application functions defined elsewhere in the codebase.
async def generate_reflection(user_entry, current_user):
    # 1. User submits journal entry (streamed response begins), e.g.
    #    user_entry = "Feeling anxious about the job interview tomorrow..."

    # 2. Tier 1: intent classification (GPT-4o Mini, temp=0.1)
    intent = classify_intent(user_entry)
    # → {"emotion": "anxiety", "theme": "uncertainty", "context": "professional"}

    # 3. Scripture retrieval: pgvector semantic search
    passages = retrieve_scripture(
        query=build_retrieval_query(intent),
        user_id=current_user.id,  # Strict tenant scoping
        top_k=5,
        min_similarity=0.72,
    )
    # → Philippians 4:6-7, Matthew 6:25-27, Isaiah 41:10 (top 3 returned)

    # 4. Personal context injection (user's recent journal history)
    personal_context = retrieve_user_context(
        user_id=current_user.id,
        lookback_days=14,
    )

    # 5. Prompt construction
    prompt = build_reflection_prompt(
        entry=user_entry,
        scripture=passages,
        context=personal_context,
        tone="warm, grounded, non-preachy",
    )

    # 6. Tier 2: journaling generation (GPT-4o, temp=0.7, streaming)
    async for token in stream_reflection(prompt):
        yield token  # SSE to client: tokens appear in real time

The scripture retrieval precision in production runs at 94%, meaning 94% of retrieved passages are judged contextually relevant by our evaluation suite. The chunking strategy was critical: we chunk by semantic concept boundaries, not token counts. A naive chunk of Philippians 4:6-7 split at a token limit would separate “be anxious for nothing” from “with prayer and petition”, destroying the meaning of the verse for retrieval purposes.
AI Cost Management at 20,000 Users
The most common surprise for founders building AI SaaS products is the AI cost structure at scale. At 20,000 active users with daily journaling habits, controlling per-user AI cost is not optional — it determines whether the business model works. Here is how Lamblight manages it:
- Tiered model routing holds the cost floor. By routing classification and simple tasks to GPT-4o Mini and reserving GPT-4o Full for complex generation, the average AI cost per journaling session is under $0.02. Without routing, the same session would cost $0.08-$0.12 on full GPT-4o — a 4-6× difference that compounds severely at scale.
- Scripture passage caching eliminates redundant embedding operations. The most frequently retrieved passages (the 200 most common scripture responses) are cached in Redis. When a retrieval query returns a cached passage, the embedding and retrieval cost is zero. At 20K users with significant overlap in common emotional themes, this eliminates roughly 35% of vector retrieval operations.
- Usage tiers align revenue with AI cost. The top 10% of Lamblight users (the heaviest journalers) generate roughly 60% of AI cost. These users are on the Premium tier at $19.99/month, which prices in the higher consumption. Free tier users are limited to 3 sessions per week — sufficient to validate the product but structured to drive upgrade conversion rather than unlimited free AI consumption.
- Context window management keeps prompt costs minimal. Personal context injection is capped at the 5 most relevant recent journal entries. System prompts are compressed and cached. Every additional token in the context window costs money at 20K users per day — we treat prompt length as a financial variable, not just a performance variable.
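The passage caching item in this list follows the cache-aside pattern: check the cache, fall through to retrieval on a miss, then store the result. The sketch below uses a plain dictionary as a stand-in for Redis so that it runs on its own. Keying the cache by a normalised theme string is an assumption of this example, and `PassageCache` is a hypothetical name.

```python
from typing import Callable, Dict, List


class PassageCache:
    """Cache-aside wrapper; a dict stands in for Redis in this sketch."""

    def __init__(self, fetch: Callable[[str], List[str]]):
        self._fetch = fetch  # the expensive embedding + vector search path
        self._store: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, theme: str) -> List[str]:
        key = theme.strip().lower()  # normalise so "Anxiety " and "anxiety" share an entry
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        passages = self._fetch(key)
        self._store[key] = passages
        return passages
```

The `hits` and `misses` counters give the hit rate directly, which is the figure to compare against the roughly 35% reduction in retrieval operations quoted above.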
Average monthly AI cost per active user: under $0.50. Average revenue per user: $9.99-$19.99 depending on tier. AI gross margin on the product: approximately 90%. The unit economics work because of the architectural decisions made in Week 1 — tiered routing, usage tiers, and caching — not because of optimisations applied after the fact. AI cost management is an architecture decision, not a finance decision.
Results — From $95K Build to $312K ARR
Lamblight's commercial outcomes reflect what is achievable when the AI-first development process is applied correctly to a product with genuine product-market fit. The build cost of $95,000 covered the full AI journaling engine, RAG pipeline, iOS app, Android app, web app, subscription management, admin dashboard, and production infrastructure. The $312,000 ARR represents a 3.3× return on build cost within the first year of operation. User growth to 20,000+ users came primarily through organic channels — the daily habit mechanics (streaks, notifications, weekly reflections) driving retention rates that sustain word-of-mouth growth.
The technical outcomes that enabled these commercial results: scripture retrieval precision maintained at 94% across model updates; average AI response latency under 1.8 seconds end-to-end (with streaming, the perceived latency is under 400ms); monthly AI cost per active user under $0.50; zero cross-tenant data incidents since launch. These are not marketing claims — they are the output of the evaluation suite, cost monitoring dashboard, and production observability stack built in Phase 1 before the first user interaction.
8 Production Lessons From Building Lamblight
Eight lessons recur across the Lamblight build — each one is a decision that, made correctly in Week 1, compounds into sustainable unit economics and product quality at scale; made incorrectly, becomes an expensive retrofit later. They are documented below in the order they most affect the outcome.
Use RAG for Domain-Specific AI Apps — Save Fine-Tuning for Specific, Documented Gaps
RAG is faster to build, easier to debug, and more flexible for corpus updates. Fine-tuning is justified only when RAG cannot solve a specific, documented quality problem. We started with RAG and never found a compelling reason to fine-tune for Lamblight’s use case.
Tiered Model Routing Reduces AI Cost 60-70% — Build This From the Start
Simple classification tasks do not need GPT-4o. Routing by task complexity — not by user segment — is the most impactful single architectural decision for AI SaaS unit economics. Retrofitting this onto a single-model architecture is painful. Build it from Week 1.
Evaluation Before Launch — Not After
The evaluation suite (golden examples, precision metrics, hallucination detection) must exist before launch, not as a post-launch improvement. “This separates a demo from a product.” Without evals, you discover quality failures through user complaints. With evals, you catch them before they reach users.
Heaviest 10% of Users Consume 60%+ of AI Cost — Price Accordingly
Flat-rate unlimited AI is a margin killer at scale. Usage-based subscription tiers are not just a monetisation decision — they are a unit economics survival decision. Design your pricing model around your AI cost structure, not as an afterthought.
Semantic Chunking Over Token-Limit Chunking — Every Time
Naive chunking (character counts, rough paragraphs) degrades RAG retrieval under real-world queries. Semantic chunking — respecting concept boundaries in the source material — maintains meaning across chunks and significantly improves retrieval precision. The difference is measurable in production.
Streaming Is a UX Decision — It Changes Retention, Not Just Performance
Streaming (SSE) did not change the underlying AI latency. It changed the perceived latency from 3-8 seconds to under 400ms. That difference changed the user abandonment rate on the first AI interaction from over 40% to under 5%. This is a product decision disguised as a technical one.
Temperature Configuration Is Not Trivial — 0.1 for Structure, 0.7 for Generation
Use temperature 0.0-0.2 for every agent step that requires structured output or tool-call schema generation, and 0.5-0.8 for content generation steps. The wrong temperature on a structured output step causes JSON parse failures and classification drift. Treat temperature as a first-class configuration variable, not a default to accept.
Tenant Isolation in RAG Is Not Optional — It Must Be Tested Explicitly
A single missing WHERE clause in a retrieval query can expose one user’s private journal content in another user’s AI response. Tenant isolation must be tested in the evaluation suite specifically — not assumed from the data model. This is the most consequential security oversight in AI SaaS and the easiest one to miss in development.
For the broader context on how AI-first development companies differ from traditional agencies — and how to evaluate any AI development firm before commissioning your product — see our AI development company vs traditional agency guide. For the architecture decision of which AI model approach is right for your product, see our OpenAI API vs custom AI development guide.
Have a SaaS product concept that needs AI-first development — with the architecture decisions above applied to your specific use case, domain, and user base? Automely builds AI SaaS from brief to production.
Free 45-minute product consultation. We review your concept, propose the AI architecture, identify the critical decisions for your use case, and give you a realistic build timeline and cost estimate based on real delivery experience.

