Agent Guides

Implementing RAG: How to Make Your AI Agents Actually Accurate

DomAIn Labs Team
January 21, 2025
12 min read

You've built an AI agent. It sounds intelligent, responds quickly, and seems helpful. Then you realize: it's making things up.

This is the #1 problem with AI agents—they hallucinate. They confidently provide incorrect information. They reference policies that don't exist. They give outdated prices.

The solution? Retrieval-Augmented Generation (RAG).

This technical guide explains what RAG is, why it matters, and how to implement it to make your agents actually reliable.

What is RAG?

Simple explanation: Instead of relying on the AI's training data (which can be outdated or incomplete), RAG lets your agent search your actual business documents before answering questions.

How it works:

  1. User asks a question
  2. Agent searches your knowledge base for relevant information
  3. Agent uses that retrieved information to construct an accurate answer
  4. Agent cites sources so you can verify

Why this matters: Your agent answers based on YOUR data, not generic internet information from 2021.

The Problem RAG Solves

Without RAG (Pure LLM)

User: "What's your return policy?"

Agent: "Based on standard retail practices, most companies offer 30-day returns with receipt."

Problem: Your actual policy is 60 days, no receipt needed. Agent just made something up.

With RAG

User: "What's your return policy?"

Agent searches → Finds your returns.pdf document

Agent: "According to our return policy (returns.pdf, updated Jan 2025), we offer 60-day returns with or without a receipt. Items must be unused and in original packaging."

Result: Accurate, verifiable, current information.

RAG Architecture Overview

Here's the basic flow:

// Simplified RAG pipeline
async function answerWithRAG(question: string) {
  // Step 1: Convert question to vector embedding
  const questionEmbedding = await embedText(question)

  // Step 2: Search vector database for relevant documents
  const relevantDocs = await vectorDB.similaritySearch(
    questionEmbedding,
    5  // topK: number of chunks to retrieve
  )

  // Step 3: Construct prompt with retrieved context
  const prompt = `
    Context from our knowledge base:
    ${relevantDocs.map(doc => doc.content).join('\n\n')}

    User question: ${question}

    Answer the question using ONLY the provided context.
    If the context doesn't contain the answer, say so.
  `

  // Step 4: Generate answer using LLM
  const answer = await llm.generate(prompt)

  // Step 5: Return answer with sources
  return {
    answer,
    sources: relevantDocs.map(doc => doc.metadata)
  }
}

The Components You Need

1. Document Processing

First, you need to convert your business documents into searchable chunks.

What to include:

  • Product documentation
  • FAQs
  • Policies (returns, shipping, privacy)
  • Training manuals
  • Knowledge base articles
  • Historical support tickets (anonymized)

Code example:

import * as fs from 'fs/promises'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'

async function processDocument(filePath: string) {
  // Read the document
  const rawText = await fs.readFile(filePath, 'utf-8')

  // Split into chunks (important for retrieval quality)
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,        // Characters per chunk
    chunkOverlap: 200,      // Overlap to maintain context
  })

  const chunks = await splitter.createDocuments([rawText])

  // Add metadata for each chunk
  const processedChunks = chunks.map((chunk, index) => ({
    content: chunk.pageContent,
    metadata: {
      source: filePath,
      chunkIndex: index,
      timestamp: new Date().toISOString()
    }
  }))

  return processedChunks
}

Why chunk size matters:

  • Too small: Fragments lack context
  • Too large: Retrieval becomes imprecise
  • Sweet spot: 500-1500 characters for most use cases
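
A quick way to feel this tradeoff is to split the same document at a few different sizes and compare the output. A minimal sketch, reusing the splitter from the example above (rawText is assumed to hold the document text loaded earlier):

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'

// Compare how chunk size changes what the retriever will later see
async function compareChunkSizes(rawText: string) {
  for (const chunkSize of [500, 1000, 1500]) {
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize,
      chunkOverlap: Math.round(chunkSize * 0.2),  // keep roughly 20% overlap
    })

    const chunks = await splitter.createDocuments([rawText])

    // Smaller chunks produce more, narrower fragments; larger chunks produce fewer, broader ones
    console.log(`chunkSize ${chunkSize}: ${chunks.length} chunks`)
  }
}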

2. Vector Embeddings

Convert text into numbers that capture semantic meaning.

What are embeddings? Think of them as coordinates in "meaning space." Similar concepts are close together.

Example:

  • "return policy" and "refund guidelines" would be close in embedding space
  • "return policy" and "shipping rates" would be far apart

Implementation:

import { OpenAIEmbeddings } from 'langchain/embeddings/openai'

const embeddings = new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-3-small'  // Cost-effective, good quality
})

async function embedDocument(text: string) {
  const vector = await embeddings.embedQuery(text)
  // Returns an array of 1536 numbers representing the text's meaning
  return vector
}
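
To sanity-check that the embedding space behaves this way, you can compare a few phrases directly. A minimal sketch using cosine similarity and the embeddings client defined above (the helper is plain arithmetic, not a LangChain API):

// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0)
  const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0))
  const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0))
  return dot / (normA * normB)
}

async function compareConcepts() {
  const [policy, refunds, shipping] = await Promise.all([
    embeddings.embedQuery('return policy'),
    embeddings.embedQuery('refund guidelines'),
    embeddings.embedQuery('shipping rates'),
  ])

  // Expect the first score to be noticeably higher than the second
  console.log('return policy vs refund guidelines:', cosineSimilarity(policy, refunds))
  console.log('return policy vs shipping rates:', cosineSimilarity(policy, shipping))
}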

Embedding model options:

  • OpenAI text-embedding-3-small: $0.02 per 1M tokens, good quality
  • OpenAI text-embedding-3-large: $0.13 per 1M tokens, better quality
  • Open-source models: Free but require self-hosting

3. Vector Database

Store embeddings and enable fast similarity search.

Popular options:

Database              | Best For                        | Price
Pinecone              | Production scale, managed       | $70/month+
Supabase (pgvector)   | Cost-effective, full-stack      | Free tier, $25/month+
Chroma                | Local development, simple       | Free (self-hosted)
Weaviate              | Advanced features, self-hosted  | Free (self-hosted)

Supabase example (recommended for most):

import { SupabaseVectorStore } from 'langchain/vectorstores/supabase'
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
)

// Initialize vector store
const vectorStore = new SupabaseVectorStore(embeddings, {
  client: supabase,
  tableName: 'documents',
  queryName: 'match_documents'
})

// Add documents
await vectorStore.addDocuments(processedChunks)

// Search for relevant docs
const results = await vectorStore.similaritySearch(
  'What is your return policy?',
  5  // Return top 5 most relevant chunks
)

4. Retrieval Logic

How to find the right documents for a given question.

Basic similarity search:

async function retrieveContext(question: string, topK = 5) {
  // Search vector database
  const results = await vectorStore.similaritySearch(question, topK)

  // Extract just the content
  const context = results.map(doc => doc.pageContent).join('\n\n')

  return {
    context,
    sources: results.map(doc => doc.metadata)
  }
}

Advanced: Hybrid search (better results):

async function hybridSearch(question: string) {
  // Combine semantic search with keyword search

  // 1. Semantic search (vector similarity)
  const semanticResults = await vectorStore.similaritySearch(question, 3)

  // 2. Keyword search (BM25 or full-text search)
  // fullTextSearch is a placeholder for your keyword backend
  // (e.g. Postgres full-text search or Elasticsearch)
  const keywordResults = await fullTextSearch(question, 2)

  // 3. Combine and deduplicate
  // deduplicateBySource is a placeholder that drops chunks sharing the same
  // source + chunkIndex metadata
  const combined = [...semanticResults, ...keywordResults]
  const unique = deduplicateBySource(combined)

  return unique.slice(0, 5)  // Top 5 overall
}

5. Prompt Engineering

How you structure the prompt determines answer quality.

Good prompt structure:

function buildRAGPrompt(question: string, context: string) {
  return `
You are a helpful customer service agent for [Company Name].

CONTEXT:
The following information is from our official documentation:

${context}

USER QUESTION:
${question}

INSTRUCTIONS:
1. Answer ONLY using information from the context above
2. If the context doesn't contain the answer, say: "I don't have that information in my knowledge base. Let me connect you with a team member who can help."
3. Cite which document you're referencing (e.g., "According to our Return Policy...")
4. Be concise but complete
5. If there are multiple relevant policies, mention all of them

ANSWER:
`
}

Why this works:

  • Clear role definition
  • Explicit instructions to avoid hallucination
  • Citation requirement for verification
  • Graceful handling of unknowns

Complete RAG Implementation

Here's a production-ready example:

import { ChatOpenAI } from 'langchain/chat_models/openai'
import { SupabaseVectorStore } from 'langchain/vectorstores/supabase'
import { OpenAIEmbeddings } from 'langchain/embeddings/openai'
import { createClient } from '@supabase/supabase-js'

// Supabase client used by the vector store below
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
)

class RAGAgent {
  private llm: ChatOpenAI
  private vectorStore: SupabaseVectorStore
  private embeddings: OpenAIEmbeddings

  constructor() {
    this.embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: 'text-embedding-3-small'
    })

    this.llm = new ChatOpenAI({
      modelName: 'gpt-4-turbo-preview',
      temperature: 0.1,  // Lower = more deterministic
    })

    this.vectorStore = new SupabaseVectorStore(this.embeddings, {
      client: supabase,
      tableName: 'documents',
      queryName: 'match_documents'
    })
  }

  async answer(question: string) {
    try {
      // 1. Retrieve relevant context (with similarity scores for the confidence check)
      const resultsWithScores = await this.vectorStore.similaritySearchWithScore(question, 5)
      const results = resultsWithScores.map(([doc]) => doc)
      const scores = resultsWithScores.map(([, score]) => score)

      if (results.length === 0) {
        return {
          answer: "I don't have enough information to answer that question reliably. Let me connect you with a team member.",
          sources: [],
          confidence: 'low'
        }
      }

      // 2. Build context string
      const context = results
        .map((doc, i) => `[Source ${i+1}]: ${doc.pageContent}`)
        .join('\n\n')

      // 3. Generate answer with LLM
      const prompt = this.buildPrompt(question, context)
      const response = await this.llm.invoke(prompt)

      // 4. Return answer with metadata
      return {
        answer: response.content,
        sources: results.map(doc => ({
          file: doc.metadata.source,
          chunk: doc.metadata.chunkIndex
        })),
        confidence: this.calculateConfidence(scores)
      }

    } catch (error) {
      console.error('RAG error:', error)
      return {
        answer: "I encountered an error. Please try again or contact support.",
        sources: [],
        confidence: 'error'
      }
    }
  }

  private buildPrompt(question: string, context: string): string {
    return `
Context from company documentation:
${context}

Question: ${question}

Provide an accurate answer using ONLY the context above. Cite your sources.
If you cannot answer from the context, say so clearly.
    `.trim()
  }

  private calculateConfidence(scores: number[]): 'high' | 'medium' | 'low' {
    // Average the similarity scores returned alongside each document
    const avgScore = scores.reduce((sum, s) => sum + s, 0) / scores.length

    if (avgScore > 0.8) return 'high'
    if (avgScore > 0.6) return 'medium'
    return 'low'
  }
}

// Usage
const agent = new RAGAgent()
const response = await agent.answer("What's your return policy?")

console.log(response.answer)
console.log('Sources:', response.sources)
console.log('Confidence:', response.confidence)

Common RAG Challenges & Solutions

Challenge 1: Poor Retrieval Quality

Problem: Agent retrieves irrelevant documents.

Solutions:

  • Improve chunk size (experiment with 500-1500 characters)
  • Use hybrid search (semantic + keyword)
  • Add metadata filters such as date, document type, or department (see the sketch after this list)
  • Increase topK parameter (retrieve more candidates)
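
The metadata you attached during document processing can be used to narrow the search. A minimal sketch of a metadata-filtered query, assuming the Supabase vector store from earlier, a match_documents function that accepts the standard metadata filter, and a hypothetical docType field added at ingestion time:

// Only search chunks whose metadata matches the filter
async function searchOfficialPolicies(question: string) {
  const results = await vectorStore.similaritySearch(
    question,
    5,
    { docType: 'policy' }  // metadata filter: restrict retrieval to policy documents
  )

  return results
}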

Challenge 2: Contradictory Information

Problem: Different documents say different things.

Solutions:

  • Prioritize by document freshness (timestamp metadata)
  • Add document authority levels (official policy > blog post; see the re-ranking sketch after this list)
  • Include version control in metadata
  • Let agent acknowledge conflicts: "I found two different policies..."
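
A simple way to combine the first two ideas is to re-rank retrieved chunks before building the prompt. A minimal sketch, assuming each chunk's metadata carries the ingestion timestamp from earlier plus a hypothetical authority field:

interface RankableDoc {
  pageContent: string
  metadata: {
    source: string
    timestamp: string    // ISO date added at ingestion time
    authority?: number   // hypothetical field, e.g. 3 = official policy, 1 = blog post
  }
}

// Prefer higher-authority documents; break ties by freshness
function rankByAuthorityAndFreshness(docs: RankableDoc[]): RankableDoc[] {
  return [...docs].sort((a, b) => {
    const authorityDiff = (b.metadata.authority ?? 0) - (a.metadata.authority ?? 0)
    if (authorityDiff !== 0) return authorityDiff

    return Date.parse(b.metadata.timestamp) - Date.parse(a.metadata.timestamp)
  })
}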

Challenge 3: Cost Control

Problem: Embedding and LLM calls get expensive at scale.

Solutions:

  • Cache embeddings so the same text is never re-embedded (see the embedding cache sketch below)
  • Use cheaper embedding models for development
  • Implement semantic caching (cache similar questions)
  • Batch process documents during off-hours

// Semantic caching example
const cache = new Map<string, any>()

async function answerWithCache(question: string) {
  // Check if we've answered a similar question recently.
  // semanticCacheCheck is a placeholder: embed the incoming question and
  // compare it against embeddings of previously cached questions.
  const cachedAnswer = await semanticCacheCheck(question, cache)

  if (cachedAnswer && cachedAnswer.similarity > 0.95) {
    return cachedAnswer.response  // Return cached answer
  }

  // Generate fresh answer
  const response = await agent.answer(question)

  // Cache for future
  cache.set(question, { response, timestamp: Date.now() })

  return response
}
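
Embedding caching works the same way, just keyed by the text itself. A minimal sketch using the embeddings client defined earlier (in production you would likely back this with Redis or a database table):

// Simple in-memory embedding cache so the same text is never embedded twice
const embeddingCache = new Map<string, number[]>()

async function embedWithCache(text: string): Promise<number[]> {
  const cached = embeddingCache.get(text)
  if (cached) return cached

  const vector = await embeddings.embedQuery(text)
  embeddingCache.set(text, vector)
  return vector
}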

Challenge 4: Slow Response Times

Problem: Users wait 5-10 seconds for answers.

Solutions:

  • Use faster embedding models
  • Reduce topK (fewer documents to process)
  • Implement streaming responses so the answer appears as it generates (see the sketch after this list)
  • Pre-compute embeddings for FAQs
  • Use edge functions for retrieval
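
Streaming is usually the biggest perceived-latency win: the user starts reading while the model is still generating. A minimal sketch of token streaming with the ChatOpenAI client used earlier; wiring the tokens into your own transport (websocket, SSE) is left as an assumption:

import { ChatOpenAI } from 'langchain/chat_models/openai'

// Streaming variant of the chat model used earlier
const streamingLlm = new ChatOpenAI({
  modelName: 'gpt-4-turbo-preview',
  streaming: true,  // emit tokens as they are generated
})

async function streamAnswer(prompt: string) {
  let answer = ''

  await streamingLlm.invoke(prompt, {
    callbacks: [
      {
        handleLLMNewToken(token: string) {
          answer += token
          process.stdout.write(token)  // replace with your websocket/SSE push
        },
      },
    ],
  })

  return answer
}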

Measuring RAG Performance

Track these metrics:

interface RAGMetrics {
  // Retrieval quality
  retrievalPrecision: number    // % of retrieved docs that are relevant
  retrievalRecall: number       // % of relevant docs that are retrieved

  // Answer quality
  answerAccuracy: number        // % of answers that are correct
  citationRate: number          // % of answers with source citations

  // Performance
  avgRetrievalTime: number      // ms to find documents
  avgGenerationTime: number     // ms to generate answer
  avgTotalTime: number          // ms end-to-end

  // Cost
  embeddingCost: number         // $ per 1K queries
  llmCost: number               // $ per 1K queries
}
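
A straightforward way to start collecting the latency and citation fields is to time each query around the agent call. A minimal sketch, assuming the RAGAgent class from above:

// Times a single query end-to-end and logs the fields that feed the metrics above
async function answerWithTiming(agent: RAGAgent, question: string) {
  const start = Date.now()
  const response = await agent.answer(question)
  const totalMs = Date.now() - start

  console.log(JSON.stringify({
    question,
    totalMs,                               // feeds avgTotalTime
    sourcesCited: response.sources.length, // feeds citationRate
    confidence: response.confidence,
  }))

  return response
}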

How to improve:

  • Low precision: Tighten similarity threshold
  • Low recall: Increase chunk overlap, better chunking strategy
  • Slow retrieval: Index optimization, use faster vector DB
  • High cost: Caching, cheaper models, batch processing

Production Checklist

Before deploying RAG to production:

  • Document ingestion pipeline automated
  • Vector database backed up regularly
  • Monitoring and logging in place
  • Fallback for when retrieval fails
  • Cost tracking per query
  • A/B test RAG vs non-RAG responses
  • Human review sample of answers weekly
  • Update cycle for document changes

The Bottom Line

RAG transforms AI agents from "sounds smart but unreliable" to "actually accurate and trustworthy."

Investment required:

  • Development: 2-4 weeks for basic implementation
  • Cost: $50-300/month for vector database + embeddings
  • Maintenance: 4-8 hours/month updating documents

Returns:

  • Accuracy: 60-70% → 90-95%+ with good RAG
  • Trust: Customers can verify sources
  • Flexibility: Update knowledge without retraining models
  • Scalability: Handle company-specific questions

When to implement RAG:

  • ✅ Your agent needs to reference company-specific information
  • ✅ Information changes frequently (policies, prices, products)
  • ✅ Accuracy is critical (support, compliance, sales)
  • ✅ You have documented knowledge to work with

When you might not need RAG:

  • ❌ Agent only does simple, generic tasks
  • ❌ No business-specific information needed
  • ❌ Budget under $50/month total

Next Steps

  1. Assess your documents: What knowledge does your agent need?
  2. Choose a vector database: Supabase for most, Pinecone for scale
  3. Start small: Implement RAG for one category (e.g., FAQs)
  4. Measure and iterate: Track accuracy, adjust chunk size and topK
  5. Scale gradually: Add more document types as you refine

Need help implementing RAG for your agent? Schedule a consultation to discuss your specific use case, or explore our Agent Use Case Explorer to see RAG in action.

Remember: RAG isn't optional for production agents—it's essential. The difference between an agent that "sounds good" and one that "actually works" is almost always RAG.

Tags: RAG, vector databases, accuracy, implementation, technical

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.