OpenAI RAG Pipelines
Build retrieval-augmented generation systems that ground LLM responses in your own data using embeddings, vector search, and context window management.
When to use
- The model needs access to proprietary data not in its training set
- Building a Q&A system over documentation, codebases, or knowledge bases
- Need grounded, citation-backed answers instead of potential hallucinations
- Data changes frequently and re-training is impractical
- Implementing semantic search over large document collections
When NOT to use
- The answer is in the model's training data and doesn't need grounding
- You have fewer than 50 documents — just put them in the context window
- Real-time data is needed (stock prices, live APIs) — use tool calling instead
- The query is transactional, not informational (CRUD operations)
- Exact keyword match is sufficient — use a traditional search engine
Core concepts
RAG pipeline architecture
┌───────────────────────────────────────────────────┐
│                Ingestion Pipeline                 │
│                                                   │
│  Documents → Chunking → Embedding → Vector Store  │
│  (PDF,       (split     (OpenAI/OSS  (Pinecone,   │
│   MD,         into       embedding    Supabase,   │
│   HTML)       chunks)    providers)   pgvector)   │
└───────────────────────────────────────────────────┘
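The ingestion side can be sketched in a few lines of Python. The chunking parameters below (500 characters with 50 of overlap) and the helper names are illustrative assumptions, not a prescribed design; `embed_chunks` assumes the official `openai` Python SDK and an `OPENAI_API_KEY` in the environment.

```python
"""Ingestion sketch: split documents into chunks, embed them, keep them in memory.

A real pipeline would write the (chunk, embedding) pairs to a vector store
such as Pinecone or pgvector instead of a Python list.
"""


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks.

    Overlap keeps a sentence that straddles a boundary retrievable from
    both neighboring chunks.
    """
    chunks: list[str] = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks


def embed_chunks(chunks: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    """Embed chunks with the OpenAI embeddings API (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the chunker works offline

    client = OpenAI()
    resp = client.embeddings.create(model=model, input=chunks)
    return [item.embedding for item in resp.data]
```

Character-based chunking is the simplest option; token-aware or structure-aware splitters (by heading, paragraph, or code block) usually retrieve better but follow the same shape.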
┌───────────────────────────────────────────────────┐
│                  Query Pipeline                   │
│                                                   │
│  User Query → Embed → Vector Search → Rerank →    │
│                                                   │
│  → Build Prompt (query + retrieved chunks) →      │
│                                                   │
│  → LLM Generation (Responses API / Chat API) →    │
│                                                   │
│  → Response with citations                        │
└───────────────────────────────────────────────────┘
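The query side can be sketched the same way. Everything here is a minimal illustration: the in-memory list of `(chunk, embedding)` pairs stands in for a vector store, cosine similarity stands in for the store's search, reranking is omitted, and the prompt template is an assumption. The `answer` function assumes the official `openai` SDK (Responses API) and an `OPENAI_API_KEY`.

```python
"""Query sketch: embed the query, rank stored chunks, build a grounded prompt, generate."""
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero-length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(store, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query: str, store: list[tuple[str, list[float]]], model: str = "gpt-4o-mini") -> str:
    """Full query path: embed -> retrieve -> prompt -> generate (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so retrieval helpers work offline

    client = OpenAI()
    q_vec = client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding
    prompt = build_prompt(query, top_k(q_vec, store))
    resp = client.responses.create(model=model, input=prompt)
    return resp.output_text
```

Numbering the chunks `[1]`, `[2]`, … in the prompt is what lets the model emit the `[n]` citations shown at the end of the diagram; a production pipeline would map those markers back to source documents before displaying the answer.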