Vector Databases for RAG: Comparison and Setup
What Vector Databases Do
A vector database stores high-dimensional numerical vectors and provides efficient nearest-neighbor search. When a RAG system needs to find documents relevant to a query, it converts the query into a vector and asks the database to find the stored vectors most similar to it. The database returns the closest matches along with their associated text and metadata, which are then passed to the language model as context.
Beyond basic similarity search, production vector databases provide metadata filtering (restricting search to vectors that match specific criteria like date range, source, or category), hybrid search (combining vector similarity with keyword matching), multi-tenancy (isolating data between different users or applications), and scaling mechanisms for growing collections.
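To make the retrieval step concrete, here is a minimal sketch of exact (brute-force) cosine-similarity search with an equality-based metadata filter. It is not how any particular database implements search; production systems use approximate indexes instead of scanning every vector. The record layout, the `search` function, and the three-dimensional toy vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(records, query_vector, top_k=3, where=None):
    """Scan every record, keep those matching the filter, rank by similarity."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    scored = [(cosine_similarity(r["vector"], query_vector), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), r["id"], r["text"]) for score, r in scored[:top_k]]


# Hypothetical toy records; real embeddings have hundreds or thousands of dimensions.
RECORDS = [
    {"id": "a", "vector": [1.0, 0.0, 0.0], "text": "Billing FAQ",
     "metadata": {"category": "billing"}},
    {"id": "b", "vector": [0.9, 0.1, 0.0], "text": "Refund policy",
     "metadata": {"category": "billing"}},
    {"id": "c", "vector": [0.0, 1.0, 0.0], "text": "API reference",
     "metadata": {"category": "docs"}},
]

# Restrict the search to billing documents, then rank by similarity to the query.
results = search(RECORDS, [1.0, 0.0, 0.0], top_k=2, where={"category": "billing"})
```

The text and metadata returned with each match are what the RAG system would pass to the language model as context.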
Purpose-Built Vector Databases
Pinecone is a fully managed vector database service that requires no infrastructure management. Vectors are stored in namespaces within indexes, and the service handles scaling, replication, and backup. Pinecone's serverless offering charges based on storage and query volume rather than provisioned capacity, which can make it cost-effective for variable workloads. Key strengths include hybrid search that combines dense vectors with sparse keyword representations, metadata filtering, and enterprise features like encryption and access control. Pinecone suits teams that want to minimize operational overhead and are comfortable with a managed service.
Weaviate is an open-source vector database with a distinctive feature set including GraphQL-based querying, built-in vectorization modules that can embed text automatically during ingestion, and multi-modal support for text, images, and other data types. Weaviate can run self-hosted or as a managed service (Weaviate Cloud Services). Its vectorization modules mean you can send raw text to Weaviate and it handles embedding internally, simplifying the application code. The tradeoff is tighter coupling between the database and embedding model choice.
Qdrant is written in Rust and emphasizes performance and filtering capabilities. Its filtering system supports complex boolean queries on payload fields, enabling sophisticated retrieval constraints like "find documents similar to this query that were published after January 2025 and tagged as technical documentation." Qdrant supports both on-disk and in-memory storage modes, making it adaptable to different cost and latency requirements. Available as open source for self-hosting or as Qdrant Cloud.
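The kind of boolean payload filter described above can be sketched in plain Python. This is only an illustration of the logic, not the Qdrant client API: the filter shape, the `matches` function, the `op` names, and the payload fields `published` and `tag` are all hypothetical. "After January 2025" is interpreted here as later than 31 January 2025.

```python
from datetime import date


def matches(payload, flt):
    """Return True if a payload satisfies a nested boolean filter.

    Hypothetical filter shape (not any client library's schema):
      {"must": [...], "should": [...], "must_not": [...]}
    Each item is either a nested filter or a leaf condition
      {"key": str, "op": "eq" | "gt" | "lt", "value": ...}.
    """
    if "key" in flt:  # leaf condition on a single payload field
        actual = payload.get(flt["key"])
        if actual is None:
            return False
        if flt["op"] == "eq":
            return actual == flt["value"]
        if flt["op"] == "gt":
            return actual > flt["value"]
        if flt["op"] == "lt":
            return actual < flt["value"]
        raise ValueError(f"unknown op: {flt['op']}")
    must_ok = all(matches(payload, c) for c in flt.get("must", []))
    must_not_ok = not any(matches(payload, c) for c in flt.get("must_not", []))
    should = flt.get("should", [])
    should_ok = any(matches(payload, c) for c in should) if should else True
    return must_ok and must_not_ok and should_ok


# "Published after January 2025 and tagged as technical documentation."
FILTER = {
    "must": [
        {"key": "published", "op": "gt", "value": date(2025, 1, 31)},
        {"key": "tag", "op": "eq", "value": "technical-documentation"},
    ]
}

recent_doc = {"published": date(2025, 3, 10), "tag": "technical-documentation"}
old_doc = {"published": date(2024, 11, 2), "tag": "technical-documentation"}
```

In a real deployment the database applies such a filter during the vector search itself, so only vectors whose payload passes the filter are considered as candidates.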
Milvus targets large-scale deployments with billions of vectors. Built on a distributed architecture, Milvus separates compute, storage, and coordination into independent components that can scale horizontally. It supports GPU-accelerated indexing and search for maximum throughput. Milvus is the strongest option for teams with very large vector collections (100M+ vectors) that need high query throughput, but it comes with proportionally higher operational complexity.
Chroma focuses on simplicity and developer experience. It runs as an in-process library (no separate server needed) for development and small-scale production, with a client-server mode for larger deployments. Chroma's Python API is minimal and intuitive, making it the fastest option for prototyping RAG systems. It includes built-in embedding functions that handle vectorization automatically. The tradeoff is fewer enterprise features and less performance optimization for very large collections compared to Pinecone or Milvus.
Traditional Databases with Vector Extensions
PostgreSQL with pgvector adds vector similarity search to one of the most widely used open-source relational databases. For teams already running PostgreSQL, pgvector avoids introducing a new database service by keeping vectors alongside relational data in the same system. This simplifies operations, reduces infrastructure costs, and enables SQL queries that combine vector similarity with traditional relational filters in a single statement.
pgvector handles millions of vectors effectively with HNSW indexing and supports cosine, inner product, and L2 distance metrics. It does not match the performance of purpose-built vector databases at billion-vector scale, and it lacks some advanced features like automatic sharding and built-in hybrid search. But for teams with existing PostgreSQL infrastructure and moderate-scale RAG needs, it is a pragmatic choice that avoids operational complexity.
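As a rough sketch of what combining vector similarity with relational filters in one statement looks like, the snippet below holds the SQL as Python strings. The `chunks` table, its columns, and the three-dimensional `vector(3)` column are hypothetical. The `%(name)s` placeholders assume a driver that uses psycopg-style named parameters; no database connection is opened here.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical schema: one row per chunk, with relational columns next to the vector.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    source text NOT NULL,
    published date NOT NULL,
    body text NOT NULL,
    embedding vector(3)  -- toy dimension; use your embedding model's dimension
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator
# (<-> is L2 distance, <#> is negative inner product).
SEARCH_SQL = """
SELECT id, body, embedding <=> %(query)s::vector AS distance
FROM chunks
WHERE source = %(source)s AND published >= %(since)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def search_params(query_embedding, source, since, limit=5):
    """Build the parameter dict for SEARCH_SQL."""
    return {
        "query": to_vector_literal(query_embedding),
        "source": source,
        "since": since,
        "limit": limit,
    }


params = search_params([0.1, 0.2, 0.3], source="handbook", since="2025-01-01")
```

With a driver such as psycopg, the query would be run as `cursor.execute(SEARCH_SQL, params)`. The relational `WHERE` clause and the vector ordering are evaluated by the same query planner, which is the main operational appeal of this setup.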
Index Types and Tradeoffs
Vector databases use approximate nearest-neighbor (ANN) algorithms to make search fast enough for real-time queries. The main index types each offer different tradeoffs between speed, accuracy, and memory usage.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each vector connects to its nearest neighbors. Search navigates this graph from coarse to fine layers, finding approximate nearest neighbors very quickly. HNSW offers the best combination of speed and recall but requires storing the full graph in memory, which means memory usage scales linearly with collection size.
IVF (Inverted File Index) partitions vectors into clusters and searches only the closest clusters to the query. This reduces memory requirements compared to HNSW but may miss relevant vectors in adjacent clusters, producing lower recall. Increasing the number of clusters searched improves recall at the cost of latency.
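The recall and latency tradeoff of IVF is easier to see in a small sketch. The code below clusters toy two-dimensional vectors with plain k-means, then searches only the `nprobe` closest clusters. The function names, the parameter name `nprobe`, and the random data are assumptions for illustration; real implementations are far more optimized.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def assign(vectors, centroids):
    """Build one inverted list of vector indexes per centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return lists


def build_ivf(vectors, n_clusters, iterations=10, seed=0):
    """Cluster vectors with plain k-means; return (centroids, inverted_lists)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, lists, top_k=3, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    candidates.sort(key=lambda idx: (sq_dist(query, vectors[idx]), idx))
    return candidates[:top_k]


def exact_search(query, vectors, top_k=3):
    """Brute-force baseline used to measure recall."""
    order = sorted(range(len(vectors)), key=lambda idx: (sq_dist(query, vectors[idx]), idx))
    return order[:top_k]


data_rng = random.Random(42)
VECTORS = [[data_rng.random(), data_rng.random()] for _ in range(200)]
CENTROIDS, LISTS = build_ivf(VECTORS, n_clusters=8)
QUERY = [0.5, 0.5]

fast = ivf_search(QUERY, VECTORS, CENTROIDS, LISTS, top_k=5, nprobe=1)
full = ivf_search(QUERY, VECTORS, CENTROIDS, LISTS, top_k=5, nprobe=8)
```

With `nprobe=1` only one cluster is scanned, so true neighbors that fall just across a cluster boundary can be missed. With `nprobe` equal to the number of clusters, every vector is scanned and the result matches `exact_search`, at the cost of losing the speedup.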
Product Quantization (PQ) compresses vectors by dividing them into sub-vectors and replacing each with a codebook entry. This dramatically reduces storage requirements but introduces quantization error that reduces search accuracy. PQ is typically combined with IVF for large-scale deployments where memory constraints prevent storing full-resolution vectors.
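A short worked example shows where the storage savings and the quantization error come from. The sketch below assumes 32-bit float vectors and 8-bit codes, and uses tiny hand-written codebooks; in practice codebooks are learned from the data, typically with k-means. All names and numbers are illustrative.

```python
def pq_footprint(dim, n_subvectors, bits_per_code=8, float_bytes=4):
    """Per-vector storage before and after PQ, plus the shared codebook size."""
    if dim % n_subvectors != 0:
        raise ValueError("dim must be divisible by n_subvectors")
    raw_bytes = dim * float_bytes
    code_bytes = n_subvectors * bits_per_code // 8
    sub_dim = dim // n_subvectors
    # One codebook per sub-vector position, shared by all stored vectors.
    codebook_bytes = n_subvectors * (2 ** bits_per_code) * sub_dim * float_bytes
    return {
        "raw_bytes": raw_bytes,
        "code_bytes": code_bytes,
        "compression_ratio": raw_bytes / code_bytes,
        "codebook_bytes": codebook_bytes,
    }


def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        codes.append(min(
            range(len(codebook)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(sub, codebook[c])),
        ))
    return codes


def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector from its codes."""
    return [x for j, c in enumerate(codes) for x in codebooks[j][c]]


# A 768-dimensional float32 vector split into 96 sub-vectors of 8 dimensions each.
stats = pq_footprint(dim=768, n_subvectors=96)
# stats["raw_bytes"] == 3072, stats["code_bytes"] == 96, compression_ratio == 32.0

# Hypothetical toy codebooks: 2 sub-vector positions, 2 entries each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = pq_encode([0.9, 0.8, 0.1, 0.9], CODEBOOKS)  # [1, 0]
approx = pq_decode(codes, CODEBOOKS)                 # [1.0, 1.0, 0.0, 1.0]
```

The gap between the original vector and `approx` is the quantization error. Distances computed on the codes are approximations of the true distances, which is why PQ trades search accuracy for memory.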
Selection Criteria
Choose your vector database based on these practical considerations. For managed simplicity with strong production features, choose Pinecone or Weaviate Cloud. For self-hosting with maximum performance, choose Qdrant or Milvus. For development and prototyping, choose Chroma. For teams already running PostgreSQL who want to avoid a new service, choose pgvector. For billion-scale collections with high throughput requirements, choose Milvus.
Consider the total cost of ownership, not just the database pricing. Managed services charge for storage and queries but eliminate ops staffing. Self-hosted databases avoid per-query fees but require infrastructure, monitoring, and maintenance. For many teams, the ops cost of running a self-hosted vector database exceeds the API costs of a managed service, making managed options cheaper despite higher per-query pricing.
Hybrid Search Implementation
Hybrid search, combining vector similarity with keyword matching, has become a baseline requirement for production RAG in 2026. Databases implement it in different ways. Pinecone supports sparse-dense hybrid search natively, allowing you to send both a dense vector and a sparse keyword representation in a single query. Weaviate uses BM25F for keyword search alongside vector search, with configurable fusion weights. Qdrant supports full-text search alongside vector search, and both can be combined with payload-based filtering.
For databases that do not natively support hybrid search, you can implement it at the application level by running separate vector and keyword searches, then merging results using reciprocal rank fusion (RRF). RRF assigns each result a score based on its rank in each individual search and combines these scores to produce a final ranking. This approach works with any vector database and any keyword search system, giving you flexibility in your infrastructure choices.
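Application-level RRF takes only a few lines. In the sketch below, each result contributes 1 / (k + rank) for every list it appears in, and the contributions are summed. The constant k = 60 is a commonly used default, and the document IDs and the two input rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered from best to worst.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break score ties by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical results from two independent searches over the same corpus.
vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]  # ["d1", "d3", "d2", "d4"]
```

Because RRF uses only ranks, it does not need the two searches to produce comparable scores, which is what makes it easy to apply across unrelated vector and keyword systems.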
Data Management and Operations
Production vector databases require ongoing operational attention. As documents are added, updated, and removed from the knowledge base, the vector index must be updated accordingly. Adding new vectors is straightforward in all databases. Updating an existing vector means replacing it, either through an upsert keyed on the vector ID where the database offers one, or by deleting the old version and inserting the new one. Either way, your application must track the mapping between source documents and their vector IDs, because re-chunking an edited document can change how many chunks it produces. Deleting vectors when documents are removed prevents stale information from appearing in search results.
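One way to keep the document-to-vector mapping manageable is to derive chunk IDs deterministically and diff them on every sync. The sketch below is a minimal illustration under that assumption: the ID scheme, the `plan_sync` function, and the in-memory `id_map` dictionary are hypothetical, and a real system would persist the map and then issue the resulting writes and deletes against its vector database.

```python
import hashlib


def chunk_id(doc_id, chunk_index, text):
    """Deterministic ID: changes whenever the chunk's position or text changes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{doc_id}:{chunk_index}:{digest}"


def plan_sync(doc_id, new_chunks, id_map):
    """Diff a document's new chunks against the IDs recorded for it.

    Returns (to_write, to_delete) and updates id_map in place.
    to_write:  (vector_id, text) pairs that need embedding and writing.
    to_delete: vector IDs that are now stale and should be removed.
    """
    new_ids = [chunk_id(doc_id, i, text) for i, text in enumerate(new_chunks)]
    old_ids = set(id_map.get(doc_id, []))
    to_write = [(cid, text) for cid, text in zip(new_ids, new_chunks)
                if cid not in old_ids]
    to_delete = sorted(old_ids - set(new_ids))
    if new_ids:
        id_map[doc_id] = new_ids
    else:
        id_map.pop(doc_id, None)  # document removed from the knowledge base
    return to_write, to_delete


id_map = {}

# Initial ingestion: both chunks are new.
first_write, first_delete = plan_sync("doc-1", ["alpha", "beta"], id_map)

# The second chunk was edited: one new vector to write, one stale vector to delete.
second_write, second_delete = plan_sync("doc-1", ["alpha", "gamma"], id_map)

# The document was removed: every remaining vector becomes stale.
third_write, third_delete = plan_sync("doc-1", [], id_map)
```

Unchanged chunks keep their IDs and are skipped, so only edited content needs to be re-embedded.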
Backup and disaster recovery procedures should be established before going to production. Managed services handle this automatically, but self-hosted deployments need explicit backup schedules, tested restoration procedures, and monitoring for index corruption. The cost of re-embedding an entire document collection from scratch (if the index is lost) can be significant for large knowledge bases.
The right vector database depends on your scale, ops capacity, and existing infrastructure. Start with Chroma for prototyping, move to Pinecone or Qdrant for production, and consider pgvector if you already run PostgreSQL. The choice matters less than you think for most projects, as retrieval quality is determined primarily by your embedding model and chunking strategy rather than the database engine.