The biggest mistake in a 2026 vector database comparison is testing all four candidates on 10,000 embeddings, picking the fastest one, and discovering at 2 million embeddings that your "fastest" choice has a 14 GB RAM footprint and a query latency of 800 ms. Most benchmarks lie because they measure scenarios that do not match production. We run RAG in production for Carriva (legal-grade retrieval over French pension law) and we have shipped at smaller scale on three other apps. Here is the comparison that matters: pgvector vs Qdrant vs Pinecone vs Weaviate, on real production criteria.
The four contenders, briefly
pgvector is a PostgreSQL extension that adds vector similarity search to plain Postgres. You query it with SQL.
Qdrant is a standalone vector database written in Rust. Self-hostable, with a managed cloud and both REST and gRPC APIs.
Pinecone is a managed-only vector database. Easy to get started, no self-host option.
Weaviate is a standalone vector database with a heavier ML-platform feel. Self-hostable, also has a managed offering.
These are not all the candidates. Milvus, Chroma, Vespa, Lance, Turbopuffer all exist. The four above are the ones we have seriously evaluated and the ones most teams are choosing between.
The dimensions that actually matter
For a small team running production:
- Cost at your scale, not at toy scale. Below 100k vectors, all four are basically free. Between 100k and 10M, the curves diverge.
- Operational overhead. Self-hosted means you patch it, back it up, and wake up if it dies.
- Query latency at P95, not P50. Production workloads see the tail.
- Filterable metadata. Most real RAG queries combine vector similarity with WHERE-clause-style filters.
- Ecosystem fit. If your stack already has Postgres, pgvector is one fewer system.
- Hybrid search. Vector + keyword (BM25) often beats pure vector. Check support.
Public benchmarks emphasize raw query speed. Real workloads are mostly bottlenecked on filter combinations and metadata indexing.
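To make the P95-versus-P50 point concrete, here is a minimal sketch of a nearest-rank percentile over recorded query latencies. The function name `percentile` and the sample values are illustrative assumptions, not measurements from any of the four databases.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; integer arithmetic first to avoid floating-point drift.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical sample: 18 fast queries and 2 slow ones.
const latenciesMs: number[] = [
  12, 14, 13, 15, 12, 16, 14, 13, 15, 14,
  13, 12, 16, 15, 14, 13, 12, 15, 240, 310,
];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log({ p50, p95 });
```

With this made-up sample the median is 14 ms while P95 is 240 ms, which is exactly the gap a P50-only benchmark hides.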
The comparison table
Approximate, late 2026 figures for a midsize RAG workload (around 1M embeddings, 1536 dimensions, 10 QPS sustained, mixed metadata filtering).
| Dimension | pgvector | Qdrant | Pinecone | Weaviate |
|---|---|---|---|---|
| Self-host | Yes (just Postgres) | Yes | No | Yes |
| Managed option | Many providers | Qdrant Cloud | Pinecone (only) | Weaviate Cloud |
| Cost at 1M vectors | Postgres VPS, ~30 EUR/mo | ~50-100 EUR/mo self-host | ~70-120 USD/mo managed | ~80 EUR/mo self-host |
| P95 query latency | 30-80 ms | 10-25 ms | 15-40 ms | 20-50 ms |
| Metadata filtering | Full SQL | Strong | Adequate | Strong |
| Hybrid search | Full-text + pgvector | Built-in BM25 | Sparse-dense hybrid | Built-in BM25 |
| Operational overhead | Low if you already run Postgres | Medium | None (managed only) | Medium |
| Backup story | Postgres native | Snapshot-based | Provider-managed | Snapshot-based |
| Best for | Most teams under 5M vectors | Fast, scalable, self-host | "Just works" managed | ML-platform teams |
A few important notes on this table.
Latency depends on index choice
pgvector with HNSW (available since pgvector 0.5.0) is dramatically faster than the older IVFFlat index. The numbers above assume HNSW. If you ran pgvector benchmarks two years ago and concluded "too slow", that conclusion is stale.
Qdrant uses HNSW by default. It is fast.
Pinecone's index type is mostly opaque to the user. It is fast in practice.
Weaviate also uses HNSW by default.
HNSW is the standard index for ANN (approximate nearest neighbor) search in 2026. pgvector, Qdrant, and Weaviate all expose it directly; Pinecone keeps its index internals mostly opaque. The difference is implementation quality and how the index interacts with metadata filters.
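For pgvector specifically, choosing HNSW is a one-statement decision. The sketch below builds the relevant SQL from TypeScript. The helper names and the `documents` / `embedding` identifiers are assumptions for illustration; `USING hnsw`, the operator classes, `m`, `ef_construction`, and `hnsw.ef_search` are pgvector's own names.

```typescript
type DistanceOps = "vector_cosine_ops" | "vector_l2_ops" | "vector_ip_ops";

interface HnswOptions {
  table: string;
  column: string;
  ops: DistanceOps;
  m: number;              // max links per graph node (pgvector default: 16)
  efConstruction: number; // build-time candidate list size (pgvector default: 64)
}

// Identifiers are interpolated into DDL, so only accept plain names.
function assertIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`unsafe identifier: ${name}`);
  }
  return name;
}

function hnswIndexSql(o: HnswOptions): string {
  const table = assertIdentifier(o.table);
  const column = assertIdentifier(o.column);
  return (
    `CREATE INDEX ON ${table} USING hnsw (${column} ${o.ops}) ` +
    `WITH (m = ${o.m}, ef_construction = ${o.efConstruction});`
  );
}

// Query-time knob (pgvector default: 40). Set per session or transaction.
function efSearchSql(efSearch: number): string {
  return `SET hnsw.ef_search = ${Math.trunc(efSearch)};`;
}

const createIndex = hnswIndexSql({
  table: "documents",
  column: "embedding",
  ops: "vector_cosine_ops",
  m: 16,
  efConstruction: 64,
});
console.log(createIndex);
console.log(efSearchSql(100));
```

Raising `hnsw.ef_search` generally trades latency for recall, so it is worth tuning against your own corpus rather than copying a value.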
Filtering is the hidden cost
A query like "find the top 10 nearest vectors WHERE tenant_id = X AND created_at > Y" is the realistic shape. Pure vector search is the easy case.
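pgvector handles this well because it is just SQL: the filter is an ordinary WHERE clause and can use ordinary Postgres indexes. One caveat: with an approximate index such as HNSW, pgvector applies the filter after the index scan, so a very selective filter can return fewer rows than the LIMIT asks for. pgvector 0.8.0 added iterative index scans to mitigate this. Qdrant and Weaviate both have first-class filter syntax. Pinecone supports metadata filtering but with restrictions on cardinality and combinations.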
For a multi-tenant workload, pgvector or Qdrant are the safest bets. Pinecone's filtering has historically been a source of "why is my query slow" investigations.
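Why filters and top-k search interact badly can be seen in a toy example. The sketch below uses exact brute-force search over a handful of hypothetical chunks (not HNSW, and not how any of the four databases is implemented) to contrast filtering after the nearest-neighbor step with filtering before it. All names and vectors are made up for illustration.

```typescript
interface Chunk {
  id: string;
  tenantId: string;
  embedding: number[];
}

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest neighbors by cosine distance.
function nearest(chunks: Chunk[], query: number[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosineDistance(x.embedding, query) - cosineDistance(y.embedding, query))
    .slice(0, k);
}

// Post-filter: take the k nearest overall, then drop rows failing the filter.
function postFilter(chunks: Chunk[], query: number[], k: number, tenantId: string): Chunk[] {
  return nearest(chunks, query, k).filter((c) => c.tenantId === tenantId);
}

// Pre-filter: apply the filter first, then take the k nearest survivors.
function preFilter(chunks: Chunk[], query: number[], k: number, tenantId: string): Chunk[] {
  return nearest(chunks.filter((c) => c.tenantId === tenantId), query, k);
}

// Tenant "a" owns the chunks closest to the query; we search for tenant "b".
const chunks: Chunk[] = [
  { id: "a1", tenantId: "a", embedding: [1, 0.0] },
  { id: "a2", tenantId: "a", embedding: [1, 0.1] },
  { id: "a3", tenantId: "a", embedding: [1, 0.2] },
  { id: "b1", tenantId: "b", embedding: [1, 0.5] },
  { id: "b2", tenantId: "b", embedding: [1, 1.0] },
  { id: "b3", tenantId: "b", embedding: [0, 1.0] },
];
const queryVector = [1, 0];

const postIds = postFilter(chunks, queryVector, 2, "b").map((c) => c.id);
const preIds = preFilter(chunks, queryVector, 2, "b").map((c) => c.id);
console.log({ postIds, preIds });
```

Here the post-filter path returns nothing for tenant "b" because both of the two nearest chunks belong to tenant "a", while the pre-filter path returns `b1` and `b2`. Real engines sit somewhere between these two extremes, which is why filter behavior deserves its own benchmark.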
When pgvector is the right answer
You should use pgvector if:
- You already run Postgres (we do, on five SaaS apps).
- You have under 5M vectors.
- Your queries combine vector similarity with SQL-style filters.
- You value operational simplicity (one database to back up, one connection pool, one set of metrics).
- You want full-text search and vector search in the same query.
We use pgvector on Carriva. The corpus is French pension regulations and we have low single-digit millions of chunks indexed. Queries combine tenant_id filtering with vector similarity. The latency is acceptable (under 80 ms P95), the operational story is identical to our regular Postgres, and we get to write SQL.
The honest tradeoff: pgvector is not the fastest at extreme scale. If you cross 10M vectors and you are running a production system that needs sub-20 ms P95, you will outgrow it. We have not.
We covered the broader self-hosted Postgres story (including pgvector setup) in our self-hosting Postgres writeup.
When Qdrant is the right answer
Qdrant is our second-favorite option. Reasons to pick it over pgvector:
- You expect to grow past 10M vectors and you want headroom.
- You have a non-Postgres stack and adding Postgres just for vectors does not make sense.
- You want a slightly faster P95 and your budget for managing a separate database is real.
- You want sparse-dense hybrid search out of the box.
Qdrant's self-hosted operational story is decent: a single binary, snapshot-based backups, and reasonable monitoring. We ran a Qdrant cluster for an experiment and the experience was clean.
The reason we did not switch from pgvector: we already run Postgres, the latency was fine, and adding a second stateful database to back up was not worth the marginal speed.
When Pinecone is the right answer
Pinecone is the right answer when:
- You want a managed service, full stop.
- You have no engineering team to operate a database.
- Your scale is large and the cost of self-hosting plus on-call would exceed the managed bill.
Pinecone has invested heavily in the developer experience. The API is clean. The dashboards are nice. You will not wake up at 2am to fix a vector database.
The downsides:
- Cost climbs faster than self-hosted options as you scale.
- Lock-in. Migrating off Pinecone is a real project.
- Filter limitations on high-cardinality metadata.
If you are a small team without ops capacity and the bill is acceptable, Pinecone is a fine answer. If you are scaling toward 50M vectors, run the cost math first.
When Weaviate is the right answer
Weaviate is positioned more as an ML platform than a database. It has built-in module support for embedding models, generative models, and hybrid search. If you want a system that does more than store and query vectors, Weaviate has more to offer than Qdrant.
We did not pick Weaviate because we already have an embedding pipeline (we call OpenRouter for embeddings) and we do not need the platform layer. For teams that want batteries included, it is a reasonable choice.
The operational overhead is similar to Qdrant. The query performance is similar to Qdrant. The differentiator is the wider feature surface.
The cost analysis nobody publishes
Approximate monthly cost for 1 million 1536-dimensional vectors with light QPS (10 sustained).
| Option | Monthly cost | Notes |
|---|---|---|
| pgvector on existing Postgres VPS | ~5 EUR marginal | RAM bump only |
| pgvector on dedicated VPS | ~30 EUR | 4 GB RAM Hetzner |
| Qdrant self-hosted | ~50 EUR | 8 GB RAM VPS |
| Qdrant Cloud | ~80 USD | 1 GB RAM tier |
| Pinecone managed | ~70-120 USD | Pod-based pricing |
| Weaviate self-hosted | ~50-80 EUR | Similar to Qdrant |
| Weaviate Cloud | ~95 USD | Sandbox tier scales up |
At 10M vectors, the numbers shift. Self-hosted options need more RAM (HNSW is RAM-hungry); managed options scale by pod count. Managed becomes attractive roughly when the value of the engineering hours it saves per month exceeds the difference in the bill.
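A quick back-of-the-envelope calculation shows why RAM dominates the cost curve. The sketch below estimates the raw storage for float32 vectors only. It is a lower bound under stated assumptions: HNSW graph links and engine overhead come on top, and the function names are illustrative.

```typescript
const BYTES_PER_FLOAT32 = 4;

// Raw storage for the vectors alone, before any index or engine overhead.
function rawVectorBytes(vectorCount: number, dimensions: number): number {
  return vectorCount * dimensions * BYTES_PER_FLOAT32;
}

function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

const atOneMillion = toGigabytes(rawVectorBytes(1_000_000, 1536));  // about 6.1 GB
const atTwoMillion = toGigabytes(rawVectorBytes(2_000_000, 1536));  // about 12.3 GB
const atTenMillion = toGigabytes(rawVectorBytes(10_000_000, 1536)); // about 61.4 GB

console.log({ atOneMillion, atTwoMillion, atTenMillion });
```

The 2M figure of roughly 12.3 GB for raw vectors is in the same range as the 14 GB footprint in the opening example once index overhead is added. Setups with less RAM than the raw figure would presumably lean on disk-backed indexes, quantization, or lower-dimensional embeddings.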
The hybrid search reality
A query that finds the right document by combining keyword match (BM25) with vector similarity often outperforms either alone. This is what Pinecone calls "sparse-dense hybrid", what Qdrant and Weaviate call "hybrid search", and what pgvector calls "use Postgres full-text search alongside the vector index in the same query".
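```sql
-- pgvector with full-text search
-- $1 = query embedding, $2 = search text, $3 = tenant id
SELECT id, content, vector_similarity, keyword_rank
FROM (
  SELECT d.id, d.content,
         1 - (d.embedding <=> $1) AS vector_similarity,
         ts_rank(d.tsv, query) AS keyword_rank
  FROM documents AS d,
       plainto_tsquery('french', $2) AS query
  WHERE d.tenant_id = $3
    AND d.tsv @@ query
) AS scored
-- Postgres does not allow output aliases inside an ORDER BY expression at the
-- same query level, so the blend is computed over the subquery's columns.
ORDER BY vector_similarity * 0.7 + keyword_rank * 0.3 DESC
LIMIT 10;
```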
This kind of hybrid query is one of pgvector's strongest cards. You get full SQL composition and you do not need a separate keyword search system: a GIN index on the tsvector column lives in the same database as the vectors.
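One detail worth checking in any weighted blend: cosine similarity and `ts_rank` live on different scales, so fixed weights like 0.7 and 0.3 may not mean what they appear to mean. The sketch below rescales both signals to [0, 1] within the candidate set before blending. Min-max normalization is our assumption here, one of several reasonable options, and the names and scores are hypothetical.

```typescript
interface ScoredCandidate {
  id: string;
  vectorSimilarity: number; // e.g. 1 - cosine distance
  keywordRank: number;      // e.g. ts_rank output
}

// Rescale values to [0, 1] within this candidate set.
function minMax(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return values.map(() => 0); // no spread, no signal
  return values.map((v) => (v - lo) / (hi - lo));
}

function blend(
  candidates: ScoredCandidate[],
  vectorWeight = 0.7,
  keywordWeight = 0.3,
): { id: string; score: number }[] {
  const v = minMax(candidates.map((c) => c.vectorSimilarity));
  const k = minMax(candidates.map((c) => c.keywordRank));
  return candidates
    .map((c, i) => ({ id: c.id, score: vectorWeight * v[i] + keywordWeight * k[i] }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical candidates returned by a hybrid query.
const candidates: ScoredCandidate[] = [
  { id: "d1", vectorSimilarity: 0.82, keywordRank: 0.02 },
  { id: "d2", vectorSimilarity: 0.80, keywordRank: 0.09 },
  { id: "d3", vectorSimilarity: 0.60, keywordRank: 0.10 },
];

const ranked = blend(candidates);
console.log(ranked.map((r) => r.id));
```

In this sample `d2` ranks first because it is nearly as close as `d1` in vector space and far stronger on keywords. Whether normalization helps on your corpus is an empirical question, so compare it against the raw blend on real queries.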
The right vector database is the one that disappears into your existing operational story. Cleverness elsewhere comes second.
How this fits with the bigger RAG picture
Vector database choice is only one part of a RAG pipeline. Embedding model, chunking strategy, retrieval scoring, reranking, and prompt design all matter at least as much. We covered the broader RAG architecture in our RAG vs fine-tuning decision framework, and the regulated-industry version (where Carriva lives) in our RAG in regulated industries deep dive.
A common pattern: teams obsess over the vector database and underinvest in chunking. Bad chunks make any vector database produce mediocre results. Good chunks let a "boring" vector database (like pgvector) deliver excellent results.
Migration paths, in case you change your mind
We have migrated from one vector database to another twice in our history. It is real work but it is doable.
The pattern:
- The embeddings themselves are portable (just a vector of floats).
- The metadata schema is portable.
- The query layer is the breaking change. Most apps tightly couple their query code to the database client.
Mitigating the lock-in: write a thin "vector store" abstraction in your application layer. Your business logic calls vectorStore.search(...). The implementation can be pgvector today and Qdrant tomorrow. The whole abstraction is maybe 200 lines of TypeScript.
We have this layer. It saved us a week of work the time we tested switching backends.
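As a rough idea of what such a layer can look like, here is a minimal sketch. It is not our production code: the interface shape, the type names, and the in-memory backend are assumptions chosen for illustration. A pgvector- or Qdrant-backed class would implement the same interface with real client calls.

```typescript
// Hypothetical shapes; adapt to your own schema.
interface VectorRecord {
  id: string;
  embedding: number[];
  metadata: Record<string, string>;
}

interface SearchParams {
  embedding: number[];
  topK: number;
  filter?: Record<string, string>; // exact-match metadata filter
}

interface SearchHit {
  id: string;
  score: number; // cosine similarity, higher is closer
}

// The only surface business logic is allowed to depend on.
interface VectorStore {
  upsert(records: VectorRecord[]): Promise<void>;
  search(params: SearchParams): Promise<SearchHit[]>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force backend, handy for unit tests.
class InMemoryVectorStore implements VectorStore {
  private records: VectorRecord[] = [];

  async upsert(records: VectorRecord[]): Promise<void> {
    for (const record of records) {
      // Replace any existing record with the same id.
      this.records = this.records.filter((r) => r.id !== record.id);
      this.records.push(record);
    }
  }

  async search(params: SearchParams): Promise<SearchHit[]> {
    const filter = params.filter ?? {};
    return this.records
      .filter((r) => Object.keys(filter).every((key) => r.metadata[key] === filter[key]))
      .map((r) => ({ id: r.id, score: cosineSimilarity(params.embedding, r.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, params.topK);
  }
}

// Business logic sees only the interface, never the backend.
async function demo(): Promise<string[]> {
  const vectorStore: VectorStore = new InMemoryVectorStore();
  await vectorStore.upsert([
    { id: "c1", embedding: [1, 0], metadata: { tenant_id: "acme" } },
    { id: "c2", embedding: [0, 1], metadata: { tenant_id: "acme" } },
    { id: "c3", embedding: [1, 0], metadata: { tenant_id: "other" } },
  ]);
  const hits = await vectorStore.search({
    embedding: [1, 0.1],
    topK: 1,
    filter: { tenant_id: "acme" },
  });
  return hits.map((h) => h.id);
}

demo().then((ids) => console.log(ids));
```

Keeping the filter as a plain key-value map is a deliberate limitation: it is the subset every backend can express, which is what keeps a later swap cheap.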
What we would test first
If you are starting a RAG pipeline in 2026:
- If you already run Postgres, start with pgvector. It is one new extension, not one new system.
- Measure your actual P95 query latency at your real corpus size, not at toy scale.
- Add filterable metadata you actually filter on, not "all the metadata in case we need it".
- Only graduate to a dedicated vector database when you have evidence that pgvector is the bottleneck.
- Write a thin abstraction over the vector store so the migration cost is bounded.
We have shipped two RAG products in production and we are still on pgvector. The marketing for dedicated vector databases is excellent. The operational reality, for our scale, has not justified the switch.
TL;DR
For most teams, pgvector is the right default in 2026. Qdrant is the right answer for non-Postgres stacks or for projected scale beyond 10M vectors. Pinecone for managed, no-ops simplicity at the cost of a higher bill. Weaviate for ML-platform shops who want batteries included. The winner of this 2026 vector database comparison is "the one that fits your existing stack". Cleverness costs more than it saves.



