Vector Databases Compared in 2026: pgvector, Qdrant, Pinecone, Weaviate

An honest comparison across cost, query performance, operational overhead, and the boring details that decide which one survives in production.

The biggest mistake when comparing vector databases in 2026 is testing all four candidates on 10,000 embeddings, picking the fastest one, and discovering at 2 million embeddings that your "fastest" choice has a 14 GB RAM footprint and 800 ms query latency. Most benchmarks mislead because they measure scenarios that do not match production. We run RAG in production for Carriva (legal-grade retrieval over French pension law) and have shipped it at smaller scale on three other apps. Here is the comparison that matters: pgvector vs Qdrant vs Pinecone vs Weaviate, on real production criteria.

The four contenders, briefly

pgvector is a PostgreSQL extension that adds vector similarity search to plain Postgres. You query it with SQL.

Qdrant is a standalone vector database written in Rust. It is self-hostable, has a managed cloud, and exposes REST and gRPC APIs.

Pinecone is a managed-only vector database. Easy to get started, no self-host option.

Weaviate is a standalone vector database with a heavier ML-platform feel. Self-hostable, also has a managed offering.

These are not all the candidates. Milvus, Chroma, Vespa, Lance, Turbopuffer all exist. The four above are the ones we have seriously evaluated and the ones most teams are choosing between.

The dimensions that actually matter

For a small team running production:

  1. Cost at your scale, not at toy scale. Below 100k vectors, all four are basically free. Between 100k and 10M, the curves diverge.
  2. Operational overhead. Self-hosted means you patch it, back it up, and wake up if it dies.
  3. Query latency at P95, not P50. Production workloads see the tail.
  4. Filterable metadata. Most real RAG queries combine vector similarity with WHERE-clause-style filters.
  5. Ecosystem fit. If your stack already has Postgres, pgvector is one fewer system.
  6. Hybrid search. Vector + keyword (BM25) often beats pure vector. Check support.

Public benchmarks emphasize raw query speed. Real workloads are mostly bottlenecked on filter combinations and metadata indexing.
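
As a minimal sketch of the P95-versus-P50 point, the snippet below computes both percentiles from a list of latency samples using the nearest-rank method. The sample values are invented for illustration; in practice they would come from timing real queries against a production-sized corpus.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [12, 14, 13, 15, 11, 16, 14, 13, 12, 15,
                14, 13, 12, 16, 15, 14, 13, 240, 310, 15]

p50 = percentile(latencies_ms, 50)  # the median hides the outliers
p95 = percentile(latencies_ms, 95)  # the tail exposes them
print(f"P50={p50} ms, P95={p95} ms")
```

Two slow queries out of twenty leave the median untouched but dominate the P95, which is the number your users feel.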

The comparison table

Approximate, late 2026 figures for a midsize RAG workload (around 1M embeddings, 1536 dimensions, 10 QPS sustained, mixed metadata filtering).

| Dimension | pgvector | Qdrant | Pinecone | Weaviate |
|---|---|---|---|---|
| Self-host | Yes (just Postgres) | Yes | No | Yes |
| Managed option | Many providers | Qdrant Cloud | Pinecone (only) | Weaviate Cloud |
| Cost at 1M vectors | Postgres VPS, ~30 EUR/mo | ~50-100 EUR/mo self-host | ~70-120 USD/mo managed | ~80 EUR/mo self-host |
| P95 query latency | 30-80 ms | 10-25 ms | 15-40 ms | 20-50 ms |
| Metadata filtering | Full SQL | Strong | Adequate | Strong |
| Hybrid search | Full-text + pgvector | Built-in BM25 | Sparse-dense hybrid | Built-in BM25 |
| Operational overhead | Low if you already run Postgres | Medium | None (managed only) | Medium |
| Backup story | Postgres native | Snapshot-based | Provider-managed | Snapshot-based |
| Best for | Most teams under 5M vectors | Fast, scalable, self-host | "Just works" managed | ML-platform teams |

A few important notes on this table.

Latency depends on index choice

pgvector with an HNSW index (available since pgvector 0.5.0) is dramatically faster than with the older IVFFlat index. The numbers above assume HNSW. If you ran pgvector benchmarks before HNSW support and concluded "too slow", that conclusion is stale.
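
For reference, here is a small sketch that renders the pgvector DDL for an HNSW index. The `documents` table and `embedding` column are illustrative names taken from the hybrid query later in this article, and the helper itself is hypothetical; `m = 16` and `ef_construction = 64` are pgvector's documented defaults.

```python
def hnsw_index_sql(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Render pgvector DDL for a cosine-distance HNSW index.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Illustrative table and column names.
ddl = hnsw_index_sql("documents", "embedding")
print(ddl)

# Query-time knob, set per session: higher values trade speed for recall
# (pgvector's default is 40). The value 100 is only an example.
TUNE_SEARCH = "SET hnsw.ef_search = 100;"
```

Any latency comparison is only meaningful if you record which of these settings were in effect.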

Qdrant uses HNSW by default. It is fast.

Pinecone's index type is mostly opaque to the user. It is fast in practice.

Weaviate also uses HNSW by default.

HNSW is the standard index for ANN (approximate nearest neighbor) search in 2026. pgvector, Qdrant, and Weaviate all use it, and Pinecone does not expose its index type. The difference is implementation quality and how the index interacts with metadata filters.
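
Because the search is approximate, a latency number only means something alongside a recall number. A minimal sketch of recall@k, assuming you can run one exact (brute-force) scan per test query to get the ground truth; the id lists below are invented:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


exact = [7, 3, 9, 1, 4]   # hypothetical ids from an exact scan
approx = [7, 9, 3, 8, 4]  # hypothetical ids from the ANN index, same query
score = recall_at_k(approx, exact, k=5)
print(f"recall@5 = {score:.2f}")
```

Comparing two databases at different recall levels is comparing different workloads, so fix the recall target first and then measure latency.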

Filtering is the hidden cost

A query like "find the top 10 nearest vectors WHERE tenant_id = X AND created_at > Y" is the realistic shape. Pure vector search is the easy case.

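pgvector handles this well because it is just SQL: the filter and the similarity ordering live in one query, and the planner decides how to combine a metadata index with the vector index. One caveat: with an approximate index, the filter is applied after the index scan, so highly selective filters can return fewer rows than requested unless you enable iterative index scans (pgvector 0.8.0+) or use partial indexes or partitioning. Qdrant and Weaviate both have first-class filter syntax. Pinecone supports metadata filtering but with restrictions on cardinality and combinations.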

For a multi-tenant workload, pgvector or Qdrant are the safest bets. Pinecone's filtering has historically been a source of "why is my query slow" investigations.
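
To see why filter handling matters more than raw speed, here is a toy sketch contrasting two strategies: take the top k neighbors and then filter, or filter first and then take the top k. The corpus, tenant names, and both functions are hypothetical and use brute-force search; real engines use more sophisticated variants of both ideas.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical toy corpus: (id, tenant_id, embedding)
ROWS = [
    (1, "acme", [1.0, 0.0]),
    (2, "acme", [0.9, 0.1]),
    (3, "acme", [0.8, 0.2]),
    (4, "globex", [0.5, 0.5]),
    (5, "globex", [0.0, 1.0]),
]


def post_filter_search(query, tenant_id, k):
    """Take the k nearest rows overall, then drop rows from other tenants."""
    nearest = sorted(ROWS, key=lambda r: cosine_distance(query, r[2]))[:k]
    return [r[0] for r in nearest if r[1] == tenant_id]


def pre_filter_search(query, tenant_id, k):
    """Restrict to the tenant first, then take the k nearest rows."""
    candidates = [r for r in ROWS if r[1] == tenant_id]
    nearest = sorted(candidates, key=lambda r: cosine_distance(query, r[2]))[:k]
    return [r[0] for r in nearest]


query = [1.0, 0.0]
post = post_filter_search(query, "globex", k=2)  # both nearest rows belong to acme
pre = pre_filter_search(query, "globex", k=2)
print(post, pre)
```

Post-filtering is cheap but can come back empty for a small tenant; pre-filtering always fills the result but needs the engine to search efficiently inside an arbitrary subset. How each database navigates that tradeoff is what you are really benchmarking.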

When pgvector is the right answer

You should use pgvector if:

  • You already run Postgres (we do, on five SaaS apps).
  • You have under 5M vectors.
  • Your queries combine vector similarity with SQL-style filters.
  • You value operational simplicity (one database to back up, one connection pool, one set of metrics).
  • You want full-text search and vector search in the same query.

We use pgvector on Carriva. The corpus is French pension regulations and we have low single-digit millions of chunks indexed. Queries combine tenant_id filtering with vector similarity. The latency is acceptable (under 80 ms P95), the operational story is identical to our regular Postgres, and we get to write SQL.
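
A sketch of what such a query can look like from application code. It builds a parameterized statement with `%s` placeholders (the style used by psycopg, which is an assumption about the driver) and reuses the table and column names from the SQL in this article. The helper names are hypothetical and this is not Carriva's actual code.

```python
def to_vector_literal(embedding):
    """pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to vector."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def tenant_search_query(embedding, tenant_id, k=10):
    """Return (sql, params) for a tenant-scoped nearest-neighbor query.

    <=> is pgvector's cosine distance operator; ordering by it ascending
    is the form an HNSW index built with vector_cosine_ops can serve.
    """
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM documents "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    vec = to_vector_literal(embedding)
    return sql, (vec, tenant_id, vec, k)


sql, params = tenant_search_query([0.1, 0.2], "tenant-42", k=5)
print(sql)
print(params)
```

With a driver such as psycopg, the pair would be passed to `cursor.execute(sql, params)`.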

The honest tradeoff: pgvector is not the fastest at extreme scale. If you cross 10M vectors and you are running a production system that needs sub-20 ms P95, you will outgrow it. We have not.

We covered the broader self-hosted Postgres story (including pgvector setup) in our self-hosting Postgres writeup.

When Qdrant is the right answer

Qdrant is our second-favorite option. Reasons to pick it over pgvector:

  • You expect to grow past 10M vectors and you want headroom.
  • You have a non-Postgres stack and adding Postgres just for vectors does not make sense.
  • You want a slightly faster P95 and your budget for managing a separate database is real.
  • You want sparse-dense hybrid search out of the box.

Qdrant's self-hosted operational story is solid: a single binary, snapshot-based backups, and reasonable monitoring. We ran a Qdrant cluster for an experiment and the experience was clean.

The reason we did not switch from pgvector: we already run Postgres, the latency was fine, and adding a second stateful database to back up was not worth the marginal speed.

When Pinecone is the right answer

Pinecone is the right answer when:

  • You want a managed service, full stop.
  • You have no engineering team to operate a database.
  • Your scale is large and the cost of self-hosting plus on-call would exceed the managed bill.

Pinecone has invested heavily in the developer experience. The API is clean. The dashboards are nice. You will not wake up at 2am to fix a vector database.

The downsides:

  • Cost climbs faster than self-hosted options as you scale.
  • Lock-in. Migrating off Pinecone is a real project.
  • Filter limitations on high-cardinality metadata.

If you are a small team without ops capacity and the bill is acceptable, Pinecone is a fine answer. If you are scaling toward 50M vectors, run the cost math first.

When Weaviate is the right answer

Weaviate is positioned more as an ML platform than a database. It has built-in module support for embedding models, generative models, and hybrid search. If you want a system that does more than store and query vectors, Weaviate has more to offer than Qdrant.

We did not pick Weaviate because we already have an embedding pipeline (we call OpenRouter for embeddings) and we do not need the platform layer. For teams that want batteries included, it is a reasonable choice.

The operational overhead is similar to Qdrant. The query performance is similar to Qdrant. The differentiator is the wider feature surface.

The cost analysis nobody publishes

Approximate monthly cost for 1 million 1536-dimensional vectors with light QPS (10 sustained).

| Option | Monthly cost | Notes |
|---|---|---|
| pgvector on existing Postgres VPS | ~5 EUR marginal | RAM bump only |
| pgvector on dedicated VPS | ~30 EUR | 4 GB RAM Hetzner |
| Qdrant self-hosted | ~50 EUR | 8 GB RAM VPS |
| Qdrant Cloud | ~80 USD | 1 GB RAM tier |
| Pinecone managed | ~70-120 USD | Pod-based pricing |
| Weaviate self-hosted | ~50-80 EUR | Similar to Qdrant |
| Weaviate Cloud | ~95 USD | Sandbox tier scales up |

At 10M vectors, the numbers shift. Self-hosted options need more RAM (HNSW is RAM-hungry); managed options scale by pod count. The crossover where managed becomes attractive is roughly when your engineering hours saved per month exceed the cost difference.
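
Two pieces of arithmetic sit behind this paragraph, sketched below. The first estimates the size of the raw float32 vectors only (HNSW graph links and metadata come on top). The second encodes the crossover rule as a comparison. The costs, hours, and hourly rate are hypothetical inputs, not figures from our bills.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Storage for the raw vectors alone, assuming float32 (4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value


def managed_pays_off(managed_cost, self_hosted_cost, ops_hours_saved, hourly_rate):
    """True when the value of engineering hours saved exceeds the extra managed bill."""
    return ops_hours_saved * hourly_rate > managed_cost - self_hosted_cost


one_million = raw_vector_bytes(1_000_000, 1536)   # 6_144_000_000 bytes, about 6.1 GB
ten_million = raw_vector_bytes(10_000_000, 1536)  # 61_440_000_000 bytes, about 61 GB

# Hypothetical inputs in one currency: 120/mo managed, 50/mo self-hosted,
# 2 hours of ops work saved per month at 60 per hour.
worth_it = managed_pays_off(120, 50, ops_hours_saved=2, hourly_rate=60)
print(one_million, ten_million, worth_it)
```

At 1536 dimensions the raw vectors for 1M rows already exceed 4 GB, so small RAM tiers only work if part of the data lives on disk or the vectors are compressed. It is worth checking which applies before trusting a latency figure.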

The hybrid search reality

A query that finds the right document by combining keyword match (BM25) with vector similarity often outperforms either alone. This is what Pinecone calls "sparse-dense hybrid", what Qdrant and Weaviate call "hybrid search", and what pgvector calls "use Postgres full-text search alongside the vector index in the same query".

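-- pgvector with full-text search: rank keyword matches by a weighted blend
-- of cosine similarity and ts_rank. Postgres does not allow output aliases
-- inside an ORDER BY expression, so the scores are computed in a subquery.
SELECT id, content, vector_similarity, keyword_rank
FROM (
  SELECT id, content,
         (1 - (embedding <=> $1)) AS vector_similarity,
         ts_rank(tsv, query) AS keyword_rank
  FROM documents,
       plainto_tsquery('french', $2) AS query
  WHERE tenant_id = $3
    AND tsv @@ query
) AS scored
ORDER BY (vector_similarity * 0.7 + keyword_rank * 0.3) DESC
LIMIT 10;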

This kind of hybrid query is one of pgvector's strongest cards. You get full SQL composition and you do not need a separate keyword search system alongside the database.
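
The SQL above blends the two scores with fixed 0.7 / 0.3 weights. One thing to watch is that `ts_rank` is not guaranteed to be on the same scale as cosine similarity. The sketch below does the same blend in application code, after min-max normalizing the keyword scores. The normalization step is our illustrative assumption, not something the SQL does, and the rows are invented.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blend(rows, w_vector=0.7, w_keyword=0.3):
    """rows: list of (id, vector_similarity, keyword_rank). Returns ids, best first."""
    if not rows:
        return []
    keyword = min_max([r[2] for r in rows])
    scored = [(w_vector * r[1] + w_keyword * kw, r[0]) for r, kw in zip(rows, keyword)]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: s[0], reverse=True)]


# Hypothetical candidates: "a" wins on vector similarity alone,
# "b" is nearly as close and matches the keywords much better.
rows = [("a", 0.80, 0.02), ("b", 0.78, 0.09), ("c", 0.60, 0.10)]
ranking = blend(rows)
print(ranking)
```

Without normalization, the small raw `ts_rank` values here would barely move the ordering; with it, the keyword signal can actually break near-ties. Which behavior you want is a retrieval-quality question to settle with your own evaluation set.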

The right vector database is the one that disappears into your existing operational story. Cleverness elsewhere comes second.

How this fits with the bigger RAG picture

Vector database choice is only one part of a RAG pipeline. Embedding model, chunking strategy, retrieval scoring, reranking, and prompt design all matter at least as much. We covered the broader RAG architecture in our RAG vs fine-tuning decision framework, and the regulated-industry version (where Carriva lives) in our RAG in regulated industries deep dive.

A common pattern: teams obsess over the vector database and underinvest in chunking. Bad chunks make any vector database produce mediocre results. Good chunks let a "boring" vector database (like pgvector) deliver excellent results.

Migration paths, in case you change your mind

We have migrated from one vector database to another twice in our history. It is real work but it is doable.

The pattern:

  1. The embeddings themselves are portable (just a vector of floats).
  2. The metadata schema is portable.
  3. The query layer is the breaking change. Most apps tightly couple their query code to the database client.

Mitigating the lock-in: write a thin "vector store" abstraction in your application layer. Your business logic calls vectorStore.search(...). The implementation can be pgvector today and Qdrant tomorrow. The whole abstraction is maybe 200 lines of TypeScript.

We have this layer. It saved us a week of work the time we tested switching backends.
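
A minimal sketch of the shape of such a layer, written in Python here rather than TypeScript. The interface, the in-memory backend, and all names are hypothetical, not our production code; a pgvector or Qdrant backend would implement the same two methods.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class Hit:
    id: str
    score: float
    metadata: Dict[str, str] = field(default_factory=dict)


class VectorStore(Protocol):
    """The only interface business logic should depend on."""

    def upsert(self, id: str, embedding: List[float], metadata: Dict[str, str]) -> None: ...

    def search(self, embedding: List[float], k: int,
               filters: Optional[Dict[str, str]] = None) -> List[Hit]: ...


def _cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force reference backend with exact-match filters, useful in tests."""

    def __init__(self) -> None:
        self._rows: Dict[str, tuple] = {}

    def upsert(self, id, embedding, metadata):
        self._rows[id] = (list(embedding), dict(metadata))

    def search(self, embedding, k, filters=None):
        hits = []
        for row_id, (vec, meta) in self._rows.items():
            if filters and any(meta.get(key) != val for key, val in filters.items()):
                continue
            hits.append(Hit(row_id, _cosine_similarity(embedding, vec), meta))
        hits.sort(key=lambda h: h.score, reverse=True)
        return hits[:k]


def answer_context(store: VectorStore, query_embedding, tenant_id):
    """Business logic sees only the VectorStore interface."""
    hits = store.search(query_embedding, k=2, filters={"tenant_id": tenant_id})
    return [hit.id for hit in hits]
```

Swapping backends then means writing one new class and rerunning the same tests against it, which is where the saved week comes from.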

What we would test first

If you are starting a RAG pipeline in 2026:

  1. If you already run Postgres, start with pgvector. It is one new extension, not one new system.
  2. Measure your actual P95 query latency at your real corpus size, not at toy scale.
  3. Add filterable metadata you actually filter on, not "all the metadata in case we need it".
  4. Only graduate to a dedicated vector database when you have evidence that pgvector is the bottleneck.
  5. Write a thin abstraction over the vector store so the migration cost is bounded.

We have shipped two RAG products in production and we are still on pgvector. The marketing for dedicated vector databases is excellent. The operational reality, for our scale, has not justified the switch.

TL;DR

For most teams, pgvector is the right default in 2026. Qdrant is the right answer for non-Postgres stacks or for projected scale beyond 10M vectors. Pinecone is for managed, no-ops simplicity at the cost of a higher bill. Weaviate is for ML-platform shops that want batteries included. The winner of any 2026 vector database comparison is "the one that fits your existing stack". Cleverness costs more than it saves.
