A jsonb column with 47 keys looks elegant in a migration. It stops looking elegant the day you need to add an index on the third key and discover that your query is doing a sequential scan over 200,000 rows because you indexed the wrong path. Postgres jsonb vs columns is not an aesthetic argument. It is a question of which fields you query, how you query them, and how often the shape changes. We make this decision a dozen times a year across five SaaS apps. Here is the rule we follow.
The clean rule
If you query it, filter on it, sort by it, or join on it, it is a typed column. Otherwise it is jsonb.
That is the decision rule. Everything else in this article is calibration.
The temptation is to make every "configuration" or "settings" field a jsonb blob because it feels future-proof. The reality is that the moment you need to filter customers by settings->>'plan_tier', you are doing it wrong. Make it a column.
The opposite temptation is to add a column for every field that might exist. That works until your table has 87 columns, 60 of which are null on most rows. At that point, jsonb for the long tail is the cleaner answer.
Both extremes are wrong. The middle is where good schemas live.
When jsonb is the right answer
Use jsonb when:
- The data is genuinely free-form (user-supplied custom fields, third-party API responses you store verbatim, audit log payloads).
- The shape varies per row (different feature flags by tenant, different metadata per integration).
- You only ever read it whole, never filter or aggregate on its inner fields.
- The data is rarely indexed and rarely queried in WHERE clauses.
Concrete real-world examples we use:
-- Carriva: the raw RIS document parse, kept verbatim for audit
ALTER TABLE ris_documents ADD COLUMN raw_parse jsonb;
-- Lesson planning: the per-lesson custom variables a teacher set
ALTER TABLE lessons ADD COLUMN teacher_variables jsonb;
-- Studio: third-party webhook payload kept for replay
ALTER TABLE webhook_events ADD COLUMN payload jsonb NOT NULL;
In all three, we never filter on inner keys. We retrieve the row by id and we read or render the whole blob.
When typed columns are the right answer
Use a typed column when:
- You filter on the field in a WHERE clause more than once.
- You sort or aggregate on the field.
- You need a foreign key.
- You need a CHECK constraint.
- You need a meaningful default at the database level.
- The query planner needs accurate statistics on the value distribution.
This is more cases than you think. Things that look "configuration" but should be columns:
plan_tier(free / pro / enterprise) → column with an enum or text + check.created_at,updated_at, anything date-typed.is_active,is_archived, any boolean filter.tenant_id,org_id,owner_id, every foreign key.score,count, any numeric you might aggregate.
A real case: we had settings.plan_tier as a jsonb field on a tenants table early in Carriva. Three months in, we had 8 places querying settings->>'plan_tier' and one of them did not have the right index. It became a column. That migration took 20 minutes. It would have been zero minutes if we had started right.
The indexing reality
This is where most jsonb decisions go wrong in practice.
B-tree indexes do not work on whole jsonb columns
You cannot put a B-tree index on a jsonb column and expect it to help filtering. B-tree expects an orderable scalar.
You can put a B-tree index on an expression that extracts a scalar from jsonb:
CREATE INDEX idx_lessons_subject ON lessons ((teacher_variables->>'subject'));
This works. It is a partial answer. It indexes one path. If you also want to filter on level, you need another index. Then on topic, another. After three or four such indexes you have rebuilt a column-based schema, badly.
GIN indexes are the jsonb-native answer
CREATE INDEX idx_lessons_vars ON lessons USING gin (teacher_variables);
A GIN index lets Postgres efficiently answer queries like WHERE teacher_variables @> '{"subject": "math"}'. It is the right tool for "I do not know which keys I will query, but I want them all to be searchable".
The cost: GIN indexes are bigger and slower to update than B-tree indexes. On a write-heavy table, GIN can become a bottleneck. We have one Carriva table where we removed a GIN index because the write throughput dropped 40% and we never used the search.
jsonb_path_ops is the variant nobody uses
For containment queries (the @> operator), jsonb_path_ops produces a smaller, faster GIN index than the default jsonb_ops:
CREATE INDEX idx_lessons_vars ON lessons USING gin (teacher_variables jsonb_path_ops);
If you only need @> (which is the most common case), this is the better default.
A real schema we run
Here is a sketch of how we structure a multi-tenant table on Carriva. Note the mix.
CREATE TABLE audit_findings (
id bigserial PRIMARY KEY,
tenant_id uuid NOT NULL REFERENCES tenants(id),
user_id uuid NOT NULL REFERENCES users(id),
audit_id uuid NOT NULL REFERENCES audits(id),
finding_type text NOT NULL, -- column, filtered constantly
severity int NOT NULL CHECK (severity BETWEEN 1 AND 5),
is_resolved bool NOT NULL DEFAULT false,
resolved_at timestamptz,
raw_evidence jsonb NOT NULL, -- jsonb, never filtered
metadata jsonb NOT NULL DEFAULT '{}'::jsonb, -- jsonb, occasional GIN search
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_findings_tenant_created
ON audit_findings (tenant_id, created_at DESC);
CREATE INDEX idx_findings_unresolved
ON audit_findings (tenant_id, is_resolved)
WHERE is_resolved = false;
CREATE INDEX idx_findings_metadata
ON audit_findings USING gin (metadata jsonb_path_ops);
A few choices to highlight:
tenant_idis a typed column with an FK and a leading-column index. Multi-tenant discipline starts here. We covered the broader pattern in our multi-tenant from day one playbook.finding_type,severity,is_resolvedare columns because we filter on all three constantly.raw_evidenceis jsonb because it is the verbatim parse from the RAG pipeline. We never filter inside it; we render it for the user.metadatais jsonb with a GIN index because it holds free-form tags that ops occasionally search.- A partial index on unresolved findings because that is the hot query. Partial indexes are the most underused performance tool in Postgres.
The mistakes we have made
Public confession to spare you the same lessons.
Storing user IDs inside jsonb
Once. We had participants as a jsonb array of user IDs. Worked fine for a year. Then we needed "show all sessions where user X participated". With user IDs inside jsonb, that query is a sequential scan even with GIN. We refactored to a join table. The migration was a long Sunday.
Treating jsonb as a permanent draft
We had a customer_settings jsonb column on Carriva that grew over time as we added features. Every new feature added a key. Eventually we had 14 fields, and 6 of them were filtered or aggregated regularly. We migrated those 6 to columns. The remaining 8 stayed as jsonb. The mixed result is the right end state, but it took us six months to recognize that.
GIN on a write-heavy table
Mentioned above. A GIN index on a 100-write-per-second table can dominate disk I/O. Always check the write impact before adding GIN to a hot table.
Migration patterns
Moving a field from jsonb to a column is doable in production. A safe playbook:
- Add the new typed column as nullable.
- Backfill with
UPDATE ... SET col = (jsonb_field->>'key')::typein batches. - Update the application to read from the new column, fall back to jsonb if null.
- Update the application to write to both new column and jsonb.
- Verify in production for a week.
- Stop writing to jsonb for that field.
- Eventually drop the jsonb key (or leave it, it does not hurt much).
We did this exact playbook three times on Carriva. It works without downtime if you batch the backfill and you never run the migration on Friday.
The right Postgres schema is the one your queries reveal. Start with the obvious columns. Use jsonb for the genuinely amorphous. Refactor when you have evidence, not when you have anxiety.
TypeScript and the jsonb type
A practical concern: how do you type jsonb in TypeScript? We use a tagged interface plus a Zod schema for validation at the boundary.
import { z } from "zod";
export const TeacherVariablesSchema = z.object({
subject: z.string().optional(),
level: z.string().optional(),
topic: z.string().optional(),
customFields: z.record(z.string()).optional(),
});
export type TeacherVariables = z.infer<typeof TeacherVariablesSchema>;
The Zod schema validates at the API boundary. The TypeScript type flows through the codebase. The database is the dumb storage. This pattern requires discipline at the application layer; we covered the broader TypeScript hygiene topic in our TypeScript strict mode flags writeup.
The alternative is jsonb typed as unknown everywhere, which forces casting on every read. We tried that. It rotted.
Performance benchmarks we ran
A small benchmark on our own workload, audit_findings table at 1.2 million rows, Postgres 16, m4.large equivalent. The instance is one of our self-hosted Postgres clusters with the standard tuning we apply across the studio.
| Query pattern | Index strategy | P95 latency |
|---|---|---|
WHERE tenant_id = ? AND finding_type = ? | composite B-tree | 4 ms |
WHERE metadata @> '{"flag":"x"}' | GIN jsonb_path_ops | 18 ms |
WHERE metadata @> '{"flag":"x"}' | GIN jsonb_ops | 25 ms |
WHERE metadata @> '{"flag":"x"}' | no index, seq scan | 1,800 ms |
WHERE settings->>'plan' = 'pro' | B-tree on expression | 6 ms |
WHERE settings->>'plan' = 'pro' | no index | 1,200 ms |
Two takeaways. First, the right index is the difference between 4 ms and 1,800 ms. Second, expression indexes on a single jsonb path are competitive with typed columns on small workloads. They lose at scale because the planner has worse statistics on expressions than on columns. If you can pull the field out, do it. If you cannot, an expression index is a fine compromise.
What we would test first
For a fresh schema:
- List the queries you know you will run in the first 6 months.
- Every field that appears in WHERE, ORDER BY, or JOIN gets a typed column.
- Free-form metadata, third-party blobs, and audit payloads start as jsonb.
- Add indexes only after you have written the queries, not before.
- Plan to migrate at least one jsonb field to a column within a year. It will happen. Do not pretend it will not.
If you take only one thing from this: defaults matter. The default is to over-jsonb because it feels flexible. The honest default is to use typed columns and reach for jsonb only when the data is truly amorphous.
TL;DR
Typed columns for anything you query, filter, sort, join, or constrain. jsonb for genuinely free-form blobs. GIN with jsonb_path_ops if you must search inside. Expression indexes for one-key cases. Postgres jsonb vs columns is not religion; it is matching the storage to the access pattern. Every schema we run mixes both, and that mix is the right answer.



