Multi-Tenant From Day 1: A Small Studio's Playbook

The specific patterns we use to make every product multi-tenant before we have a second tenant, without over-engineering ourselves into paralysis.

Every time we start a new product, we have the same argument with ourselves. "We have zero customers, why are we putting a tenant_id on every table on day one?" Every time, six months in, when the first customer asks "can I onboard my three associates with separate dashboards?", we are glad we did. This is the exact playbook we use to build multi-tenant SaaS architecture without losing two months on plumbing we do not need yet.

We will be specific about Postgres, Next.js, and the cases where we deliberately did not bother.

The three multi-tenancy models, and the one we use

There are roughly three classic patterns:

  1. Database per tenant. One Postgres database per customer. Pristine isolation. Operationally a nightmare for a small team. You do not want to migrate 200 databases on a Tuesday.
  2. Schema per tenant. One Postgres schema per customer in a shared database. Slightly less terrible to operate but still a migration headache and most ORMs hate it.
  3. Shared schema with a tenant_id column on every table. All customers in the same tables, scoped by a tenant identifier on every row.

We use option 3 across Carriva, PrepareMesCours, DraftMyLesson, PrzygotujLekcje, and Creaclases. It is the only one a one-person studio can run without going feral. The same pragmatism that shaped our early decisions on Carriva drove this choice: simple operational shapes that survive a small team.

The cost of multi-tenant from day one is one column on every table. The cost of retrofitting it later is two months of your life.

There is a real tradeoff: a shared schema is the model most exposed to "I forgot a WHERE tenant_id = $1 in one query and now Tenant A sees Tenant B's data." We mitigate that risk with three layers of belt-and-suspenders, described below.

The Postgres patterns we standardize

Every project inherits the same skeleton. We do not negotiate with ourselves on this.

Every table has tenant_id

Even tables that "obviously" belong to a tenant. Even tables you think will only ever be queried via a foreign key. The minute you have a JOIN across two tables and you forget the tenant filter on one, you leak data. The discipline is "if it is a row that belongs to a customer, it has tenant_id directly, not transitively."

The one exception is platform-level tables (platform-wide audit logs, plan definitions, signup waitlist) that are explicitly not tenant-scoped. We mark those with a comment in the schema and enforce the distinction with a code review rule.
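The rule is simple enough to check mechanically. Here is a minimal sketch of such a check, not our actual tooling: the TableDef shape, the platformLevel flag, and the table names are all assumptions made up for illustration.

```typescript
// Hypothetical schema description; table and column names are illustrative.
type TableDef = {
  name: string;
  columns: string[];
  // Explicit opt-out for platform-level tables (plans, waitlist, ...).
  platformLevel?: boolean;
};

// Returns the names of tables that break the "tenant_id directly" rule.
function findTablesMissingTenantId(tables: TableDef[]): string[] {
  return tables
    .filter((t) => !t.platformLevel && !t.columns.includes("tenant_id"))
    .map((t) => t.name);
}

const exampleSchema: TableDef[] = [
  { name: "invoices", columns: ["id", "tenant_id", "amount_cents", "created_at"] },
  // Reachable through invoices, but it still needs its own tenant_id.
  { name: "invoice_lines", columns: ["id", "invoice_id", "label"] },
  // Platform-level table, explicitly marked as not tenant-scoped.
  { name: "plans", columns: ["id", "label"], platformLevel: true },
];

const offenders = findTablesMissingTenantId(exampleSchema);
```

In this example only invoice_lines is flagged: it belongs to a customer but carries the tenant transitively, which is exactly the case the rule forbids.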

A composite index on (tenant_id, created_at) or (tenant_id, id)

Every query that touches a tenant table starts with WHERE tenant_id = $1. We make sure tenant_id is the leading column of an index that the query planner can use for that filter. This is the unsexy half of performance. Get it right early and you will not lie awake explaining a 3-second query to a customer. We covered the broader Postgres operational story (including indexing on a self-hosted Postgres cluster) in a separate writeup; the multi-tenant indexing patterns build on those defaults.
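To make that concrete, here is a sketch of an index and a query shaped to match it. The invoices table, its columns, and the helper name are assumptions for illustration, not a real schema.

```typescript
// Assumed DDL for a hypothetical "invoices" table.
const createIndexSql =
  "CREATE INDEX invoices_tenant_created_idx ON invoices (tenant_id, created_at)";

type SqlQuery = { text: string; values: unknown[] };

// Builds a "latest rows for one tenant" query. The equality on tenant_id
// followed by a range and ORDER BY on created_at lines up with the
// (tenant_id, created_at) index, which Postgres can scan backwards.
function latestInvoicesQuery(tenantId: string, before: Date, limit: number): SqlQuery {
  return {
    text:
      "SELECT id, amount_cents, created_at FROM invoices " +
      "WHERE tenant_id = $1 AND created_at < $2 " +
      "ORDER BY created_at DESC LIMIT $3",
    values: [tenantId, before, limit],
  };
}
```

Paging with created_at < $2 instead of OFFSET keeps each page a short index range scan, however deep the customer scrolls.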

Row-Level Security (optional, contextual)

Postgres has Row-Level Security policies. They let you enforce at the database level that a session can only see rows where tenant_id = current_setting('app.tenant_id'). We use RLS on Carriva because the data is sensitive (audit data on a real person's pension record). We do not use it on PrepareMesCours where the data is teacher-generated lesson content and the consequences of a leak are different.

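The reason RLS is not always on is that it adds operational complexity. Long-lived sessions, connection pooling, and migrations all interact with RLS in ways that bite if you are not paying attention. For example, a tenant setting applied at session level stays on a pooled connection and can be inherited by the next request that borrows it. We picked RLS for the highest-stakes product and skipped it elsewhere.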
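For readers who have not used RLS, here is a rough sketch of the moving parts. ENABLE ROW LEVEL SECURITY, CREATE POLICY, current_setting, and set_config are standard Postgres; the pension_audits table, the uuid cast, the SqlClient shape, and the withTenant helper are assumptions for illustration.

```typescript
// Assumed one-time setup for a hypothetical "pension_audits" table.
// The ::uuid cast assumes tenant_id is a uuid column.
const rlsSetupSql = [
  "ALTER TABLE pension_audits ENABLE ROW LEVEL SECURITY",
  "CREATE POLICY tenant_isolation ON pension_audits " +
    "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
];

// Minimal client shape assumed for this sketch.
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction that carries the tenant setting.
async function withTenant<T>(
  client: SqlClient,
  tenantId: string,
  work: (c: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument true = local to this transaction, so the setting
    // is discarded at COMMIT or ROLLBACK instead of staying on the connection.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

One caveat worth knowing before relying on a setup like this: superusers and the table owner bypass policies by default, so the application should connect as a separate role (or the table needs FORCE ROW LEVEL SECURITY).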

The application-layer patterns that prevent the leaks

Database-level discipline is one layer. The application layer matters more in practice because that is where we write the bugs.

A db.tenant(tenantId).query(...) wrapper

Every product has a tiny wrapper that takes a tenantId and returns a database client whose every query is auto-scoped. We never call the raw Postgres client directly from a route handler. The wrapper appends the tenant filter and refuses queries against tenant-scoped tables that do not include the filter.

This sounds heavy. It is roughly 80 lines of TypeScript. It has prevented a leak twice that we know of, both during late-night feature work where we would have forgotten the filter manually.
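To give a feel for the shape, here is a compressed sketch of such a wrapper. It is not our production code: the table registry, the Executor type, the select helper, and the regex-based guard on raw SQL are all assumptions chosen to keep the example short.

```typescript
// Assumed registry of tenant-scoped tables; names are illustrative.
const TENANT_TABLES = ["invoices", "audit_events"];

// Whatever actually talks to Postgres; injected so the wrapper stays testable.
type Executor = (text: string, values: unknown[]) => Promise<unknown[]>;

function createDb(execute: Executor) {
  return {
    tenant(tenantId: string) {
      return {
        // Structured read: the tenant filter is always $1.
        select(table: string, where: Record<string, unknown> = {}) {
          if (!TENANT_TABLES.includes(table)) {
            throw new Error(`${table} is not a registered tenant table`);
          }
          const keys = Object.keys(where);
          for (const k of keys) {
            // Column names are interpolated, so only plain identifiers pass.
            if (!/^[a-z_]+$/.test(k)) throw new Error(`bad column name: ${k}`);
          }
          const clauses = ["tenant_id = $1", ...keys.map((k, i) => `${k} = $${i + 2}`)];
          const text = `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`;
          return execute(text, [tenantId, ...keys.map((k) => where[k])]);
        },
        // Escape hatch for hand-written SQL: refused unless it filters on
        // tenant_id = $1 and $1 really is this tenant.
        query(text: string, values: unknown[]) {
          const touchesTenantTable = TENANT_TABLES.some((t) =>
            new RegExp(`\\b${t}\\b`, "i").test(text),
          );
          const hasFilter = /tenant_id\s*=\s*\$1\b/i.test(text) && values[0] === tenantId;
          if (touchesTenantTable && !hasFilter) {
            throw new Error("tenant-scoped table queried without tenant_id = $1");
          }
          return execute(text, values);
        },
      };
    },
  };
}
```

A route handler would then call something like db.tenant(ctx.tenantId).select("invoices", { status: "open" }). The regex guard is deliberately crude: it catches the forgotten filter, not a determined attempt to get around it, which is all a late-night safety net needs to do.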

Middleware that resolves the tenant once per request

In Next.js, the App Router middleware (or a higher-level helper called from the route handler) resolves the tenant from the user's session, the subdomain, or the org slug, and attaches it to the request context. Every downstream call uses that context. The route handler does not parse the URL again. The DB layer does not re-resolve the user.

We use the same pattern across every product. When we unified PrepareMesCours, DraftMyLesson, and PrzygotujLekcje into a monorepo we kept this contract identical. That is the difference between a 3-day refactor and a 3-week one.
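The resolution step itself can be a small pure function that the middleware or helper calls once. This is a sketch under assumptions: the precedence order (session, then subdomain, then org slug), the sessionTenantId field, the example.com base domain, and the /org/<slug> path convention are illustrative, not our real routing.

```typescript
type TenantContext = { tenantId: string; source: "session" | "subdomain" | "path" };
type IncomingRequest = { host: string; pathname: string; sessionTenantId?: string };

const BASE_DOMAIN = "example.com"; // placeholder domain

// Resolves the tenant once; everything downstream reads the returned context.
function resolveTenant(req: IncomingRequest): TenantContext | null {
  // 1. A signed-in session wins over anything in the URL.
  if (req.sessionTenantId) {
    return { tenantId: req.sessionTenantId, source: "session" };
  }
  // 2. Subdomain: acme.example.com -> "acme" (port stripped first).
  const host = req.host.split(":")[0].toLowerCase();
  if (host.endsWith("." + BASE_DOMAIN)) {
    const sub = host.slice(0, host.length - BASE_DOMAIN.length - 1);
    if (sub && sub !== "www" && !sub.includes(".")) {
      return { tenantId: sub, source: "subdomain" };
    }
  }
  // 3. Org slug in the path: /org/acme/dashboard -> "acme".
  const match = /^\/org\/([a-z0-9-]+)(\/|$)/.exec(req.pathname);
  if (match) {
    return { tenantId: match[1], source: "path" };
  }
  return null;
}
```

Returning null rather than guessing a default forces the caller to decide what an unscoped request means, usually a redirect to login or a 404.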

Auth roles separate from tenancy

Tenancy is "which customer's data are you looking at?" Auth role is "what are you allowed to do with it?" We learned the hard way to keep these separate. Early on we had a role field on the session that mixed "tenant admin" with "platform admin". The role enum was getting reused inconsistently across pages, and we kept finding checks that quietly treated one kind of admin as the other. We now standardize on is_platform_admin for platform-level checks.

If you take one thing from this section: never write if (session.role === 'admin') in a page that mixes platform and tenant admin paths. It always rots.
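In code, the separation can look like the sketch below. Only is_platform_admin comes from our codebase as described above; the Session shape, the TenantRole values, and the two helper names are assumptions for illustration.

```typescript
type TenantRole = "owner" | "admin" | "member";

type Session = {
  userId: string;
  tenantId: string;
  // What the user may do inside their own tenant.
  tenantRole: TenantRole;
  // Studio staff only; never derived from tenantRole.
  is_platform_admin: boolean;
};

// Tenant-level check: right tenant AND sufficient role inside it.
function canManageTenant(session: Session, targetTenantId: string): boolean {
  return (
    session.tenantId === targetTenantId &&
    (session.tenantRole === "owner" || session.tenantRole === "admin")
  );
}

// Platform-level check: a dedicated flag, no role string involved.
function canAccessPlatformAdmin(session: Session): boolean {
  return session.is_platform_admin;
}
```

With two separate questions there is no single 'admin' string left to compare against, so a tenant admin cannot drift into a platform page by accident.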

The deliberate "we are not doing that yet" list

A small studio is allowed to defer things. Here is what we deferred and when we will revisit:

  • Per-tenant database backups. We back up the whole Postgres cluster every night. The day a customer says "give me a SQL dump of just my data", we have a script ready that we tested in dev. We did not pre-build a self-service version.
  • Tenant-level rate limits. We have global rate limits and per-IP ones. Per-tenant rate limiting is a Q3 problem. If one tenant goes berserk, we will know.
  • Custom domains per tenant. Carriva uses subdomains. We considered custom domains. We deferred. The complexity of managing TLS certs at scale is not justified by current demand.
  • Multi-region. Everything is in Paris. The day a customer in Quebec needs us, we will have a real problem to solve. Not before.

The studio principle is "build for the next 6 months, not the next 6 years". If you spend day-one engineering on year-three problems, your product will not exist long enough to need them.

Migrations: the one place we are conservative

We learned painfully that on a multi-tenant system, migrations are the most dangerous moment. Our rules:

  1. Migrations are additive when possible. Add a column. Don't drop one.
  2. Backfills are a separate step from the schema change, run after the schema is deployed, ideally in batches.
  3. We never run a migration on Friday. We are not heroes.
  4. For the lesson-planning monorepo, we recently moved to applying schema changes directly without migration files. That works because the change pace dropped. We do not recommend it for products in heavy growth.
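Rule 2 is the one that benefits most from an example. Below is a sketch of a batched backfill that runs after an additive schema change has shipped. The invoices table, the currency column, the 'EUR' default, and the Exec type are assumptions for illustration.

```typescript
// Assumed executor shape: runs one statement and reports affected rows.
type Exec = (text: string, values: unknown[]) => Promise<{ rowCount: number }>;

// Assumes the additive schema change was deployed in an earlier step:
//   ALTER TABLE invoices ADD COLUMN currency text;
async function backfillCurrency(exec: Exec, batchSize: number): Promise<number> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  let total = 0;
  for (;;) {
    // Each statement touches at most batchSize rows, so locks stay short.
    const { rowCount } = await exec(
      "UPDATE invoices SET currency = 'EUR' " +
        "WHERE id IN (SELECT id FROM invoices WHERE currency IS NULL LIMIT $1)",
      [batchSize],
    );
    total += rowCount;
    // A short batch means nothing is left to backfill.
    if (rowCount < batchSize) return total;
  }
}
```

Because each batch is its own statement, the job can be stopped and restarted at any point: rows already filled no longer match currency IS NULL.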

Migration discipline is the boring half of multi-tenant SaaS architecture and the half that pays the rent at 3am when something goes wrong.

When the model breaks (and we move part of it out)

There are a few patterns where shared-schema starts to crack:

  • A tenant has 100x the data of an average tenant. Their rows skew the planner statistics for everyone else. The honest fix is to split that tenant out to its own database, not to micro-optimize. We have not had to do it yet but Carriva might force it inside 18 months if a large CGP cabinet onboards.
  • A tenant has stricter compliance requirements. Some enterprise buyers require database-level isolation. We will accept that as an upsell tier, not as a default.
  • Cross-tenant analytics queries are a hot path. Then the answer is a separate analytics database, not changing the multi-tenant model. We feed analytics events into self-hosted Umami (and PostHog Cloud for Carriva) precisely so the production database does not get polluted with reporting workloads.

The pattern across all these failure modes is the same: do not change the architecture. Add a layer on top.

The audit log that pays for itself

One pattern that we did not appreciate until late: a per-tenant audit log of significant actions. Who created an account, who changed the plan, who exported data, who deleted a record. We turned this on six months in on Carriva and it has paid off twice already. The first time, a customer asked "did anyone delete my data?" We could answer with timestamps. The second time, an internal debugging session was instantly faster because the log told us "this user did X at 14:32 and the database stopped responding at 14:33". Correlation, not causation, but a 2-minute investigation instead of a 90-minute one.

The implementation is small: a single audit_events table with tenant_id, actor_id, action, payload_json, created_at. Every interesting action writes a row. We do not delete rows. The table grew to 2.1 million rows over 12 months on Carriva. Postgres handles it fine with a (tenant_id, created_at) index.
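The "tiny helper" can be as small as the sketch below. The column names come from the table described above; the AuditEvent type, the function names, and the example action strings are assumptions for illustration.

```typescript
type AuditEvent = {
  tenantId: string;
  actorId: string;
  action: string; // e.g. "plan.changed", "data.exported"
  payload: Record<string, unknown>;
};

type SqlStatement = { text: string; values: unknown[] };

// Builds the INSERT for one audit row; created_at is set by the database.
function buildAuditInsert(event: AuditEvent): SqlStatement {
  return {
    text:
      "INSERT INTO audit_events (tenant_id, actor_id, action, payload_json, created_at) " +
      "VALUES ($1, $2, $3, $4, now())",
    values: [event.tenantId, event.actorId, event.action, JSON.stringify(event.payload)],
  };
}

// Builds the "what happened for this tenant since X?" query, shaped to
// use the (tenant_id, created_at) index.
function auditTrailQuery(tenantId: string, since: Date): SqlStatement {
  return {
    text:
      "SELECT actor_id, action, payload_json, created_at FROM audit_events " +
      "WHERE tenant_id = $1 AND created_at >= $2 ORDER BY created_at",
    values: [tenantId, since],
  };
}
```

Both functions only build statements, so any database client can execute them, and the write can sit in the same transaction as the action it records.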

If you are starting fresh on multi-tenant SaaS architecture, add the audit table on day one. The cost is one more table and a tiny helper. The benefit is the day someone asks "what happened?" and you can answer.

A small studio's mental model

The framing that helps us most: tenancy is not a feature. It is a contract.

Every part of the system makes a promise about what it will and will not do. Tenant A's queries do not see Tenant B's data. Tenant A's API requests use Tenant A's quota, not the global quota. Tenant A's exported PDFs do not contain Tenant B's branding. Tenant A's billing does not get charged for Tenant B's actions.

Each promise has an enforcement layer. The database enforces some. The application enforces some. The API gateway enforces some. The contract is met when all layers agree.

When we onboard a new component (a new background job, a new third-party integration, a new admin endpoint), the first question is "does this component honor the tenancy contract?" If we cannot answer with confidence, we slow down. The cost of slowing down on a single component is hours. The cost of breaking the contract is reputation.

Multi-tenant is not a feature on the roadmap. It is a contract every component honors or breaks.

What is next for us

We are starting to formalize a small internal library so the four (soon five) products share the same tenancy primitives by default. Right now they share by convention, which works because one person writes them. The day a contractor lands on the team, the convention becomes a contract. The library will encode the contract so it is enforceable, not just documented.

Multi-tenant SaaS architecture is not the hardest part of running a small studio. The hardest part is being disciplined enough to do it before you need it. The one-line takeaway: put tenant_id on every row, scope every query, and keep the rest of your day-one design as boring as possible. The interesting decisions come later, after customers have given you the right problems to solve.

If you are mid-decision on this right now, our advice is to do it on day one even if you are skeptical. The day a customer onboards their three associates, you will write us a thank-you note.
