Vision & Category·September 16, 2025·8 min read

Multi-Tenancy Is Not a Feature You Bolt On

Tenancy is the floor every query stands on, not a late addition. How Matrix isolates tenants from line one — JWT, TenantContext, orgId filtering, and BYOK encryption.

By Matrix Team

Most agent projects start single-tenant. One org, one set of API keys, one Postgres schema, a users table that quietly assumes everybody in it belongs together. It ships. It demos well. Then the second customer signs up, and you discover that "tenancy" is not a column you add — it's an assumption you baked into every query, every cache key, every background job, and every secret you stored.

A multi-tenant AI platform has to get this right first, because retrofitting it later means auditing every data path you ever wrote and proving, under penalty of a cross-tenant leak, that none of them can see another org's rows.

Matrix is multi-tenant from line one. Here's what that actually means in the code.

Tenancy is the floor, not a wall

The common mental model is wrong. People picture tenancy as a wall between customers — something you stand up at the edge, a middleware check on the way in. Walls have gaps. The moment one code path skips the check (a cron job, an admin tool, a "just this once" raw query), the wall has a hole and you don't find out until a customer does.

The right model is a floor: every query stands on the tenant. There is no "below" the floor. A read that doesn't know which org it's for shouldn't compile, let alone run.

In Matrix's 10-layer architecture, tenancy is layer 1: underneath the domain graph, the agent runtime, voice, memory, and everything else. It's not the last thing the request touches. It's the first.

How a request acquires a tenant

Every request authenticates with a JWT (HS256, via Spring Security's OAuth2 resource server). The token carries the principal; the auth filter turns it into an immutable, request-scoped TenantContext:

public record TenantContext(
        String orgId,
        Long userId,
        String userEmail,
        Set<String> roles,
        String userType,        // OPERATOR (dashboard) vs CONTACT (channel-bound end-user)
        Long actingAgentId      // set when an AGENT runs a tool, for access control
) { ... }

TenantContextFilter populates this on the request thread and TenantContextHolder holds it in a ThreadLocal. That's the floor. From that point on, anything that reads or writes the graph asks the holder which org it's serving — it never trusts an orgId passed in from the caller's payload.

Two details matter:

userType discriminates the principal. An OPERATOR is a dashboard login (has email + password). A CONTACT is a channel-bound end-user — someone who reached an agent by phone and got resolved to a User row. Both live in the same tenant; the floor doesn't care how you logged in, only which org you belong to.
actingAgentId is set when an agent is the one running a tool. The human caller stays on userId as the on-behalf-of principal. This is the seam that lets access control compose an agent's grant with its caller's; the post on row, field, and type security for agents covers it in detail.
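Here is a minimal sketch of what TenantContextHolder might look like, assuming the plain ThreadLocal described above. Only the names current(), require(), runAs, and TenantContext.platform() come from this post. The method bodies, the set/clear pair the filter would call, the two-argument runAs signature, and the values inside platform() are assumptions made for illustration.

```java
import java.util.Set;
import java.util.function.Supplier;

record TenantContext(
        String orgId,
        Long userId,
        String userEmail,
        Set<String> roles,
        String userType,        // OPERATOR (dashboard) vs CONTACT (channel-bound end-user)
        Long actingAgentId      // set when an AGENT runs a tool
) {
    // Hypothetical contents: the post names platform() but does not show its fields.
    static TenantContext platform() {
        return new TenantContext("platform", null, null, Set.of(), null, null);
    }
}

// Sketch of a ThreadLocal-backed holder; not Matrix's actual implementation.
final class TenantContextHolder {
    private static final ThreadLocal<TenantContext> CURRENT = new ThreadLocal<>();

    private TenantContextHolder() {}

    // Hypothetical: called by the auth filter after the JWT is validated.
    static void set(TenantContext ctx) { CURRENT.set(ctx); }

    // Hypothetical: called in a finally block, because request threads are pooled.
    static void clear() { CURRENT.remove(); }

    // May return null when no tenant is bound to this thread.
    static TenantContext current() { return CURRENT.get(); }

    // Fails loudly instead of letting a read run with no floor under it.
    static TenantContext require() {
        TenantContext ctx = CURRENT.get();
        if (ctx == null) {
            throw new IllegalStateException("No tenant bound to this thread");
        }
        return ctx;
    }

    // Runs work as the given tenant, then restores whatever was bound before.
    static <T> T runAs(TenantContext ctx, Supplier<T> work) {
        TenantContext previous = CURRENT.get();
        CURRENT.set(ctx);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

The finally block in runAs matters as much as the ThreadLocal itself. Servlet threads are reused, so a context that is neither restored nor removed would quietly carry one org's identity into the next request handled on that thread.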

Every read and write filters by orgId

Matrix doesn't have a users table or a conversations table. Everything — Organization, Agent, Skill, Knowledge, Tool, Session, Message, Memory, Campaign — is an EntityNode row in Neo4j, described by an EntityType. (The why of that is its own post: everything is a node.) The payoff for tenancy is that there's one place to enforce isolation instead of one per table.

EntityManager is that place. Every list, get, and query reads the tenant from TenantContextHolder and folds the org into the Cypher:

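public List<EntityDto> listEntities(String entityType) {
    // Throws if no tenant is bound to this thread; never falls back to "all orgs".
    var ctx = TenantContextHolder.require();
    // Both branches are org-scoped, and the org comes from the context, not the caller.
    return (entityType == null
                ? entityRepo.findByOrgId(ctx.orgId())
                : entityRepo.findByOrgIdAndEntityType(ctx.orgId(), entityType))
            .stream().map(...).toList();
}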

findByOrgId. findByOrgIdAndEntityType. findByIdAndOrgId. findByIdInAndOrgId. There is no findById that skips the org. You can't accidentally fetch a node by id and forget whose it is, because the repository method that does that doesn't exist on the read path — you'd have to drop to the raw driver on purpose. require() (not current()) throws if there's no tenant on the thread at all: a read with no floor under it fails loudly instead of returning everyone's data.
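To make the "no org-blind read" contract concrete, here is a hypothetical in-memory stand-in for the repository, using a simplified EntityNode. In the real system Spring Data derives Cypher from method names like these. This sketch only shows the shape of the contract: every finder takes an orgId, and there is no findById or findAll to reach for.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

// Simplified node for this sketch; the real EntityNode is a Neo4j entity.
record EntityNode(Long id, String orgId, String entityType, String name) {}

// Hypothetical in-memory repository. Deliberately absent: findById(id), findAll().
final class InMemoryEntityRepo {
    private final List<EntityNode> nodes = new ArrayList<>();

    void save(EntityNode node) { nodes.add(node); }

    List<EntityNode> findByOrgId(String orgId) {
        return nodes.stream().filter(n -> n.orgId().equals(orgId)).toList();
    }

    List<EntityNode> findByOrgIdAndEntityType(String orgId, String entityType) {
        return nodes.stream()
                .filter(n -> n.orgId().equals(orgId) && n.entityType().equals(entityType))
                .toList();
    }

    // Fetching by id still requires the org, so another tenant's id finds nothing.
    Optional<EntityNode> findByIdAndOrgId(Long id, String orgId) {
        return nodes.stream()
                .filter(n -> n.id().equals(id) && n.orgId().equals(orgId))
                .findFirst();
    }

    List<EntityNode> findByIdInAndOrgId(Collection<Long> ids, String orgId) {
        return nodes.stream()
                .filter(n -> ids.contains(n.id()) && n.orgId().equals(orgId))
                .toList();
    }
}
```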

EntityNode.orgId is the tenant marker stamped on every instance. The constraint is structural: a node without an orgId is an orphan no query can reach.
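One way to make that constraint structural, sketched under assumptions: the record's compact constructor refuses a missing orgId, and a hypothetical EntityFactory.create takes the org from the authenticated context (passed in here as callerOrgId) and discards any orgId the payload tried to set. Neither class is Matrix's actual code, and this EntityNode is simplified for the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified node: it cannot be constructed without an orgId.
record EntityNode(String orgId, String entityType, Map<String, Object> props) {
    EntityNode {
        Objects.requireNonNull(orgId, "orgId");
        if (orgId.isBlank()) {
            throw new IllegalArgumentException("orgId must not be blank");
        }
        props = Map.copyOf(props); // defensive, immutable copy
    }
}

// Hypothetical factory: callerOrgId is whatever the tenant holder reports for this request.
final class EntityFactory {
    private EntityFactory() {}

    static EntityNode create(String callerOrgId, String entityType, Map<String, Object> payload) {
        Map<String, Object> props = new HashMap<>(payload);
        props.remove("orgId"); // never trust a tenant id supplied by the caller
        return new EntityNode(callerOrgId, entityType, props);
    }
}
```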

Platform globals vs. per-org custom types

Not everything is per-tenant, and pretending it is gets awkward fast. The built-in entity types — the schema for what an Agent or a Tool is — are shared across every org. The instances are not.

EntityType.ownerOrgId draws that line:

ownerOrgId == null → a platform global. The schema for Agent, Skill, Memory and friends. Everyone sees the type; nobody owns it.
ownerOrgId == "<some org>" → a per-org custom type. An org that overlays its own fields onto Lead or Contact creates a same-named type it alone owns.

Type listing resolves globals ∪ org-owned so an org sees the platform schema plus its own extensions — and never another org's custom types. This is how the platform stays strictly generic while every tenant gets to shape its own domain in data, not a fork.
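The listing rule above fits in a few lines. The union filter follows the post directly. The name-resolution rule, where an org's same-named overlay shadows the global type, is an assumption about how the overlay gets picked, and TypeResolver is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;

// ownerOrgId == null marks a platform global; otherwise it names the owning org.
record EntityType(String name, String ownerOrgId) {}

final class TypeResolver {
    private TypeResolver() {}

    // Globals plus the caller's own custom types; other orgs' types are filtered out.
    static List<EntityType> visibleTypes(List<EntityType> all, String orgId) {
        return all.stream()
                .filter(t -> t.ownerOrgId() == null || t.ownerOrgId().equals(orgId))
                .toList();
    }

    // Assumption: an org-owned type with the same name as a global wins.
    static Optional<EntityType> resolve(List<EntityType> all, String orgId, String name) {
        Optional<EntityType> own = all.stream()
                .filter(t -> t.name().equals(name) && orgId.equals(t.ownerOrgId()))
                .findFirst();
        return own.or(() -> all.stream()
                .filter(t -> t.name().equals(name) && t.ownerOrgId() == null)
                .findFirst());
    }
}
```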

The gotcha: cross-tenant boot operations bypass the floor

Here's the part that bites, and it's worth dwelling on because it's the exact tension a from-line-one design has to resolve.

EntityManager.listEntities filters by the current tenant. That's the whole point: it's the safety property. But some operations are legitimately cross-tenant, for instance seeding the seven built-in tools into every org on boot. BuiltinToolsSeeder needs to walk all orgs and create web_search, bash, file_read, and the rest in each one.

If it called listEntities, it would see exactly one org's rows — and worse, even TenantContextHolder.runAs(TenantContext.platform()) only sees the platform org, because the platform context is still a tenant. The floor doesn't have a back door, and that's correct.

The answer is not to add a back door. It's to use a different tool for a genuinely different job:

Cross-tenant boot operations use the raw Neo4j Driver directly — opening a fresh Session per upsert — precisely because EntityManager is tenant-scoped and won't enumerate other orgs.

That's a deliberate, narrow, reviewable escape hatch — not a hole in the wall. The same pattern shows up wherever a boot-time job has to cross tenants: open the driver, do the cross-org work explicitly, and never route it through the tenant-scoped manager. The discipline is that the only code allowed below the floor is code that obviously, visibly intends to be there.

There's a related trap with Spring Data Neo4j: it closes its Driver on the first transient error, so a loop over an SDN repository fails for every remaining item after a single network blip. Cross-tenant boot loops use the raw driver for that reason too: a fresh Session per upsert survives a hiccup that would otherwise poison the rest of the batch.
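A sketch of the cross-tenant seeding loop, assuming a hypothetical GraphSession interface in place of the Neo4j Java driver's Session so the shape can be exercised without a database. With the real driver, each iteration would open its own driver.session() instead. The Cypher, label, and property names are illustrative guesses, not Matrix's schema. What the sketch shows is the structure: orgs are enumerated explicitly, each upsert gets a fresh session, and a failure is recorded rather than aborting the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a driver session: open, run one statement, close.
interface GraphSession extends AutoCloseable {
    void run(String cypher, Map<String, Object> params);

    @Override
    void close();
}

final class BuiltinToolsSeederSketch {
    // Illustrative Cypher; MERGE keeps re-seeding on every boot idempotent.
    static final String UPSERT_TOOL =
            "MERGE (t:EntityNode {orgId: $orgId, entityType: 'Tool', name: $name})";

    private BuiltinToolsSeederSketch() {}

    // Cross-tenant on purpose: the caller supplies every orgId explicitly.
    static List<String> seed(Supplier<GraphSession> openSession,
                             List<String> orgIds, List<String> toolNames) {
        List<String> failures = new ArrayList<>();
        for (String orgId : orgIds) {
            for (String tool : toolNames) {
                // Fresh session per upsert: one transient error costs one item, not the batch.
                try (GraphSession session = openSession.get()) {
                    session.run(UPSERT_TOOL, Map.of("orgId", orgId, "name", tool));
                } catch (RuntimeException e) {
                    failures.add(orgId + "/" + tool); // record it and keep going
                }
            }
        }
        return failures;
    }
}
```

Returning the failed org/tool pairs leaves retry policy to the caller; a boot job could log them and retry on the next start, since MERGE makes the re-run harmless.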

BYOK: tenant secrets, encrypted at rest

Tenancy isn't just about who can read a row. It's about whose secrets you're holding. Matrix is bring-your-own-key: every org configures its own LLM providers, and the API key never lives in plaintext.

LlmProviderRegistry builds a Spring AI ChatModel per (org, provider). The key is stored encrypted via SecretEncryptor and decrypted only at the moment a model is constructed for that tenant. An org's OpenAI key is apiKeyEncrypted on that org's LlmProvider row — never shared, never logged, never visible to another tenant's runtime.

This has an operational consequence that surprises people, so it's worth stating plainly:

Importing a Neo4j dump across deployments breaks encrypted fields. LlmProvider.apiKeyEncrypted is encrypted with the source environment's master key. Restore that dump into a different environment and the ciphertext is unusable there. Re-create providers via the API on the new environment — don't expect a graph copy to carry working keys.

That's not a bug; it's the encryption doing its job. A secret encrypted under one environment's key should be opaque to another. The takeaway for operators: data migrates, secrets get re-issued.
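A sketch of a SecretEncryptor-style helper, assuming AES-256-GCM from the JDK's javax.crypto. The post does not name the cipher, and the class name and storage format (base64 of IV followed by ciphertext) are assumptions. The helper also shows why a restored dump stops working: decrypting under a different environment's master key fails GCM's authentication check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; the master key is per environment, never per graph dump.
final class AesGcmSecretEncryptor {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey masterKey;
    private final SecureRandom random = new SecureRandom();

    AesGcmSecretEncryptor(SecretKey masterKey) { this.masterKey = masterKey; }

    // Returns base64(iv || ciphertext+tag), the kind of value a field like apiKeyEncrypted holds.
    String encrypt(String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // fresh IV per encryption; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, masterKey, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(
                ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array());
    }

    // Under the wrong master key this throws AEADBadTagException instead of returning garbage.
    String decrypt(String stored) throws GeneralSecurityException {
        byte[] all = Base64.getDecoder().decode(stored);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, masterKey, new GCMParameterSpec(TAG_BITS, all, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(all, IV_BYTES, all.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

Because GCM authenticates as well as encrypts, a key mismatch surfaces as an exception at model-construction time. That loud failure is what you want when a dump lands in the wrong environment.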

Why retrofitting hurts so much

Step back and you can see why tenancy resists being bolted on.

In a single-tenant model, isolation is the default — there's only one tenant, so every query is trivially "scoped." When you add the second tenant, that default inverts: now isolation is something you must prove on every single path, and the paths that assumed a single tenant are silently wrong. The cache that keyed on agentId now serves org B's agent to org A. The background job that "processes all pending tasks" now processes everyone's. The admin endpoint that lists "all users" was fine when all users were one customer.

You can't find these by reading the happy path. You find them in production, as a cross-tenant leak, which is the one bug a platform genuinely cannot ship.
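Take the agentId cache from the list above: the fix is to make the tenant part of the key, so two orgs can never share an entry. A minimal sketch with hypothetical names (TenantKey, TenantScopedCache). A per-(org, provider) registry like LlmProviderRegistry could use the same shape, though the post doesn't describe its internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// The org is part of the key, so identical local ids in two orgs never collide.
record TenantKey(String orgId, String localId) {}

// Hypothetical tenant-scoped cache; the loader receives the org it is building for.
final class TenantScopedCache<V> {
    private final Map<TenantKey, V> entries = new ConcurrentHashMap<>();
    private final BiFunction<String, String, V> loader;

    TenantScopedCache(BiFunction<String, String, V> loader) { this.loader = loader; }

    V get(String orgId, String localId) {
        return entries.computeIfAbsent(new TenantKey(orgId, localId),
                k -> loader.apply(k.orgId(), k.localId()));
    }
}
```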

Building tenant-first flips the burden. Isolation is the default — require() throws without a tenant, the repository has no org-blind read, orgId is on every node — and crossing tenants is the thing you have to do on purpose, with the raw driver, in code a reviewer will notice. The expensive case (a leak) is structurally hard; the rare case (a boot seeder) is explicit. That's the trade you want.

Takeaway

Tenancy is a floor, not a wall. Put it underneath everything — a request-scoped TenantContext from the JWT, an orgId on every node, an EntityManager that has no org-blind read, BYOK secrets encrypted per tenant — and isolation becomes the default that the system enforces for you. Make crossing tenants the exceptional, explicit, raw-driver path, and you'll never wonder whether some forgotten endpoint is leaking org B into org A. Bolt it on later and you'll be auditing every query you ever wrote, hoping you didn't miss one.

The features customers see — voice, memory, RAG, campaigns — all stand on this floor. They get to be simple because none of them has to re-solve isolation.

Want to see the floor in action? Spin up a workspace, create a second org, and watch every query stay in its lane. For how one enforcement point covers every type, read the post on the generic entity model; then ship your first agent.


Build your first agent on Matrix

Spin up a workspace, wire up tools and knowledge, give your agent a voice, and talk to it in real time — no agent code required.
