Knowledge & RAG · January 3, 2026 · 8 min read

Auto-Wired Retrieval: Your Agent Shouldn't Need RAG Plumbing

Most RAG stacks make you hand-wire a retriever into the agent loop. In Matrix, attaching a Knowledge corpus is the only step — the search tool appears on its own.

By Matrix Team

Open almost any tutorial on retrieval-augmented generation and you'll find the same shape: ingest your docs, stand up a vector store, write a retriever, then (the part nobody photographs) splice that retriever into the agent loop. You decide when to call it. You format its results into the prompt. You redo the wiring for every agent that needs the corpus. The retrieval engine is the easy 20%; the integration is the other 80%, and it's bespoke every single time.

That last mile is where RAG stacks rot. A new corpus means new glue. A new agent means re-gluing. And the model never quite knows what it's allowed to search, because "what corpora exist" lives in your orchestration code, not in the agent's view of the world.

Matrix deletes that last mile. Attaching a Knowledge corpus to an agent is the only step. The search tool appears automatically — and its description tells the model exactly which corpora it can query. No retriever wiring, no per-agent code, no orchestration layer deciding when to retrieve.

The integration tax nobody quotes you

Here's the wiring a typical agent + RAG setup carries, per agent:

  • Instantiate or import the retriever for this corpus.
  • Insert a retrieval step into the agent's reasoning loop (pre-retrieve every turn, or expose retrieval as a function the model can call).
  • Format hits into the context window — and pick a token budget.
  • Tell the model, in the prompt, that retrieval exists and what's in it.
  • Repeat all of the above the next time you attach the same corpus to a different agent.

None of that is hard in isolation. The problem is it's repeated, it drifts (agent A retrieves differently from agent B), and it couples every agent to the physical details of your vector store. That's the integration tax. You pay it again every time the org spins up a new persona.
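For contrast, here is roughly what that per-agent glue looks like when written by hand. This is a hypothetical sketch: `Retriever`, `makeKeywordRetriever`, and the two prompt builders are illustrative names, and a word-overlap scorer stands in for a real vector store so the example runs without dependencies. Notice the drift: both agents read the same corpus, yet each has its own `k`, its own formatting, and its own prompt wording.

```typescript
// Hypothetical hand-wired RAG glue. None of these names come from Matrix or any real library.
interface Hit {
  text: string;
  score: number;
}

interface Retriever {
  search(query: string, k: number): Hit[];
}

// Toy retriever: scores each doc by how many query words it contains.
// It stands in for a real vector store so the example is self-contained.
function makeKeywordRetriever(docs: string[]): Retriever {
  const words = (s: string): string[] =>
    s.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  return {
    search(query: string, k: number): Hit[] {
      const terms = words(query);
      return docs
        .map((text) => ({
          text,
          score: words(text).filter((w) => terms.indexOf(w) >= 0).length,
        }))
        .filter((hit) => hit.score > 0)
        .sort((a, b) => b.score - a.score)
        .slice(0, k);
    },
  };
}

// One corpus, wired separately into every agent that needs it.
const onboardingDocs = makeKeywordRetriever([
  "To reset your password, open Settings and choose Security.",
  "Invite teammates from the Members page.",
  "Billing runs monthly on the workspace creation date.",
]);

// Agent A: pre-retrieves every turn, keeps 3 hits, bullet-formats them.
function buildPromptAgentA(userMessage: string): string {
  const hits = onboardingDocs.search(userMessage, 3);
  return [
    "You have access to the onboarding docs. Relevant excerpts:",
    ...hits.map((hit) => `- ${hit.text}`),
    `User: ${userMessage}`,
  ].join("\n");
}

// Agent B: same corpus, glued by hand again, with a different k, format, and wording.
function buildPromptAgentB(userMessage: string): string {
  const hits = onboardingDocs.search(userMessage, 5);
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join("\n");
  return `Context (onboarding):\n${context}\n\nQuestion: ${userMessage}`;
}

console.log(buildPromptAgentA("How do I reset my password?"));
console.log(buildPromptAgentB("How do I reset my password?"));
```

Every one of those choices (when to retrieve, how many hits, how to phrase the context) lives in agent-specific code, which is the coupling the rest of this article is about removing.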

How Matrix does it: one composition path

Matrix has exactly one place where an agent's capabilities get assembled for a turn: AgentToolSurface.composeForCaller. This is the canonical composition path in the agent runtime, and it's the same one for text chat, real-time voice, and autonomous background tasks. For every turn, it unions everything the agent can do into one coherent tool surface:

  • agent.tools — direct HTTP and client-display tool refs
  • agent.mcpServers — external MCP servers, called as a client
  • skill.tools — tools pulled in by every attached Skill
  • BuiltinToolRegistry lookups — the INTERNAL-transport built-ins (web_search, bash, fetch_url, file_*, grep)
  • the auto-attached search_knowledge tool — when agent.knowledge is non-empty
  • the memory built-ins (gated by skill-requested keys)
  • the ambient get_current_time
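A minimal sketch of what a single composition path could look like, assuming simplified shapes for agents, skills, and the built-in registry. Only the categories, their order in the list above, and the `agent.knowledge` trigger come from the article; `ToolDef`, `Skill.memoryKeys`, `Agent.builtins`, the partial registry, and the `memory_recall` tool name are hypothetical stand-ins, not Matrix's actual `AgentToolSurface` code.

```typescript
// Hypothetical sketch of one composer that unions every capability source for a turn.
type ToolSource = "direct" | "mcp" | "skill" | "builtin" | "knowledge" | "memory" | "ambient";

interface ToolDef {
  name: string;
  source: ToolSource;
}

interface Skill {
  tools: ToolDef[];
  memoryKeys: string[]; // memory keys this skill requests (assumed shape)
}

interface Agent {
  tools: ToolDef[];
  mcpServers: { name: string; tools: ToolDef[] }[];
  skills: Skill[];
  builtins: string[]; // built-in names to resolve (assumed field)
  knowledge: string[]; // attached knowledge_keys
}

// Partial stand-in for BuiltinToolRegistry: name -> definition.
const BUILTIN_REGISTRY: Record<string, ToolDef> = {
  web_search: { name: "web_search", source: "builtin" },
  bash: { name: "bash", source: "builtin" },
  fetch_url: { name: "fetch_url", source: "builtin" },
  grep: { name: "grep", source: "builtin" },
};

function composeForCaller(agent: Agent): ToolDef[] {
  const surface: ToolDef[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  // First definition of a name wins, so the surface never lists a tool twice.
  const add = (tool: ToolDef): void => {
    if (!seen[tool.name]) {
      seen[tool.name] = true;
      surface.push(tool);
    }
  };

  agent.tools.forEach(add);
  agent.mcpServers.forEach((server) => server.tools.forEach(add));
  agent.skills.forEach((skill) => skill.tools.forEach(add));
  agent.builtins.forEach((name) => {
    if (Object.prototype.hasOwnProperty.call(BUILTIN_REGISTRY, name)) add(BUILTIN_REGISTRY[name]);
  });

  // Retrieval is just another entry in the union, triggered by data.
  if (agent.knowledge.length > 0) add({ name: "search_knowledge", source: "knowledge" });

  // Memory built-ins appear only when some attached skill requests a memory key.
  if (agent.skills.some((skill) => skill.memoryKeys.length > 0)) {
    add({ name: "memory_recall", source: "memory" }); // hypothetical tool name
  }

  add({ name: "get_current_time", source: "ambient" }); // always present
  return surface;
}

const supportAgent: Agent = {
  tools: [{ name: "create_ticket", source: "direct" }],
  mcpServers: [{ name: "crm", tools: [{ name: "lookup_contact", source: "mcp" }] }],
  skills: [{ tools: [{ name: "summarize_call", source: "skill" }], memoryKeys: [] }],
  builtins: ["web_search"],
  knowledge: ["product_faq"],
};
const surfaceNames = composeForCaller(supportAgent).map((t) => t.name);
// surfaceNames is ["create_ticket", "lookup_contact", "summarize_call",
//                  "web_search", "search_knowledge", "get_current_time"]
```

Deduplicating by name with "first definition wins" is one plausible conflict rule; the article doesn't say how Matrix resolves name collisions between sources.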

Read that list again and notice what's not in it: a per-agent retriever. Retrieval isn't a special integration you bolt onto the loop. It's just another entry in the same union as everything else — and it materializes from a single fact: this agent has Knowledge attached.

That's the whole idea. Tools, skills, MCP servers, built-ins, memory, the clock, and retrieval all flow through one composer. There's no separate RAG pipeline grafted onto the side of the agent. (For the bigger picture of how those primitives compose, see Composing Agents From Four Primitives.)

The trigger is data, not code

The entire condition is: is agent.knowledge non-empty? Attach a corpus to an agent — through the agent drawer's Knowledge picker, or by setting the ref via the API — and on the very next turn composeForCaller sees a non-empty agent.knowledge and adds the tool. Detach it and the tool disappears. There is no code path that wires the retriever in; the presence of the attachment is the wiring.

This is the platform's recurring discipline: behavior comes from data, not from a fork. Adding RAG to an agent is the same kind of operation as giving it a different system prompt — you change a field, not a deployment.
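To make "the presence of the attachment is the wiring" concrete, here is a hypothetical sketch in which attaching Knowledge and editing the system prompt are the same kind of operation: a pure update to an agent record. `AgentRecord`, `attachKnowledge`, `detachKnowledge`, and `setSystemPrompt` are illustrative names, not Matrix's API; the only rule taken from the article is that the search tool exists exactly when `knowledge` is non-empty.

```typescript
// Hypothetical agent record; only the `knowledge` semantics come from the article.
interface AgentRecord {
  id: string;
  systemPrompt: string;
  knowledge: string[]; // attached knowledge_keys
}

// The entire trigger: a predicate over data, evaluated on each turn.
function hasSearchKnowledge(agent: AgentRecord): boolean {
  return agent.knowledge.length > 0;
}

// Field edits: each returns a new record; none of them touches code paths.
function setSystemPrompt(agent: AgentRecord, systemPrompt: string): AgentRecord {
  return { ...agent, systemPrompt };
}

function attachKnowledge(agent: AgentRecord, key: string): AgentRecord {
  if (agent.knowledge.indexOf(key) >= 0) return agent; // already attached
  return { ...agent, knowledge: agent.knowledge.concat([key]) };
}

function detachKnowledge(agent: AgentRecord, key: string): AgentRecord {
  return { ...agent, knowledge: agent.knowledge.filter((k) => k !== key) };
}

let support: AgentRecord = { id: "support", systemPrompt: "You help customers.", knowledge: [] };
console.log(hasSearchKnowledge(support)); // false
support = attachKnowledge(support, "product_faq");
console.log(hasSearchKnowledge(support)); // true
support = detachKnowledge(support, "product_faq");
console.log(hasSearchKnowledge(support)); // false
```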

The tool the model actually sees

When agent.knowledge is non-empty, the auto-attached tool is:

search_knowledge(knowledge_key, query, top_k)

Two details make this work without any prompt engineering on your part.

First, the description enumerates the attached corpora. The tool's own description lists which Knowledge corpora this agent has — by knowledge_key — so the model knows the exact menu it's allowed to query and which key maps to which body of knowledge. You don't write a paragraph in the system prompt explaining "you have access to the onboarding docs and the product FAQ." The tool description carries that, generated from the attachments, kept in sync automatically. An agent with three corpora attached sees three keys in the description; attach a fourth and the next turn's description has four.
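Here is a hypothetical sketch of deriving the tool definition from the attachments at compose time. The JSON-Schema-style parameter layout, the `title` field on each attachment, the `enum` constraint on `knowledge_key`, and treating `top_k` as optional are all assumptions for illustration; the article only specifies that the description enumerates the attached keys.

```typescript
// Hypothetical: build search_knowledge's definition from the current attachments.
interface KnowledgeAttachment {
  key: string; // knowledge_key the model passes back
  title: string; // human-readable summary of the corpus (assumed field)
}

interface ToolSchema {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string; enum?: string[] }>;
    required: string[];
  };
}

function buildSearchKnowledgeTool(attached: KnowledgeAttachment[]): ToolSchema {
  // The corpus menu is generated, so it cannot drift from the attachment list.
  const menu = attached.map((a) => `- ${a.key}: ${a.title}`).join("\n");
  return {
    name: "search_knowledge",
    description:
      "Search this agent's attached Knowledge corpora. Returns ranked chunks, each with a sourceRef to cite.\n" +
      `Available knowledge_key values:\n${menu}`,
    parameters: {
      type: "object",
      properties: {
        knowledge_key: {
          type: "string",
          description: "Which attached corpus to search.",
          enum: attached.map((a) => a.key),
        },
        query: { type: "string", description: "Natural-language search query." },
        top_k: { type: "integer", description: "Maximum number of chunks to return." },
      },
      required: ["knowledge_key", "query"], // top_k assumed optional
    },
  };
}

const faqTool = buildSearchKnowledgeTool([
  { key: "onboarding_docs", title: "New-hire onboarding guide" },
  { key: "product_faq", title: "Customer-facing product FAQ" },
]);
console.log(faqTool.description);
```

Because the definition is rebuilt from the attachment list each turn, attaching another corpus changes the menu with no prompt edit.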

Second, results come back citable. search_knowledge returns ranked chunks, each carrying a sourceRef — so the model can attribute an answer to its source instead of laundering retrieved text into an unsourced claim. Retrieval that can't cite is just a fancier way to hallucinate; carrying the source through to the result is what lets an answer say where it came from.
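A minimal sketch of citable results, assuming each hit is `{ text, score, sourceRef }` with `sourceRef` treated as an opaque string. The bracketed `[n]` numbering and the `citedSources` helper are illustrative conventions, not something the article says Matrix does; they show how a `sourceRef` carried through to the result lets an answer be traced back to where it came from.

```typescript
// Hypothetical hit shape; the article confirms only that each chunk carries a sourceRef.
interface KnowledgeHit {
  text: string;
  score: number;
  sourceRef: string;
}

// Render hits into the tool result so every chunk is labeled with its source.
function formatHitsForModel(hits: KnowledgeHit[]): string {
  return hits
    .map((hit, i) => `[${i + 1}] (source: ${hit.sourceRef}, score ${hit.score.toFixed(3)})\n${hit.text}`)
    .join("\n\n");
}

// Map [n] markers in the model's answer back to sourceRefs, in first-cited order.
// Markers that point past the hit list are ignored.
function citedSources(answer: string, hits: KnowledgeHit[]): string[] {
  const refs: string[] = [];
  const marker = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(answer)) !== null) {
    const hit = hits[Number(match[1]) - 1];
    if (hit && refs.indexOf(hit.sourceRef) < 0) refs.push(hit.sourceRef);
  }
  return refs;
}

const sampleHits: KnowledgeHit[] = [
  { text: "Passwords are reset from Settings > Security.", score: 0.912, sourceRef: "handbook.pdf#page=3" },
  { text: "Billing runs monthly.", score: 0.874, sourceRef: "faq.md#billing" },
];
console.log(formatHitsForModel(sampleHits));
const cited = citedSources("Reset it under Settings > Security [1]. Billing is monthly [2][1].", sampleHits);
// cited is ["handbook.pdf#page=3", "faq.md#billing"]
```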

Under the hood, the Knowledge corpus was chunked (~2000 chars, 200-char overlap), each chunk embedded (768-d) and stored on its underlying :Entity node, reusing the same Neo4j HNSW vector index that powers agent memory — no separate vector database, no new migration. The ingestion side of that story is in RAG You Set Up by Dragging a PDF Into a Browser. The point here is that the agent doesn't know or care about any of it. It sees one tool, with one honest description, returning citable hits.
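The ~2000/200 split is a sliding window: each chunk starts 1,800 characters after the previous one, so neighbors share 200 characters, and any passage of up to 200 characters that straddles a boundary still appears whole in at least one chunk. The sketch below is a hypothetical character-level version using those sizes; Matrix's real chunker may snap to sentence or paragraph boundaries, and embedding and storage are left out entirely.

```typescript
// Sliding-window chunker over raw characters, using the sizes quoted in the article.
// Hypothetical sketch: no boundary snapping, no embedding, no storage.
interface Chunk {
  index: number;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  text: string;
}

function chunkText(text: string, size = 2000, overlap = 200): Chunk[] {
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");
  const step = size - overlap; // 1800 with the defaults
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + size, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // reached the end; avoid a redundant tail chunk
  }
  return chunks;
}

const sampleDoc = new Array(5001).join("a"); // 5,000 characters
const spans = chunkText(sampleDoc).map((c) => `${c.start}-${c.end}`);
// spans is ["0-2000", "1800-3800", "3600-5000"]
```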

A real lesson: retrieval quality is a scoping problem

It would be dishonest to pretend "auto-attach a search tool" is the end of the story. The harder part is making the search good, and the most expensive mistake we made there was about scope.

The naive implementation runs a single global top-k over every chunk in the index and returns the highest-scoring matches regardless of which corpus they belong to. It sounds reasonable. It fails in a specific, maddening way: when one corpus is large or its phrasing happens to score high against a query, its chunks crowd out the corpus the model actually asked for. The model calls search_knowledge with the right knowledge_key, and a global top-k hands it neighbors from a different corpus that scored marginally better. The retrieval looks like it's working — it returns chunks — but the relevant body of knowledge gets buried.
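The crowd-out failure is easy to reproduce with a toy index. In this hypothetical sketch, 2-d vectors stand in for the 768-d embeddings, and `naiveGlobalTopK` ranks every chunk in the store while ignoring the `knowledge_key` the model supplied. Corpus names, texts, and vectors are made up for illustration.

```typescript
// Toy reproduction of a global top-k that ignores which corpus was requested.
interface StoredChunk {
  knowledgeKey: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Naive: the knowledge_key is accepted but never used to narrow the candidates.
function naiveGlobalTopK(
  store: StoredChunk[],
  _knowledgeKey: string,
  query: number[],
  topK: number,
): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosine(chunk.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

const mixedStore: StoredChunk[] = [
  { knowledgeKey: "product_faq", text: "Refunds go back to the original payment method.", embedding: [0.6, 0.8] },
  { knowledgeKey: "support_handbook", text: "Escalate refund disputes to tier 2.", embedding: [0.9, 0.1] },
  { knowledgeKey: "support_handbook", text: "Log every refund call in the CRM.", embedding: [0.8, 0.3] },
  { knowledgeKey: "support_handbook", text: "Refund macros live in the shared drive.", embedding: [0.7, 0.4] },
];

// The model asked for product_faq, but the louder handbook corpus fills every slot.
const naiveHits = naiveGlobalTopK(mixedStore, "product_faq", [1, 0], 3);
console.log(naiveHits.map((c) => c.knowledgeKey));
```

Against the query `[1, 0]`, the FAQ chunk scores 0.6 while the three handbook chunks score roughly 0.99, 0.94, and 0.87, so the corpus the model actually named never makes the top three.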

The fix is to make search corpus-scoped: when the model names a knowledge_key, retrieval is constrained to that corpus before ranking, using an exact-cosine pass rather than letting a single global nearest-neighbor query decide what the model meant. The query store filters to the requested corpus, then ranks within it. The right corpus can't get crowded out by a louder one, because the louder one was never in the candidate set.
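The corpus-scoped fix, sketched on the same kind of toy store: filter to the requested `knowledge_key` first, then compute exact cosine similarity for every remaining chunk (a brute-force scan, no approximate index) and rank within that set. This is a minimal sketch under those assumptions, not Matrix's query store; `scopedExactTopK`, the chunk shape, and the sample data are hypothetical.

```typescript
// Corpus-scoped exact-cosine search over a toy in-memory store.
interface StoredChunk {
  knowledgeKey: string;
  text: string;
  sourceRef: string;
  embedding: number[];
}

interface ScoredChunk {
  chunk: StoredChunk;
  score: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function scopedExactTopK(
  store: StoredChunk[],
  knowledgeKey: string,
  query: number[],
  topK: number,
): ScoredChunk[] {
  return store
    .filter((chunk) => chunk.knowledgeKey === knowledgeKey) // other corpora never enter the candidate set
    .map((chunk) => ({ chunk, score: cosine(chunk.embedding, query) })) // exact score for every candidate
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const scopedStore: StoredChunk[] = [
  { knowledgeKey: "product_faq", text: "Refunds go back to the original payment method.", sourceRef: "faq.md#refunds", embedding: [0.6, 0.8] },
  { knowledgeKey: "product_faq", text: "Plans can be changed at any time.", sourceRef: "faq.md#plans", embedding: [0.1, 1.0] },
  { knowledgeKey: "support_handbook", text: "Escalate refund disputes to tier 2.", sourceRef: "handbook.pdf#p2", embedding: [0.9, 0.1] },
  { knowledgeKey: "support_handbook", text: "Log every refund call in the CRM.", sourceRef: "handbook.pdf#p5", embedding: [0.8, 0.3] },
];

const scopedHits = scopedExactTopK(scopedStore, "product_faq", [1, 0], 3);
console.log(scopedHits.map((h) => `${h.chunk.sourceRef} ${h.score.toFixed(3)}`));
```

Filtering before ranking is the part that matters: the louder corpus is excluded rather than outscored. An exact scan over one corpus costs a linear pass per query, but it also avoids the recall loss an approximate index can show when most of its nearest neighbors belong to other corpora and get discarded afterwards.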

The takeaway under the takeaway: the easy part of RAG is calling a retriever; the hard part is making sure it returns the right thing. Auto-wiring the tool removes the integration tax, but correct scoping is what makes the auto-wired tool worth having.

Why "auto-wired" beats "configurable"

A fair objection: doesn't auto-attaching a tool take control away from the builder? In practice it's the opposite. Because retrieval rides the same composition path as every other capability, it inherits every property that path already guarantees:

  • Channel parity for free. The same composed surface drives chat, voice, and autonomous tasks, and Matrix treats byte-for-byte prompt parity across channels as an invariant. So search_knowledge behaves identically whether the contact typed a question or asked it on a phone call. You didn't wire retrieval into the voice path separately — there is no separate voice path to wire.
  • One mental model. Attaching Knowledge is the same shape of action as attaching a Skill or an MCP server: change a ref, get a capability. Nothing about retrieval is a special case you have to remember.
  • Always in sync. The tool's corpus menu is derived from the attachments at compose time, so it can never drift from reality. You can't forget to update a prompt paragraph, because there's no prompt paragraph to forget.

Control didn't go away. It moved to where it belongs — the attachment — instead of being smeared across orchestration code per agent.
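The parity property can be stated as a check: compose the same turn for every channel and require the serialized results to match byte for byte. This is a hypothetical sketch; `composeTurn`, `ComposedTurn`, and the channel names are illustrative, and a real composed surface carries far more than a prompt and a tool list.

```typescript
// Hypothetical parity check over a deliberately channel-agnostic composer.
type Channel = "chat" | "voice" | "task";

interface AgentConfig {
  systemPrompt: string;
  knowledge: string[];
}

interface ComposedTurn {
  systemPrompt: string;
  tools: string[];
}

// The channel is accepted but never branched on, so every channel gets the same surface.
function composeTurn(agent: AgentConfig, _channel: Channel): ComposedTurn {
  const tools = agent.knowledge.length > 0 ? ["search_knowledge", "get_current_time"] : ["get_current_time"];
  return { systemPrompt: agent.systemPrompt, tools };
}

// Throws if any channel's serialized turn differs from the first channel's.
function assertChannelParity(agent: AgentConfig): void {
  const channels: Channel[] = ["chat", "voice", "task"];
  const serialized = channels.map((channel) => JSON.stringify(composeTurn(agent, channel)));
  for (let i = 1; i < serialized.length; i++) {
    if (serialized[i] !== serialized[0]) {
      throw new Error(`prompt parity broken between ${channels[0]} and ${channels[i]}`);
    }
  }
}

assertChannelParity({ systemPrompt: "You help customers.", knowledge: ["product_faq"] });
```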

Takeaway

In most stacks, RAG is a retriever you hand-wire into the agent loop, re-glued for every agent and every corpus, with the model only knowing what your orchestration code chooses to tell it. In Matrix, retrieval is one entry in a single composition path: AgentToolSurface.composeForCaller auto-attaches search_knowledge(knowledge_key, query, top_k) whenever agent.knowledge is non-empty, with a description that enumerates the attached corpora and results that carry a sourceRef for citation. The same path unions direct tools, MCP servers, skill tools, built-ins, memory built-ins, and the ambient clock — so there's no bespoke RAG plumbing on the agent end at all. And because retrieval quality lives or dies on scope, search is corpus-scoped exact-cosine, so the corpus the model asked for can't be crowded out by a louder one.

Wire nothing, retrieve everything

Spin up a workspace, upload a corpus at /admin/knowledge, and attach it to an agent in the agent drawer's Knowledge picker. That's the entire integration. Ask the agent a question that only the corpus can answer and watch it call search_knowledge on its own — no retriever code, no prompt surgery. Then read RAG You Set Up by Dragging a PDF Into a Browser for the ingestion side, and Composing Agents From Four Primitives for how every capability — including this one — flows through the same composer.
