Stash: A Self-Hosted Memory Layer for AI Agents (Postgres + MCP)
Editorial note: Star and version counts in this article are snapshots as of April 26, 2026 and will move as the project ships. All install steps and architectural claims are drawn from the public Stash repository; verify against the latest README before deploying anything to your own infrastructure. Stash itself has not yet published independent benchmark scores, and we say so where it matters.
Most AI-agent memory coverage in 2026 reads the same way. Mem0 is good if you want a managed API. Zep is good if you want a temporal knowledge graph. Letta is good for tiered memory. They all assume you are fine sending your agent's full conversation history to a third-party cloud. Many teams are not, and the open-source quadrant of this category is where the more interesting work has been happening.
This post walks through Stash, a young, Apache 2.0, self-hosted memory layer that runs as a single Go binary on Postgres + pgvector and speaks MCP. We cover what it actually does, where it sits in the landscape, how to wire it into Claude Desktop in under ten minutes, and where it falls short today. If you have already read three "Mem0 vs Zep" comparisons this quarter, this is the one that covers the part those posts skip.
TL;DR: Stash (github.com/alash3al/stash, Apache 2.0) is a self-hosted memory layer for AI agents that runs as a single Go binary backed by Postgres + pgvector and exposes itself as an MCP server. Its differentiator is an 8-stage consolidation pipeline that turns raw conversation observations into structured facts, relationships, and confidence-decayed beliefs. The result behaves closer to a temporal knowledge graph than to a flat key-value store, but on infrastructure you own.
What Is Stash, and Why Does It Matter?
Stash is an open-source persistent memory layer for AI agents. It stores episodes, facts, and working context in Postgres with the pgvector extension for embeddings, runs database migrations on startup, and ships an MCP server with a background consolidation worker, all in one Docker Compose stack and a single Go binary. (Source: Stash repository, README.) The project describes itself as a "cognitive layer between your AI agent and the world," and the framing is accurate: nothing in Stash speaks to the LLM directly. It sits to the side of the agent and exposes save, recall, and consolidate tools over MCP, the same protocol Claude Desktop, Cursor, Cline, and Continue already speak.
The author, alash3al, is the same developer behind redix, a Redis-protocol-compatible key-value store with pluggable storage engines. That background shows up in Stash's design choices. The whole project is one binary, one database, no extra services. There is no Neo4j to operate (as Zep's open-source Graphiti requires), no separate vector DB (as Weaviate or Qdrant deployments need alongside Postgres), and no SaaS account (as Mem0's hosted offering centers on). For a team that has decided memory should run on infrastructure they control, that minimalism is the entire pitch.
The timing matters too. The "self-hosted MCP-native memory server" category essentially did not exist in early 2025; by April 2026 there are at least four meaningfully active projects in it (Stash, Memlord, OpenBrain, kg-memory-mcp). Coverage of that category, however, is still thin compared to the dozens of "best AI memory frameworks 2026" listicles aimed at managed-service buyers. If your data residency or compliance situation rules out the managed quadrant, you are evaluating in a much smaller and less-documented space than the SERP suggests. (For a parallel example of building Claude into stateful workflows, see our coverage of Claude Projects in applied workflows.)
What Problem Does a Memory Layer Actually Solve?
LLMs have no state between turns. Every API call is, from the model's perspective, a cold start. The naive fix is to paste the whole conversation history into the next prompt, and that runs into a measurable wall. The 2025 LongMemEval benchmark, an ICLR paper introducing 500 manually written questions across information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention, reported that commercial chat assistants drop roughly 30 percentage points in accuracy on questions that require facts from earlier sessions, even when those sessions fit inside the model's nominal context window. (Source: LongMemEval, arXiv:2410.10813.)
That gap is what every memory layer in the category is trying to close. The structural fix is the same across products: stop pasting the full history every turn, extract facts once at write time, retrieve only what is relevant at read time. The cost shape changes from "pay for the entire history on every call" to "pay once to store, pay tiny amounts to query." The accuracy floor goes up because you are no longer asking the model to find a needle in a 200,000-token haystack. You are handing it the needle directly.
For any agent that runs across multiple sessions, the question is no longer whether to use a memory layer; by April 2026 that argument is largely settled. The question is which one, and the existing comparisons mostly stop at the cloud-managed boundary.
How Does Stash's 8-Stage Consolidation Pipeline Work?
Stash does not just store every message verbatim. The project's central design choice is an 8-stage consolidation pipeline that runs in the background and progressively abstracts raw observations into structured knowledge. The README enumerates the stages: facts extraction, relationship mapping, causal-link identification, goal tracking, failure-pattern recognition, hypothesis verification, confidence decay, and incremental processing of only-new data since the last run.
Two of those stages are unusual enough to be worth pausing on. Confidence decay treats every stored fact as a time-decaying belief, not a permanent record. The user's job changed in March; the model should not still be returning the old job title in November as if nothing happened. Most flat memory stores cannot express this. They store the new fact alongside the old one and leave the model to figure it out. Stash ages out older beliefs as new contradicting evidence arrives. Hypothesis verification, similarly, treats the consolidator's own outputs as provisional until subsequent observations corroborate or contradict them, instead of writing every extracted fact straight to canonical storage. (Source: Stash repository, README.)
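The README names the confidence-decay stage but does not publish its formula, so the sketch below is an illustrative model, not Stash's actual implementation: a belief whose confidence halves every 90 days unless re-confirmed, expressed as plain Postgres SQL. The table and column names are assumptions.

-- Illustrative only: exponential decay with a 90-day half-life.
-- "facts", "confidence", and "last_confirmed_at" are assumptions,
-- not Stash's documented schema.
SELECT content,
       confidence * exp(-ln(2.0)
           * extract(epoch FROM now() - last_confirmed_at)
           / (90 * 86400)) AS decayed_confidence
FROM facts
ORDER BY decayed_confidence DESC;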
Architecturally, this lives entirely in Postgres. Embeddings go to pgvector for similarity search. Facts, relationships, goals, and the rest of the consolidator's output land in regular relational tables. You can connect psql at any time and read what the agent thinks it knows about you. There is no opaque store. That transparency is genuinely useful for debugging and is something most managed memory APIs intentionally hide behind their abstractions.
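Because the store is ordinary Postgres, recall ultimately bottoms out in a pgvector query. The <=> cosine-distance operator below is real pgvector; the facts table and the three-dimensional vector are illustrative stand-ins (real embedding columns run to hundreds of dimensions), so check the repo's migrations for the actual schema.

-- Hypothetical recall query against an assumed facts table.
SELECT content, confidence
FROM facts
ORDER BY embedding <=> '[0.11, -0.42, 0.87]'::vector  -- cosine distance
LIMIT 5;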
How Do You Install Stash and Wire It to Claude Desktop?
Installation is a one-command Docker Compose deploy. The stack starts Postgres with the pgvector extension, runs the database migrations, and launches the MCP server with the background consolidation worker. From there you add one entry to your Claude Desktop (or Cursor / Cline / Continue) MCP configuration pointing at the local Stash endpoint, and the agent gains save and recall tools that survive across sessions.
git clone https://github.com/alash3al/stash.git
cd stash
docker compose up -d
docker compose logs -f stash # watch migrations and the MCP server come up
The MCP wiring on the client side is one block in claude_desktop_config.json (the location varies by OS, and Claude Desktop's docs are the source of truth). The exact field names should be checked against the current Stash README, but the shape is the standard MCP server entry that any client supports:
{
  "mcpServers": {
    "stash": {
      "command": "docker",
      "args": ["exec", "-i", "stash", "stash", "mcp"]
    }
  }
}
After restarting Claude Desktop, the new tools appear in the MCP indicator. The cleanest first sanity check is a two-session test. In session A, say something like "remember that my preferred deployment target is Hetzner ARM and I dislike Kubernetes." Close the chat completely. Open a new chat and ask "what do you know about my deployment preferences?" If the recall works, your memory layer is wired correctly. If it does not, the most common culprits are an MCP handshake mismatch (different clients are slightly inconsistent), a Docker network problem reaching Postgres, or a missing pgvector extension on a non-default Postgres image.
Because the data lives in a real Postgres, the second sanity check is worth doing once: connect with psql and read the consolidated tables yourself. Watching your own remembered facts appear as rows is the moment the architecture clicks. (For a different example of getting Claude to do useful work in a real workflow, see our PPC audit automation tutorial.)
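A first inspection session might look like the following. The pg_extension check is standard Postgres; the facts table name is a guess, so substitute whatever \dt actually lists.

-- Inside psql, connected to the Stash database.
SELECT extversion FROM pg_extension WHERE extname = 'vector';  -- is pgvector installed?
\dt                                                    -- list the tables the migrations created
SELECT * FROM facts ORDER BY created_at DESC LIMIT 10; -- table name is a guess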
How Does Stash Compare to Mem0, Zep, and Other Self-Hosted Options?
The honest answer: Stash does not really compete with managed Mem0 or cloud-leaning Zep, because it is aimed at a different buyer. Inside its actual peer group, the self-hosted, MCP-native, open-source quadrant, it competes most directly with Memlord, OpenBrain, and kg-memory-mcp, and adjacently with fully local options like OMEGA. The differentiation inside that group is the explicit 8-stage pipeline (Memlord and kg-memory-mcp are simpler embed-and-retrieve loops) plus the Postgres-only stack (no Neo4j, no Redis, no extra services).
A useful note on benchmark numbers in this category: published LongMemEval scores are dominated by managed-service vendors marketing to managed-service buyers. Mem0 and Zep have the most-cited scores; some newer entrants such as MemPalace and OMEGA have published competitive numbers as well. Stash itself has not published a LongMemEval score as of April 2026. Treat the comparison below as architectural, not as a recall-quality verdict.
| Project | Hosting | Storage | MCP-native | Pipeline depth |
|---|---|---|---|---|
| Stash | Self-hosted | Postgres + pgvector | Yes | 8-stage consolidation |
| Mem0 | Managed (cloud-first) | Vendor-managed | Via bridge | Flat fact extraction |
| Zep / Graphiti | Managed; OSS core | Neo4j (temporal graph) | Via bridge | Temporal knowledge graph |
| Memlord | Self-hosted | Postgres + pgvector | Yes | Hybrid BM25 + semantic |
| OpenBrain | Self-hosted | Postgres + pgvector | Yes | Embed + auto-metadata |
| OMEGA | Local-only | SQLite + ONNX | Yes | Local embeddings |
Our reading: If you must self-host and you already operate Postgres, Stash is the cleanest pick of the four self-hosted projects above. One binary, one database, the deepest pipeline. If you want zero external services at all, OMEGA's SQLite-only stack is hard to beat. If you want a managed API and the data residency question is not a blocker, Mem0 or Zep are still the path of least resistance.
Where Does Stash Fall Short Today?
Three honest limitations. First, maturity: as of April 2026 Stash is at v0.2.0 with a small but non-zero star count and a single primary maintainer. That is fine for an early-adopter pick and not yet "nobody got fired for choosing it." Plan accordingly. Second, the absence of public benchmark scores. The architecture suggests the consolidation pipeline should help on multi-session and temporal-reasoning questions, but suggesting is not measuring. Until LongMemEval or LoCoMo numbers exist for Stash specifically, claims of recall quality are inferred, not proven.
Third, operational surface. The whole appeal of a managed memory API is that someone else runs it. Choosing Stash means you now operate a Postgres instance with pgvector, manage backups, monitor disk, and handle upgrades. For a team already running Postgres in production, that is rounding error. For a team without database operations expertise, the managed quadrant is genuinely a better fit even at the cost of data leaving your infrastructure.
One more thing worth flagging: the background-consolidation design has a behavior most users do not initially expect. Facts you just told the agent are stored as raw episodes immediately, but they may not appear in the consolidated tables (facts, relationships, etc.) until the worker has processed them. Recall against the raw store works in real time; recall against the structured store has a small lag. This is the right trade-off for write latency, but worth knowing before you debug a "missing memory" issue at three in the morning.
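If you do hit that situation, the same psql access makes the lag directly observable. A hypothetical check, with table names assumed as before: if the newest episode is much newer than the newest fact, the worker simply has not caught up yet.

-- Hypothetical lag check with assumed table names.
SELECT (SELECT max(created_at) FROM episodes) AS newest_episode,
       (SELECT max(created_at) FROM facts)    AS newest_fact;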
Should You Self-Host Stash?
Yes, if any of these apply: you handle data (PHI, EU customer data, financial records) where sending conversation history to a third-party AI service is a non-starter; you already operate Postgres in production and one more database is rounding error; you want to actually read the rows your agent's memory consists of for debugging or audit; you are an early adopter who is comfortable on v0.2.0 of an open-source project. The cost of trying it is one docker compose up and a config block, roughly twenty minutes including the two-session sanity test.
No, if any of these apply: you have no database operations capacity and are not interested in growing it; you specifically need the operational guarantees of a funded vendor with an SLA; you are running on the absolute edge with no infrastructure for a Postgres instance (in which case OMEGA's SQLite-only design is a better fit); you need published benchmark numbers right now to satisfy a procurement process.
The pattern is the same as every other self-hosted-vs-managed decision in modern infrastructure. Stash is not better than Mem0 in absolute terms, and Mem0 is not better than Stash. They are aimed at different buyers, and the right question is which buyer you are. If you have read this far, you are probably the self-hosted one. (For another piece of the same self-hosted-AI puzzle, see our coverage of the open-source text-to-CAD harness for Claude Code.)
Frequently Asked Questions
How do you give an AI agent persistent memory?
You attach a memory layer that stores facts in a database and exposes save and recall tools to the agent over an interface like MCP (Model Context Protocol). The agent writes during conversations and queries the store at the start of new sessions. Stash, Mem0, and Zep are three current implementations that take different positions on hosting and architecture.
Is Stash a real alternative to Mem0?
For teams that need self-hosting, yes — Stash sits in the open-source MCP-native quadrant that managed Mem0 does not directly serve. For teams that want a fully managed API and are fine with their conversation history living in someone else's cloud, Mem0 remains the more mature pick. The two products are aimed at different buyers, not the same buyer with different prices.
What is an MCP memory server?
An MCP (Model Context Protocol) memory server is a process that exposes save and recall tools to any MCP-compatible client (Claude Desktop, Cursor, Cline, Continue, OpenAI Agents, etc.) so the client can persist and retrieve facts across sessions without custom integration. Stash is one implementation; Memlord and OpenBrain are others in the same self-hosted quadrant.
Why use Postgres and pgvector instead of a dedicated vector database?
One database to operate, mature backup tooling, and transactional writes across structured tables and vector embeddings in the same place. The trade-off is scale ceiling — purpose-built vector DBs like Qdrant or Weaviate scale embeddings further. For an agent-memory workload measured in millions, not billions, of rows, pgvector is enough and the operational simplification is real.
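That transactional point is concrete enough to sketch: a fact row and its embedding can commit atomically in a single statement, which a split Postgres-plus-vector-DB stack can only approximate with application-level retries. Table and column names here are illustrative, not Stash's schema.

-- One atomic write covering the structured row and its vector.
WITH new_fact AS (
  INSERT INTO facts (content, confidence)
  VALUES ('prefers Hetzner ARM over Kubernetes', 0.9)
  RETURNING id
)
INSERT INTO fact_embeddings (fact_id, embedding)
SELECT id, '[0.11, -0.42, 0.87]'::vector
FROM new_fact;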
The Bottom Line
Stash is small, opinionated, and exactly the kind of repository the open-source community produces best. It is a self-hosted, MCP-native, single-binary memory layer that takes infrastructure you already run (Postgres) and a protocol you already speak (MCP) and gives your agent persistent memory without sending a single conversation to a third party. It is not a replacement for Mem0 if you wanted Mem0, and it is not yet the benchmark champion of the category. It is the cleanest current answer to a specific question: how do I give my agent memory while keeping the data on my infrastructure?
Clone the repo, run docker compose up, wire it into Claude Desktop, and run the two-session sanity test. That twenty-minute exercise is the fastest way to know whether the self-hosted quadrant fits how you actually deploy AI.