Editorial note: Star counts, issue counts, and cost figures in this article are snapshots taken on May 28, 2026, the day PilotDeck was open-sourced, and will move as the project ships. Architectural claims and install steps come from the public PilotDeck repository; verify against the latest README before deploying. The 70% and one-sixth-cost figures are OpenBMB's own numbers and have not yet been independently benchmarked. We flag that wherever it matters.

AI token prices now vary by up to 250x between models. Claude Opus 4 runs about $75 per million output tokens; Google's Gemini 3 Flash costs roughly $0.30 for the same volume (ClawRouters, 2026). Yet most agent setups send every task to the same flagship model: drafting a tweet, parsing a date, reasoning through a hard refactor, all billed at the top rate. That is the expensive default PilotDeck is built to break.

PilotDeck open-sourced on May 28, 2026, and it does not call itself a framework. It calls itself an "agent operating system." This review covers what that actually means. We walk the three capabilities behind the pitch, how to install it, where it sits against agent frameworks like Mastra and Letta, and whether it's worth your time at version one. If you have read a dozen "cut your LLM bill" guides this year, this is the first shipping product that bakes the advice into a workbench you run yourself.

TL;DR: PilotDeck (github.com/OpenBMB/PilotDeck, AGPL-3.0) is an open-source agent operating system from OpenBMB and Tsinghua THUNLP, released May 28, 2026. It isolates each project into a WorkSpace, exposes editable "white-box" memory, and routes simple tasks to cheaper models. OpenBMB claims 70% cost savings on social-media workflows and frontier quality at one-sixth the cost. Install in one line; runs locally; you bring your own keys.


What Is PilotDeck?

PilotDeck dark-mode dashboard showing three isolated project WorkSpaces, each with its own files, memory, and activity
Each project lives in its own WorkSpace, with isolated files, memory, and skills.

PilotDeck is an open-source agent operating system released on May 28, 2026 under AGPL-3.0, jointly built by OpenBMB, Tsinghua University's THUNLP lab, ModelBest, and AI9Stars (PilotDeck repository, 2026). The organizing idea is the WorkSpace. Each project gets its own isolated file system, memory store, and skill set, so two tasks running side by side never bleed into each other's context.

That framing is a deliberate departure from how most agent tools talk about themselves. A framework gives you primitives like chains, tools, and a memory class, then asks you to assemble the runtime yourself. PilotDeck ships the runtime. It is closer in spirit to a desktop environment for agents than to a library, which is why "OS" is doing real work in the name rather than just sounding impressive.

OpenBMB is not a newcomer. The same ecosystem produced the MiniCPM model family and the earlier XAgent autonomous-agent project, so PilotDeck arrives with a track record behind it. At launch it carried roughly 1.4k GitHub stars and 15 open issues. That's small. But the lineage signals a serious release, not a weekend experiment.

The codebase tells you who it's for. PilotDeck is 69.6% TypeScript and 23.1% JavaScript, built on Vite, React, Tailwind, and shadcn/ui, with only a thin Python layer. This is a web-first product with a real UI, not a research repo you drive from a notebook. For more on the broader self-hosted agent stack, see our look at open-source autonomous agent frameworks.


How Does PilotDeck Save AI Costs?

PilotDeck's Smart Routing auto-detects task difficulty and sends simple work to lighter models. OpenBMB's own benchmark puts a social-media workflow at $2.83 with routing on versus $12.58 with routing off, roughly a 70% saving. On hard tasks, it reports frontier-matching quality at one-sixth the cost ($3.15 versus $18.36) (PilotDeck repository, 2026). The mechanism is on-device and cloud model co-orchestration: a cheap model handles the easy 80%, a flagship handles the hard 20%.

Why does this matter so much? Because LLM API calls account for 70–85% of total agent operating cost (NiteAgent, 2026). When the single biggest line item is model inference, and you're overpaying by routing trivial tasks to a $75-per-million model, the savings from matching difficulty to price compound fast across a long-running agent.

Independent research backs the size of the prize. ClawRouters' routing scenarios show cost reductions ranging from 67% to 92% depending on the workload, with little quality loss (ClawRouters, 2026). OpenBMB's 70% figure sits squarely inside that range, which makes it plausible even before anyone re-runs the test. The honest caveat: "plausible" is not "verified," and the saving is workflow-specific.

Agent Workflow Cost: Single Flagship vs. Smart Routing A horizontal bar chart showing that a social-media workflow on a single flagship model costs 100 indexed units, while the same workflow under PilotDeck Smart Routing costs about 30 units, a 70% reduction. Agent Workflow Cost: Single Flagship vs. Smart Routing Indexed cost, social-media workflow (single flagship = 100) Single flagship model 100 PilotDeck Smart Routing 30 (−70%) Source: OpenBMB / PilotDeck repository, 2026 (vendor figure)
PilotDeck's headline cost claim, visualized. The 70% figure is OpenBMB's own.

Here is the part the cost-optimization guides skip: routing only pays off if the difficulty detector is good. Misroute a hard task to a cheap model and you don't save money. You get a wrong answer you have to redo on the expensive model anyway, paying twice. The quality of PilotDeck's router is the whole game, and it's exactly the thing a launch-day star count can't tell you.

The model price spread that makes routing worth it is dramatic on its own.

Output Token Price per Million Tokens (2026) A lollipop chart showing output token prices: Gemini 3 Flash at $0.30, a mid-tier model at about $3, GPT-4o-class at about $10, and Claude Opus 4 at $75 per million output tokens, a roughly 250x spread. Output Token Price per Million Tokens (2026) Up to a 250x spread between the cheapest and priciest models Gemini 3 Flash $0.30 Mid-tier (e.g. Haiku-class) ~$3 GPT-4o-class ~$10 Claude Opus 4 $75 Source: ClawRouters, 2026 (illustrative tiers)
When the priciest model costs 250x the cheapest, sending every task to the flagship is the real waste.

What Is White-Box Memory?

PilotDeck white-box memory editor showing individual memory entries with confidence bars and edit, pin, and delete controls
White-box memory: every entry is visible, editable, and reversible, with idle-time consolidation on the right.

White-box memory means PilotDeck makes memory generation, storage, and retrieval fully visible end-to-end, so you can edit, delete, or pin individual entries by hand (PilotDeck repository, 2026). When the agent misremembers something, you don't restart the conversation and pray. You open the memory store, find the offending entry, and fix it directly.

This is the inverse of how most production memory works. In 2026, the dominant pattern keeps memory in a dedicated layer outside the prompt, written and retrieved by tools like Mem0 or Zep that the user never inspects (Mem0, 2026). That's efficient, but it's a black box: when recall goes wrong, you can't see why, and you certainly can't reach in and correct one bad fact.

PilotDeck adds a "Dream Mode" that consolidates memory during idle windows, compacting raw observations into structured facts the way sleep is theorized to consolidate human memory. If a consolidation pass makes things worse, one click rolls it back. That rollback button is the tell. It says the designers expect consolidation to sometimes be wrong and built the undo before you needed it. For a contrasting take on inspectable memory architecture, see our review of self-hosted agent memory layers.


What Does "Always-On" Mean?

Always-on breaks the ask-and-answer loop: after you sign off, PilotDeck's agents keep discovering candidate tasks, run long-horizon monitors, and land finished deliverables as local files with a summary report waiting when you return (PilotDeck repository, 2026). The default agent stops the moment you stop typing. This one doesn't.

That maps onto where the market is heading. Through 2026, enterprise teams have pushed agents out of pilots and into production, where the job is no longer answering a question but doing standing work. Production use means agents that monitor, schedule, and act between human check-ins. They are not chatbots that wait for the next prompt.

The trade-off is real and worth stating plainly. An agent that works while you're gone is an agent spending tokens and taking actions you didn't directly approve. Always-on amplifies both the upside (work gets done overnight) and the downside (a misrouted or misremembered task runs unsupervised). PilotDeck's white-box memory and routing controls are partly what make always-on tolerable: you can audit what it did and why. Pair this with proper observability; our guide to open-source LLM observability covers the tooling.


How to Install PilotDeck

PilotDeck installs in one line on macOS or Linux, or via Docker Compose, with the web UI served on localhost:3001 (or localhost:5173 in dev mode). Because it runs locally, your data and conversation history stay on your machine (PilotDeck repository, 2026). The one-liner is the fastest path:

curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash

A word of caution before you paste that: piping a remote script straight into bash executes whatever the URL serves, with your user's permissions. It's a common install pattern, but read the script first. curl the URL to a file, inspect it, then run it. That's good hygiene for any curl | bash installer, not a knock on PilotDeck specifically.

After install, you configure model API keys (PilotDeck brings no keys of its own; you supply Claude, GPT, or other provider credentials), then create your first WorkSpace. Each WorkSpace is the isolation boundary: its files, memory, and skills are walled off from every other project. For teams building portable, self-hosted AI setups, this pairs naturally with approaches in our portable Claude tutorial.

One licensing flag for businesses: PilotDeck is AGPL-3.0. Self-hosting for internal use is unrestricted, but if you offer a modified PilotDeck as a network service to others, the copyleft terms require you to release your source changes. Plan around that before you build a product on top of it.


PilotDeck vs. Agent Frameworks Like Mastra and Letta

PilotDeck competes on a different axis than agent frameworks: it's an agent operating system focused on running and observing agents, where Mastra, Letta, and LangGraph are frameworks focused on orchestrating them. Mastra alone spans 2,436+ models across 81 providers through the Vercel AI SDK (Firecrawl, 2026). That's enormous orchestration breadth, but you still build and host the runtime yourself.

That distinction is the thing most "best agent tools 2026" listicles miss. They rank PilotDeck-style products and framework libraries in one flat list, as if they were interchangeable. They aren't. A framework is what you build with; an agent OS is what you run agents on. You might well use both: orchestrate with a framework, operate inside an OS.

Capability PilotDeck Mastra Letta
Category Agent OS (runtime + UI) Framework (TypeScript) Framework (memory-first)
Memory model White-box, hand-editable Four-tier (history, working, semantic, RAG) Tiered, agent-managed (MemGPT lineage)
Cost routing Built-in Smart Routing Provider-agnostic, manual Manual
Always-on execution Native Build it yourself Build it yourself
Primary interface Web UI (localhost) Code / SDK Code / SDK
License AGPL-3.0 Permissive (Apache/MIT) Apache 2.0

Read that table as positioning, not a scoreboard. If you want maximum model coverage and full programmatic control, a framework wins. If you want a self-hosted workbench where memory is inspectable and cost routing comes for free, PilotDeck is offering something the frameworks deliberately leave to you. Different jobs.


Is PilotDeck Worth It? The Verdict

At launch, PilotDeck is a genuinely interesting bet with launch-day caveats. The strengths are real: transparent, editable memory solves a debugging pain that black-box stores create; Smart Routing targets the 70–85% of agent cost that actually matters; and it's self-hosted with an active community across Discord, Feishu, and WeChat. The category framing, an OS rather than a framework, is the clearest articulation of "operate your agents" I've seen ship.

The cons are the cons of any v1. Roughly 1.4k stars and 15 open issues mean small community and unbattle-tested edges. The AGPL-3.0 license adds friction for anyone building a commercial service on top. And the headline numbers (70% savings, one-sixth cost) are OpenBMB's own, not independent benchmarks. They're plausible given third-party routing research, but "plausible" is where the evidence currently stops.

Our take: If you run agents across multiple projects and have been burned by black-box memory or surprise token bills, PilotDeck is worth an afternoon of install-and-test today. If you need production stability or commercial licensing clarity, watch it for a release or two and let the benchmarks catch up to the claims.

The single most useful thing you can do is the test OpenBMB's marketing can't do for you: install it, run one real workflow with Smart Routing on and off, and measure your own cost delta. That number is yours, not the repo's, and it's the only one that decides whether PilotDeck belongs in your stack. For broader agentic workflow patterns, our Claude Code agentic workflow guide is a useful companion.


Frequently Asked Questions

Is PilotDeck free?

Yes. PilotDeck is released under AGPL-3.0, so it's free to download and self-host. The copyleft terms matter for commercial use: if you offer a modified version of PilotDeck as a network service, AGPL-3.0 requires you to publish your source changes. For internal self-hosting, there are no such restrictions.

Who made PilotDeck?

PilotDeck was jointly developed by OpenBMB, Tsinghua University's THUNLP lab, ModelBest, and AI9Stars, and open-sourced on May 28, 2026. The same ecosystem produced the MiniCPM model family and the earlier XAgent project, so it arrives with established research backing rather than as a first-time effort.

Does PilotDeck really save 70% on costs?

OpenBMB reports 70% savings on social-media workflows via Smart Routing and frontier-level quality at one-sixth the cost on complex tasks. These are vendor figures, not independent benchmarks. But they fall inside the 67–92% routing-savings range that third-party scenarios report (ClawRouters, 2026), making them plausible pending verification.

What models does PilotDeck support?

PilotDeck routes between flagship models (Claude and GPT-class) for hard tasks and lighter models for simple ones, using on-device and cloud co-orchestration. You bring your own API keys. Because routing matches model price to task difficulty, the difficulty detector's accuracy drives the real-world savings you'll see.

How is PilotDeck different from XAgent?

XAgent is OpenBMB's autonomous agent for solving a single complex task end-to-end. PilotDeck is a productivity operating system for running many agents across isolated WorkSpaces, with explicit memory management, cost routing, and always-on execution. XAgent solves one task; PilotDeck runs your whole workbench.


The Bottom Line

PilotDeck is the first open-source product to treat "operate your agents" as a category of its own. Three things make it worth watching:

  • White-box memory turns a black-box debugging nightmare into an editable text store with undo.
  • Smart Routing attacks the 70–85% of agent cost that's pure model spend, with a claimed 70% reduction.
  • Always-on execution matches where teams are already heading: agents in production doing standing work, not chatbots waiting for prompts.

The caveats are equally clear: it's v1, AGPL-3.0 carries commercial friction, and the cost claims await independent confirmation. Install it, run one real workflow, and measure your own savings. Then decide. Start at the PilotDeck repository, and if you measure a routing-savings number of your own, that's the data point worth sharing.