Codex Startup Pressure Test Skill Tutorial: Brutally Vet a Startup Idea Before You Build
I have a folder of half-built MVPs. Each one started the same way: Saturday morning, fresh coffee, Codex CLI open, a scaffolded auth flow by 10am. Then somewhere around hour four — usually right before the dashboard route — I asked the question I should have asked first. Does anyone actually want this? The answer was sometimes "no," but by then I had spent the weekend.
The codex-startup-pressure-test-skill is the inversion of that workflow. It is a Codex CLI skill that lives at the front of the terminal session, not the back. You paste an idea, you get a verdict, and the verdict is structurally allowed to be pivot. The repo went up on 2026-05-03, picked up 692 GitHub stars and 62 forks in its first four days, and ships as a single npx install. This tutorial walks the full pattern end-to-end, including what each of the six invocation modes returns and when to escalate from the default mode to the deeper ones. (For an adjacent Codex skill walkthrough, see our keep-codex-fast tutorial.)
TL;DR: codex-startup-pressure-test-skill is a Codex CLI skill (MIT-licensed, ~8 KB JavaScript installer, 692 GitHub stars and 62 forks in its first four days as of 2026-05-07) that returns a brutal founder-style diagnosis of a startup idea across six modes: pressure-test, problem-validation, competition-map, first-10-customers, mvp-plan, and full. Output is verdict (strong / weak / pivot), scorecard, core assumption, fatal flaws, problem reality, competition, first-10-customers plan, and a two-week MVP. Install in one command: npx --yes codex-startup-pressure-test-skill@latest.
What Is codex-startup-pressure-test-skill?
The codex-startup-pressure-test-skill is a Codex CLI skill that pressure-tests a startup idea and returns a compact founder-style diagnosis. It is MIT-licensed, roughly 8 KB on disk, and packaged as a JavaScript npx installer that drops a SKILL.md file plus an agents/ and references/ directory into ~/.codex/skills/startup-pressure-test/. The repo went public on 2026-05-03 and accumulated 692 GitHub stars and 62 forks in its first four days, which suggests "I built the wrong thing on a weekend" is a more shared experience than founders usually admit out loud.
The skill is designed for a specific user. You write code. You have a Codex CLI session open most days. You have a folder of half-finished side projects. You are good enough at building that the build step does not feel like the bottleneck, and you suspect the bottleneck is one step earlier. If two of those are true, the skill is for you. If you are a non-technical founder, an actual SaaS validator with a hosted UI is probably a better fit; the skill assumes a terminal.
The mental model is four lines:
Validate before you scaffold.
Brutality before you commit.
A two-week test before you build.
The verdict can be pivot.
Every other "AI startup validator" on the market collapses those four lines into one: encourage. That is what a viability score does. It is also what keeps the founder using the dashboard. The pressure-test skill keeps the four lines distinct and lets the human handle the part where you might decide not to build the thing.
How Does It Differ From SaaS Startup Validators?
Three structural differences separate the Codex skill from the SaaS validators. First, brutality is the default; the verdict can be pivot and there is no nudge to keep using the tool. Second, it lives in your developer environment, so the validation step happens in the same terminal where the build step would, which removes the context switch that lets founders pretend "I'll validate later." Third, the two-week MVP plan is the deliverable, not a roadmap deck. The leading 2026 alternatives — IdeaProof, ValidatorAI, FounderPal, aicofounder, and DimeADozen — are all hosted dashboards optimized for viability scores and TAM/SAM/SOM calculations.
Modern startup validation has moved from the older "Build-Measure-Learn" loop to an AI-augmented "Predict-Validate-Iterate" pattern, per WeArePresta's 2026 idea-validation guide. The SaaS tools sit at the "Predict" layer; they generate a confident-sounding score and call it validation. The Codex skill is closer to the older Y Combinator pattern: name the assumption, name the fatal flaw, ship a manual test in fourteen days, and let the market decide. Pre-orders and paid commitments remain the only validation that proves willingness to pay rather than willingness to express interest.
This is not "better" universally. If you are non-technical, the SaaS UI removes friction. If you are technical and your problem is that you keep building before validating, the terminal-native skill removes a different friction.
How Do You Install the Skill?
Two install paths exist. The fast path is the npx one-liner the README documents, which fetches the latest published version and copies the skill into your Codex skills directory. The manual path is a git clone followed by a directory copy, which I recommend for a first run because it pins to a specific commit and lets you inspect the SKILL.md file before any code touches your system. The repo is four days old at the time of writing, so a SHA pin matters more than a tag.
# Fast path: one-line npx install
npx --yes codex-startup-pressure-test-skill@latest
# Manual path: clone, pin to a known commit, copy
git clone https://github.com/Kappaemme-git/codex-startup-pressure-test-skill
cd codex-startup-pressure-test-skill
git rev-parse HEAD > .pinned-sha
mkdir -p ~/.codex/skills
cp -R startup-pressure-test ~/.codex/skills/startup-pressure-test
# Verify
ls ~/.codex/skills/startup-pressure-test
# Expected: SKILL.md agents/ references/
Restart Codex either way so the CLI rediscovers its skills directory. SKILL.md is a portable cross-agent standard per the OpenAI Codex skills documentation, which means the same folder works in Claude Code at ~/.claude/skills/ and in Cursor or Gemini CLI at their equivalent paths. The npx installer is Codex-specific because it hard-codes the destination; cross-agent use is a manual file copy.
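The cross-agent copy mentioned above can be sketched in a few shell commands. The block below runs against a scratch directory created with mktemp so it is safe to execute as-is; for real use, replace $ROOT with "$HOME". The ~/.claude/skills/ destination follows the Claude Code convention described in this section, and treat the exact folder layout as an assumption.

```shell
# Scratch root so the sketch never touches your real home directory.
ROOT="$(mktemp -d)"

# Simulate what the npx installer leaves behind in the Codex skills directory.
mkdir -p "$ROOT/.codex/skills/startup-pressure-test/agents" \
         "$ROOT/.codex/skills/startup-pressure-test/references"
touch "$ROOT/.codex/skills/startup-pressure-test/SKILL.md"

# The actual cross-agent step: copy the whole skill folder, unchanged,
# into the other agent's skills directory.
mkdir -p "$ROOT/.claude/skills"
cp -R "$ROOT/.codex/skills/startup-pressure-test" "$ROOT/.claude/skills/"

# Verify the copy landed intact.
ls "$ROOT/.claude/skills/startup-pressure-test"
```

Because SKILL.md is plain files with no install-time code generation, the copy is the entire migration; nothing needs to be re-run or re-registered beyond restarting the target agent.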
Step 1: Run the Default pressure-test Mode
Open Codex and paste the README's verbatim invocation. The default pressure-test mode returns eight output sections — Verdict, Scorecard, Core Assumption, Fatal Flaws, Problem Reality, Competition, First 10 Customers, and MVP — with the verdict at the top as a single word: strong, weak, or pivot. There is no numeric viability score. There is no 0-100 scale. There is one of three states, and you act on it.
Use $startup-pressure-test to pressure-test this startup idea:
A tool that turns local videos into short clips with local
captions for indie hackers and creators posting product demos.
If you only invoke the skill without an idea, it asks for three things: the startup idea itself, the target customer, and what the customer should do or pay for. That last question is the one most founders skip. "Sign up" is not an answer. "Pay $19/month" is. The skill is opinionated about this and will not generate a useful diagnosis from a vague payment intention.
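If you want to enforce those three inputs before you ever open Codex, a small shell helper can refuse to emit the invocation until all three are present. The pressure_prompt function name and everything after the README's first line are illustrative, not part of the skill itself:

```shell
# Hypothetical helper: build the invocation prompt only when the idea, the
# target customer, and a concrete payment action are all supplied.
pressure_prompt() {
  idea="$1"; customer="$2"; action="$3"
  if [ -z "$idea" ] || [ -z "$customer" ] || [ -z "$action" ]; then
    echo "usage: pressure_prompt <idea> <customer> <payment-action>" >&2
    return 1
  fi
  # Single quotes keep $startup-pressure-test literal for the CLI.
  printf 'Use $startup-pressure-test to pressure-test this startup idea:\n%s\nTarget customer: %s\nThey should: %s\n' \
    "$idea" "$customer" "$action"
}

pressure_prompt \
  "A tool that turns local videos into short captioned clips" \
  "indie hackers posting product demos" \
  "pay \$19/month"
```

The third argument is the forcing function: if the best you can type there is "sign up," the helper has already told you what the skill will tell you.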
The first time I ran it, I fed in an idea I had been polishing for three weeks. The verdict came back weak. The fatal flaws section named the exact reason: my proposed pricing assumed annual contracts in a market segment that buys monthly. That was visible to anyone except me. Five minutes of terminal output saved a weekend.
Step 2: Escalate to Specialized Modes
If the default pressure-test flagged a single area as the highest risk, escalate to the matching specialized mode. The skill ships six modes total: pressure-test (the default, compact), problem-validation (real pain, early adopter, validation criteria), competition-map (current behavior, direct and indirect competitors, switching cost), first-10-customers (manual traction plan), mvp-plan (two-week MVP test), and full (the deep all-in-one report). Each mode is a verbatim prompt the README provides.
Use problem-validation when the default flagged the core assumption as soft. It returns a tighter analysis of who actually has the pain, which kind of person is the early adopter, and the specific validation criteria that would change your mind. Use competition-map when you cannot name what people do today instead of using your idea (the "do nothing" option is usually the strongest competitor and the one founders forget). Use first-10-customers when you cannot name your first ten by category, and use mvp-plan once you have committed to building.
Use $startup-pressure-test to validate whether this idea
solves a real problem people pay for: ...
Use $startup-pressure-test to map the real competition
for this idea: ...
Use $startup-pressure-test to find the first 10 customers
for this idea: ...
Use $startup-pressure-test to build a 2-week MVP plan
for this idea: ...
Use $startup-pressure-test to do a deep full report on
this startup idea: ...
The MVP mode is opinionated. It returns a manually-runnable two-week test, not a feature spec. Founder DMs to early adopters. A hand-built landing page. A pre-order or paid commitment as the success metric. No paid ads, no analytics setup, no automation — the skill explicitly prioritizes the things that prove demand over the things that look like product.
Step 3: Read the Verdict and Decide
The verdict is one of three states. strong means the assumptions hold and the right next move is to run mvp-plan and build the two-week test. weak means the idea has shape but at least one fatal flaw needs an answer before MVP — re-run with adjusted assumptions, or escalate to problem-validation. pivot means the underlying problem is not real, not solvable by you in this form, or is owned by an incumbent in a way you cannot dislodge. Take the verdict literally. The skill is not optimized to keep you using it.
pivot is allowed to be the right answer, and the structural absence of a "score" makes that easier to act on.
The anti-pattern is obvious once you have done it once: re-running the prompt with softened wording until the model returns strong. Do not do that. The verdict reflects the prompt you actually fed it, and softening the prompt softens the diagnosis. If the first verdict was weak and you cannot describe the fatal flaw in one sentence, you have not earned strong by rewording — you have just given yourself permission to keep going.
The strongest test of the skill is to run it against an idea you previously shipped and know the outcome of. Pick one that worked and one that didn't. Compare the verdict to the actual market result. The skill won't be right every time, but the pattern of what it would have warned you about is more honest than your retrospective memory.
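The three-state decision rule in this step can be written down as a tiny dispatcher. This is a sketch, assuming you have already extracted the verdict word from the skill's output yourself; the function name and the action strings are illustrative:

```shell
# Hypothetical verdict dispatcher: one action per verdict state, no fourth option.
next_step() {
  case "$1" in
    strong) echo "Run mvp-plan mode and ship the two-week test." ;;
    weak)   echo "Name the fatal flaw, then escalate to problem-validation." ;;
    pivot)  echo "Stop. Write down the flaws and put the editor away." ;;
    *)      echo "Unrecognized verdict: $1" >&2; return 1 ;;
  esac
}

next_step strong
next_step pivot
```

The point of the case statement is that there is no branch for "re-run with softer wording" — the rewording loop is exactly the branch the skill's design leaves out.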
Limits and Open Questions
Three honest limits worth flagging. The repo is four days old as of writing, with no production track record yet. Verdict quality is whatever the LLM behind your Codex session can produce on the structured prompts the skill provides; on a weaker model you will get a weaker diagnosis. Pin to a commit SHA on the first install and accept that the CLI surface may change.
The skill is also Codex-CLI-specific in its installer. SKILL.md itself is portable across Claude Code, Cursor, Gemini CLI, and the rest of the cross-agent ecosystem documented in catalogs like awesome-agent-skills, but the npx command writes to ~/.codex/skills/ by hardcoded path. Cross-agent use is a manual copy.
Most importantly, the skill validates the idea, not the founder. A strong verdict on a good idea does not mean you are the right person to build it — distribution access, domain expertise, and capital runway are out of scope. Use first-10-customers mode as a partial fix; if you cannot name them, the founder-fit gap is real, regardless of the verdict on the idea itself. (For an adjacent terminal-native AI workflow that also depends on founder access, see our Claude Ads PPC audit tutorial, and for a similar opinionated-skill pattern aimed at non-developers, our Charlie Hills social media skills review.)
Frequently Asked Questions
What is the codex-startup-pressure-test-skill?
It is a Codex CLI skill (MIT-licensed, ~8 KB JavaScript installer, 692 GitHub stars and 62 forks in its first four days as of 2026-05-07) that returns a brutal founder-style diagnosis of a startup idea across six invocation modes: pressure-test, problem-validation, competition-map, first-10-customers, mvp-plan, and full. Output includes a verdict (strong, weak, or pivot), a scorecard, the core assumption, fatal flaws, problem reality, competition map, a first-10-customers plan, and a two-week MVP test. Source: github.com/Kappaemme-git/codex-startup-pressure-test-skill.
How do I install the codex-startup-pressure-test-skill?
Run npx --yes codex-startup-pressure-test-skill@latest. The installer drops the skill into ~/.codex/skills/startup-pressure-test/ as SKILL.md plus agents/ and references/ folders. Restart the Codex CLI so it discovers the new skill, then verify with ls ~/.codex/skills/startup-pressure-test. The manual install path is to git clone the repo and copy the startup-pressure-test/ directory into ~/.codex/skills/. Source: project README.
How is the skill different from SaaS startup validators like IdeaProof or ValidatorAI?
It has no dashboard, no numeric viability score, and no retention loop. The verdict is one of three states (strong, weak, or pivot), and pivot is allowed to be the right answer. The skill lives in the same Codex CLI you would use to scaffold the project, which removes the context switch between validation and build. Leading 2026 SaaS validators (IdeaProof, ValidatorAI, FounderPal, aicofounder, DimeADozen) center on viability scores and TAM/SAM/SOM dashboards.
Can I use this skill in Claude Code or Cursor instead of Codex?
Yes, with a copy step. SKILL.md is a portable cross-agent standard documented in the OpenAI Codex skills docs and supported by Claude Code, Cursor, Gemini CLI, and several other coding agents. Copy the startup-pressure-test/ folder into ~/.claude/skills/ for Claude Code or the equivalent skills directory for your agent. The npx installer specifically targets ~/.codex/skills/, so cross-agent use is a manual file copy.
What does the 2-week MVP plan from mvp-plan mode actually look like?
It is a manually-runnable test, capped at fourteen days, designed to invalidate the idea cheaply rather than build a polished product. Expect founder DMs to early adopters, a manual onboarding flow you run yourself, a single hand-built landing page, and pre-orders or paid commitments rather than email signups. The README's mvp-plan mode is explicit that the output is not a product roadmap or feature spec; it is a falsifiable two-week experiment with a defined success and failure threshold.
The Bottom Line
The build-first reflex is the most expensive habit in indie hacking. Saturday afternoons spent scaffolding auth flows for products nobody asked for, weekend sprints for ideas that died on the second customer call, GitHub repos with twelve commits and zero users — every one of those started with a missing twenty-minute conversation that should have happened before the editor opened. The codex-startup-pressure-test-skill is the first public approach that drops that conversation into the same terminal where the build would happen, and then has the discipline to let the verdict be pivot.
Install with npx --yes codex-startup-pressure-test-skill@latest, run pressure-test on the idea you have been mentally polishing for the last month, and treat the verdict literally. If it comes back strong, escalate to mvp-plan and ship the two-week test. If it comes back weak or pivot, write down what the fatal flaws section said and put the editor away for the night. (For more open-source AI tooling on the same axis of "local-first, no vendor lock-in," see our coverage of the keep-codex-fast tutorial, the Matt Pocock dictionary of AI coding, and the Synthetic Claude Code drop-in review.)