You have to be 10x better than what your customers even dream is possible in order to sell SaaS these days. Otherwise they’re thinking: “I can vibe-code this in a weekend with exactly the features I need and it’ll integrate better with my stack.” Or: “There’s an open source version — I’ll fork it.”
So what does 10x actually look like? Being very simple and opinionated, in a good way. Like:
- choosing to show a minimal summary of auto-discovered, annotated errors from LLM traces, instead of a busy default view with every trace in the system like many AI observability platforms currently do
- requiring the user to connect their source code, logs, and auto-instrumented telemetry so the AI can actually identify and fix incidents for them
- being 5-10x faster at building Docker images with drop-in compatibility
Having really good design, strong opinions, and shipping fast - that shows the customer that you’re a leader in the space, and that you’ll solve their problem in the easiest way possible.
It’s not enough to just have one of these - I’ve read many highly useful, opinionated blogs, docs, and benchmarks only to test out the product and decide not to buy it. You need it all to build a 10x product in the age of vibe-slopped and vibe-ported open source alternatives.
Be opinionated
- Know what you’re talking about. You need complete information. LLMs can write research reports, but the best kind of knowledge is experience: you’ve done the thing, found an unexpected problem, learned the lesson. These timelines compress if you’re in an LLM-able domain, where you can complete mini projects in days, but they may still be long in others.
- Talk to people with good opinions. With LLMs it’s easy to be overwhelmed with decisions at each turn. It can feel like you’re making more important decisions more frequently than back in the days of manual coding. That’s probably not true - you just made most decisions implicitly without thinking about every alternative. But there are plenty of decisions you would have overlooked that are now visible and actually worth the extra time to think through. You need to know what’s fine to decide on your own, and when you should discuss with others. Debate with the LLM to understand options and tradeoffs - sometimes that’s enough to decide right there. But other times the real question is priorities: scalability and flexibility, or shipping fast, or being simple, or minimizing maintenance cost? Sometimes all of these are at odds. Get trusted input from your team - especially people who’ll have to deal with your code if you move on. I firmly believe that with LLMs, big architectural decisions can and do happen multiple times per day - they’re worth discussing in person to maintain a high rate of confident decision-making.
- Just do it, then check the data. Ship the thing, measure, and evaluate against your prediction. Calibrate and update your beliefs. Over the long run, you develop good priors.
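One way to make "evaluate against your prediction" concrete is to write down a confidence before shipping and score it afterwards. Here is a minimal sketch using the Brier score (mean squared gap between confidence and outcome). The `Prediction` fields and the example claims are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One shipped bet: what you expected, and what the data said."""
    claim: str          # hypothetical free-text description
    confidence: float   # probability you assigned before shipping (0..1)
    came_true: bool     # outcome measured after shipping


def brier_score(predictions: list[Prediction]) -> float:
    """Mean squared gap between stated confidence and outcome (0 is perfect)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    total = sum((p.confidence - float(p.came_true)) ** 2 for p in predictions)
    return total / len(predictions)


# Illustrative log of bets, not real data.
log = [
    Prediction("new onboarding lifts activation", 0.8, True),
    Prediction("cache cuts p95 latency in half", 0.9, False),
    Prediction("pricing page change is neutral", 0.6, True),
]
score = brier_score(log)
print(f"Brier score: {score:.3f}")
```

Always answering 0.5 scores 0.25, so a running score above that suggests your confidence is worse than a coin flip and your priors need updating.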
Shipping fast with replicas + guardrails
Have a dev environment with copies of your services in Kubernetes so you can test infra changes by running the agent with Terraform/kubectl in a loop. This gives you the fastest iteration time and makes sure the final change actually works, because you’re applying real resources. Have a staging environment in your cloud, and do the same for any external services like Cloudflare, PlanetScale, Neon, Supabase, etc.
But then you need a guardrail: only allow code/PRs into production with evidence the change worked in your replica environment, plus infrastructure as code so it’s reproducible - not a bunch of API commands.
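The guardrail can be sketched as a small merge check. Everything here is hypothetical: the `PullRequest` shape, the `touches_infra` flag, and the file suffixes that count as infrastructure as code are assumptions, not any real CI system's API.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical metadata a merge gate might inspect."""
    changed_files: list[str]
    touches_infra: bool = False
    replica_run_passed: bool = False   # evidence from the replica environment
    evidence_links: list[str] = field(default_factory=list)


# Assumption: infrastructure as code lives in files with these suffixes.
IAC_SUFFIXES = (".tf", ".yaml", ".yml")


def gate(pr: PullRequest) -> list[str]:
    """Return reasons to block the PR; an empty list means it may ship."""
    reasons = []
    if not pr.replica_run_passed:
        reasons.append("no passing run in the replica environment")
    if not pr.evidence_links:
        reasons.append("no evidence attached (logs, plan output, recordings)")
    has_iac = any(f.endswith(IAC_SUFFIXES) for f in pr.changed_files)
    if pr.touches_infra and not has_iac:
        reasons.append("infra change is not captured as code")
    return reasons


pr = PullRequest(["notes.md"], touches_infra=True)
for reason in gate(pr):
    print("blocked:", reason)
```

Returning reasons instead of a boolean gives the agent something specific to fix on the next loop.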
You can take this further and give every developer access to their own mini environment so that “testing on local” can be just as good as “testing on prod.” Some external services might discourage / not allow this - in which case that’s a big pro for hosting your own infra.
Dev tools are a superpower
Dev tools take on a new meaning with agents. Obviously you want your agents to have access to an API/CLI/MCP for all your SDLC tools for builds, tests, deploys, and observability, and your dev knowledge work tools. You also want:
- agent-browser (screenshots, recording, network requests, etc)
- A database with full historical snapshots and branching, so you can run many rollouts to benchmark or train an agent to use your tools better
- Better production error/performance recording, fault injection, and deterministic replay tools
- Good benchmarking/performance tools
- Store agent transcripts and make them searchable. Share them with your team (agentsview or agentlogs). Have an easily shared skills repository
- Give designers good sharing and visualization tools.
We’ve only scratched the surface of tool quality. The best ones will encode domain knowledge and recommended workflows into the system, e.g. requiring the agent to access certain context before taking an action.
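As a rough illustration of "requiring the agent to access certain context before taking an action", here is a minimal sketch of a tool wrapper that refuses to act until the required context has been read. The class names, action names, and context names are all hypothetical.

```python
class ContextRequired(Exception):
    """Raised when an action is attempted before its required context was read."""


class GuardedTools:
    def __init__(self, requirements: dict[str, set[str]]):
        # Maps an action name to the context items that must be read first.
        self.requirements = requirements
        self.seen: set[str] = set()

    def read_context(self, name: str) -> None:
        """Record that the agent has fetched a piece of context."""
        self.seen.add(name)

    def act(self, action: str) -> str:
        """Run the action only if all of its required context was read."""
        missing = self.requirements.get(action, set()) - self.seen
        if missing:
            raise ContextRequired(f"{action} needs: {', '.join(sorted(missing))}")
        return f"ran {action}"


tools = GuardedTools({"deploy": {"runbook", "recent_incidents"}})
tools.read_context("runbook")
try:
    tools.act("deploy")
except ContextRequired as err:
    print(err)  # tells the agent which context is still missing
```

The error message doubles as an instruction, so the workflow is enforced by the tool itself and not by a prompt the agent may ignore.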
How to verify
- Infra — bring up, check state is expected / do some load tests, bring it down
- UI — browser agent + recording + summary
- Code — e2e tests, property testing
- Complex logic — formal verification
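For the code row, property testing means asserting invariants over many generated inputs instead of a few hand-picked cases. Below is a minimal sketch using only the standard library; `merge_intervals` is a made-up function under test, and libraries such as Hypothesis automate the input generation and shrinking that is hand-rolled here.

```python
import random


def merge_intervals(intervals):
    """Function under test: merge overlapping or touching closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def check_properties(trials=500, seed=0):
    """Check three invariants on randomly generated interval lists."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        intervals = []
        for _ in range(rng.randint(0, 8)):
            a, b = rng.randint(0, 50), rng.randint(0, 50)
            intervals.append((min(a, b), max(a, b)))
        out = merge_intervals(intervals)
        # Property 1: output is sorted and strictly non-overlapping.
        assert all(out[i][1] < out[i + 1][0] for i in range(len(out) - 1))
        # Property 2: every input interval is covered by some output interval.
        assert all(any(s <= a and b <= e for s, e in out) for a, b in intervals)
        # Property 3: merging an already merged list changes nothing.
        assert merge_intervals(out) == out
    return trials


print(f"{check_properties()} random cases passed")
```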
Most verifiable: code, performance, revenue. Least verifiable: design, writing.
The tough thing with design and writing is that I’m quite opinionated about both, but I have too little experience to produce really high-quality output in either, or to prompt a model to do so. I do feel I have a good discerning sense (a classifier) built from all the writing and software I’ve engaged with over time.
A big reason models aren’t good at these domains is that current post-training doesn’t encourage opinionated output. Bradley-Terry preference ranking tends to produce average outputs that a large group of people judge to beat other outputs. We should care much more about the preferences of a small, tasteful group than about those of everyone on the Internet or the raters in a human data project.
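A toy calculation shows the averaging effect. In the Bradley-Terry model, the probability that output a beats output b is the sigmoid of their score difference, and for two items the maximum-likelihood score gap is the log of the win ratio. The rater counts below are made up purely for illustration, and this is a sketch of the pooling effect, not of any particular lab's training pipeline.

```python
import math


def bt_win_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry: P(a beats b) = sigmoid(score_a - score_b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def fitted_score_gap(wins_a: int, wins_b: int) -> float:
    """Maximum-likelihood score gap for two items: log(wins_a / wins_b)."""
    return math.log(wins_a / wins_b)


# Made-up numbers: 10 tasteful raters all pick the opinionated draft,
# while 990 crowd raters pick the average draft 60% of the time.
tasteful_for_opinionated = 10
crowd_for_opinionated = 396   # 40% of 990
crowd_for_average = 594       # 60% of 990

gap = fitted_score_gap(
    tasteful_for_opinionated + crowd_for_opinionated,  # opinionated wins
    crowd_for_average,                                 # average wins
)
p = bt_win_probability(gap, 0.0)
print(f"P(opinionated beats average) = {p:.3f}")
```

Once the comparisons are pooled, the unanimous small group barely moves the fitted score, so a reward model trained this way would still favor the average draft.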