"This model isn't good enough yet. I'll wait for the next one."
I hear some version of this constantly and it's almost always wrong.
Not because the models are perfect. I know they're not.
But because asking "is the model good enough?" puts the blame on the tech instead of on how you're using it.
AI adoption isn't binary. It's not "can the agent do this entire job for me? Yes or no?"
It's "how much can it do, how much do I need to guide it, and where does it still make me faster than I was?"
Right now there's a massive gap in how people use this stuff.
On one end, someone is running a business by themselves that used to take a team of 15.
On the other end, someone opens the AI tool their company gave them, asks for some research, watches it hallucinate everything, closes it, and decides AI just isn't there yet.
If everyone has access to the same models, then why are we seeing people get drastically different outcomes?
Because if someone is getting great results from a setup you could copy today, the bottleneck isn't the model. It's the driver.
The way I think about it, there are three layers:
→ The model is the engine. Opus, GPT, Gemini, whatever you're running. Everyone can buy the same one.
→ The harness is the car built around that engine. Claude Code, Codex, OpenClaw. The tools it can reach, the way it spins up sub-agents to split up the work, the whole system that turns a raw model into something that can actually do a job.
→ You're still the driver. Your prompts. The context you feed it. The memory and skills you set up so it knows how you work. And the steering, for when it starts to drift.
You can put the car on cruise control. But if you don't steer, you're still going to crash. (Yeah, I know some cars have lane assist now. You get the point.)
A while back, Andrew Ng shared numbers that make this point. GPT-3.5, an older and "worse" model, wrapped in a simple agentic workflow, hit up to around 95% on the HumanEval coding benchmark. GPT-4 on its own, no workflow, hit 67%.
That workflow is the harness. A better harness around an older engine beat a newer engine running on its own.
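If you've never seen what a workflow like that looks like, here's a rough sketch in Python. To be clear, this is not Ng's actual setup. The names (harness, run_tests, toy_model) are made up for illustration, and toy_model is a fake stand-in for a real model call. The idea is just the loop: generate, test, feed the failure back, try again.

```python
from typing import Callable, Optional, Tuple

# Assumption: a "model" is any function that maps prompt text to code.
Model = Callable[[str], str]


def run_tests(code: str) -> Optional[str]:
    """Run the candidate against one tiny test; return an error message or None."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # fine for a toy demo; never exec untrusted code
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def harness(model: Model, task: str, max_rounds: int = 3) -> Tuple[str, int]:
    """Generate, test, and feed failures back to the model until the test passes."""
    prompt = task
    for round_number in range(1, max_rounds + 1):
        code = model(prompt)
        error = run_tests(code)
        if error is None:
            return code, round_number
        # The steering step: tell the model exactly what went wrong.
        prompt = f"{task}\n\nYour last attempt failed with:\n{error}\nFix it."
    raise RuntimeError(f"no passing solution after {max_rounds} rounds")


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: wrong at first, right after feedback."""
    if "failed with" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


if __name__ == "__main__":
    code, rounds = harness(toy_model, "Write add(a, b) that returns the sum.")
    print(f"passed after {rounds} round(s)")
```

On its own, toy_model gets the task wrong. Inside the loop, it gets a second shot with the error message in hand and passes. Same engine, different car.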
So when the results are bad, it's rarely just the model's fault. It's the prompt. It's the context you fed it. It's whether you ever taught the agent how you actually work.
This tech is still new and everyone is figuring it out in real time. I'm right there with you.
But the mindset that actually works is treating the model like something capable that still has to learn your processes. You guide it. You correct it. You build the system around it. And that takes reps.
The next model will be better. It always is.
It's just not going to steer the car for you.