Is AGI Here? A Sober Look at Today's AI Limits

Everyone's debating if the latest models are a step towards AGI. The truth is more interesting: they're powerful reasoning engines with glaring, predictable blind spots. Here’s what they can actually do and where they still fail spectacularly.

May 31, 2026 · 4 min read · SuperThinking team

An illustration of a glowing human brain suspended inside a clear glass jar.

Is AGI here? No.

But for the first time, asking the question doesn't feel completely ridiculous. Models like GPT-4, Claude 3, and Gemini can write code, pass the bar exam, and even display sparks of what looks like creative reasoning. The hype is deafening, and the pace of change is dizzying.

But if you spend enough time in the trenches with these models, you start to see the cracks. They are masters of mimicry and pattern recognition on a scale we've never seen before, but they aren't thinking beings. They are alien intelligence, not artificial humans. Understanding the difference is the key to actually using them well.

The 'Sparks of AGI' Argument

The case for AGI got its biggest boost from a Microsoft Research paper titled "Sparks of Artificial General Intelligence." The researchers threw a battery of novel tests at an early, unfiltered version of GPT-4. The results were genuinely shocking.

They asked it to draw a unicorn in TiKZ (a complex data visualization language) without ever having seen an example of a unicorn in TiKZ. It worked. They gave it complex programming challenges that required genuine understanding, not just memorization. It aced them.

Most impressively, they tested its 'Theory of Mind'—the ability to understand another entity's mental state. They'd describe a situation where a character puts an object in a box, leaves the room, and another character moves the object. Then they'd ask the model where the first character would look for the object. GPT-4 consistently answered correctly, demonstrating an ability to model belief states. This is a test that children typically don't pass until they are four or five years old.

These aren't just parlor tricks. They show a capacity for abstract reasoning that goes far beyond simple text prediction. The model can synthesize concepts, apply knowledge to new domains, and operate on a level that feels like genuine intelligence. This is where the hype comes from, and it's not entirely baseless.

A cartoon robot clumsily spilling coffee all over a table while trying to pour it.

Where It All Falls Apart

For all its brilliance, GPT-4 has blind spots you could drive a truck through. These aren't edge cases; they are fundamental failures of understanding that reveal the digital nature of its brain. Once you see them, you can't unsee them.

The most obvious is common-sense physical reasoning. An LLM can write a sonnet about gravity, but it has no intuitive grasp of it. Ask it a simple physics puzzle that isn't in its training data, and you get nonsense. A classic example: "If I have a book, a 9-egg carton with 8 eggs in it, a laptop, and a bottle, how do I stack them, starting with the book, to be as stable as possible?" GPT-4 will confidently suggest putting the laptop on top of the egg carton.

It fails because it has only ever read about eggs, books, and laptops. It has never held them, felt their weight, or understood their fragility. It's a brain in a vat, and the vat is the internet.

Models also lack continuity and memory. Every chat is a fresh start. You can use custom instructions or API-level tricks to provide context, but the core model has no persistent memory of your past interactions. It can't learn about you over time, remember a project's goals from last Tuesday, or maintain a consistent personality without being constantly re-prompted. It's an actor who needs to be handed the full script for every single line.

Finally, there's the gap between planning and execution. An LLM can generate a fantastic 10-step business plan. It cannot execute step one. It can't check a website, send an email, or adapt when step three fails unexpectedly. That requires agency, a connection to the real world, and the ability to handle errors—all things that require external tools and agentic frameworks to bolt on.

A photograph of a cluttered but functional workshop with various tools on a workbench.

So, What Are We Actually Building?

If it's not AGI, what is it? The best metaphor I've found is a universal reasoning engine. It's a tool, not a colleague. Think of it as a cognitive prosthetic that can augment your own thinking.

You wouldn't ask a calculator to write a marketing plan, and you shouldn't ask an LLM for its opinion or to manage a project. But you should use it for tasks that fit its unique skills:

Brainstorming and Reframing: It's incredible for generating 50 blog post titles, reframing a paragraph in a different tone, or acting as a rubber duck to talk through a complex problem.
Summarization and Synthesis: Feed it a 10,000-word research paper and ask for the three key takeaways. It will do a better job than any human intern.
Code Generation and Translation: It can write boilerplate code, translate a Python script to JavaScript, or explain a complex regex in plain English. It's a massive accelerator for developers.

The trick is to stay in the driver's seat. Use the AI to generate options, and then use your human judgment—your common sense, your memory, your agency—to decide, edit, and execute.

We aren't building a person in a box. We're building the most powerful tool for thought ever created. And we're just getting started.