AGI Reality Check: What Today's AI Can and Can't Do

Everyone's debating if we've reached AGI. The answer is no, but that's the wrong question. Here's a practical look at what models like GPT-4 and Claude 3 can *actually* do, where they fail, and how to think about them.

May 30, 2026 · 4 min read · SuperThinking team

An illustration of a glowing brain made of interconnected digital pathways and nodes.

No, AGI isn't here. Let's just get that out of the way.

Defining Artificial General Intelligence is slippery, but a decent working definition is an AI that can perform any intellectual task a human can. Think, learn, plan, and understand—not just mimic. We are not there.

But arguing about the AGI timeline misses the point. The real story is that today's frontier models, like GPT-4 and Claude 3 Opus, have developed genuinely new capabilities that feel like a step-change from just two years ago. The question isn't "is it sentient?" but "what can I do with this that was impossible before?"

The "Sparks" Are Real

If you haven't felt a flicker of awe using these tools, you're not paying attention. The capabilities are startling, even if they aren't true intelligence.

First, there's genuine multi-modal reasoning. You can now upload a screenshot of a messy chart from a PDF and ask, "What's the key takeaway from this data for Q3?" The model doesn't just see pixels; it interprets the chart's structure, labels, and data points to synthesize an answer. This isn't just OCR—it's interpretation.

Second, they can follow complex, multi-step instructions with nuance. It's no longer about simple prompts. You can give it a persona, a complex goal, a set of constraints, and a specific output format, all in one go.

For example:

Act as a senior product manager. Review the following user feedback (pasted below). Identify the top 3 recurring feature requests. For each request, write a one-paragraph summary, estimate the engineering effort on a t-shirt scale (S, M, L), and suggest one potential pitfall. Format the entire output as a single JSON object.

An older model would get lost. Today's models nail it. They can hold the entire complex context in their metaphorical heads and execute the task.

And the code generation is finally good enough to be a daily driver. It's not just about spitting out boilerplate. You can describe a novel problem, provide your existing (and messy) codebase for context, and have it write a new function that actually works. It understands the intent behind the code.

A chaotic jumble of multi-colored computer wires and cables, representing complex but brittle systems.

These are not parlor tricks. They are force multipliers for anyone who thinks for a living.

Where The Magic Fades

For all their power, these models are still brittle simulacra of intelligence. Their failures are just as revealing as their successes.

The biggest giveaway is their shocking lack of common sense. An LLM can write a sonnet about quantum physics but might fail a basic logic puzzle that a child could solve. They have no grounding in the physical world, no lived experience. Everything they "know" is an abstraction derived from text. This leads to confident, plausible, and utterly wrong answers—the dreaded hallucinations.

They also have no persistent goals or agency. An LLM is a stateless machine. It responds to your prompt and then forgets everything. It cannot decide on its own to pursue a long-term project, manage a budget, or even remember what you talked about yesterday without you explicitly re-providing the context. You are the agent; the model is the (very sophisticated) tool.

This leads to the third major failure point: strategic planning. A model can generate a list of steps for a business plan. It cannot, however, execute that plan. It can't adapt to unforeseen market changes, negotiate with a supplier, or notice that a key team member is burning out. It can't self-correct its own strategy based on real-world feedback.

A magnifying glass hovering over a detailed architectural blueprint, symbolizing focused, task-specific intelligence.

When the task requires reasoning beyond the provided context or interaction with a dynamic, unpredictable world, the illusion of intelligence shatters.

A Better Frame: Task-Specific Superintelligence

So, if we're not at AGI, where are we?

We're entering an era of what you might call Task-Specific Superintelligence. Instead of one AI that does everything, we're building models that are superhuman at specific, narrow-but-valuable intellectual tasks.

A model fine-tuned on medical imaging might outperform the world's best radiologist at spotting tumors. That's not AGI, but it's a revolutionary tool for medicine.

An AI system can review ten thousand pages of legal discovery documents for relevant clauses in minutes, a task that would take a team of paralegals weeks. That's not AGI, but it fundamentally changes the economics of law.

A coding assistant can refactor an entire legacy codebase, update dependencies, and write comprehensive unit tests with a single command. It can't invent a new programming language, but it makes every developer ten times more productive.

This is the correct way to think about the current moment. Stop waiting for a Skynet-style general intelligence. Instead, look at the specific, high-leverage intellectual tasks in your own workflow. There's a good chance an AI can now do a piece of that task better, faster, and cheaper than any human can. Your job is to become the person who knows how to wield these powerful new tools.