Have We Hit AGI? A Reality Check on GPT-4 and Claude 3

AGI talk is everywhere, but the hype is outpacing reality. Here’s a practical breakdown of what today's AI models can actually do, where they fail, and a simple test you can run yourself.

May 4, 2026 · 4 min read · SuperThinking team

A metallic robot head tilted sideways, looking curiously at a small green potted plant.

No, we have not achieved Artificial General Intelligence. Not even close.

It feels strange to have to say it so bluntly, but the hype cycle is spinning out of control. Every new model release is met with breathless claims that we’re on the verge of conscious, thinking machines. We’re not.

What we have are incredibly sophisticated pattern-matching engines. Think of them as calculators for words. They can predict the most plausible next token in a sequence with astonishing accuracy, which creates a very convincing illusion of understanding. But an illusion is all it is.
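To make "predict the next token" concrete, here is a deliberately tiny sketch. It is a toy bigram counter, not how GPT-4 or Claude 3 work internally (those are large neural networks), and the corpus and function names are made up for illustration. It only shows the basic idea of picking a continuation from patterns seen in text.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."


def train_bigrams(text):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(corpus)
print(predict_next(model, "sat"))      # on
print(predict_next(model, "the"))      # cat
print(predict_next(model, "unicorn"))  # None
```

The toy never saw the word "unicorn", so it has nothing to say about it. It knows which words tend to follow which, and nothing else.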

Real AGI means an intelligence that can learn, reason, and apply knowledge across a wide range of tasks, just like a human. It wouldn't need a specific prompt. It would have goals, curiosity, and a model of the world. GPT-4 doesn't have a model of the world; it has a model of text about the world.

What Models Can Actually Do

Let’s be clear: today's frontier models are miraculous tools. If you had told me five years ago that I could type a paragraph and get back a functioning Python script, a 12-line poem, or a marketing plan, I’d have laughed you out of the room.

They are phenomenal accelerators for human thought. You can use them to:

  • Brainstorm: "Give me 20 names for a podcast about vintage synthesizers."
  • Summarize: "Condense this 3,000-word SEC filing into five bullet points."
  • Write boilerplate code: "Write a JavaScript function that takes an email address and returns true if it's in a valid format."
  • Reformat data: "Turn this messy, comma-separated text into a clean JSON object."

Here’s a real example. I needed a quick regular expression to find all hexadecimal color codes in a CSS file. I’m rusty with regex.

# My prompt to Claude 3 Sonnet
Give me a regex to find hex color codes like #fff or #ea445a in a string. Just the regex, no explanation.

# Claude's response
#[a-fA-F0-9]{6}\b|#[a-fA-F0-9]{3}\b

That took me 10 seconds. It would have taken me 10 minutes of searching Stack Overflow. This is the magic. It’s a force multiplier for people who already know what they’re doing. It’s a tool, not a colleague.
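If you want to sanity-check a suggestion like this before using it, a few lines of Python are enough. This is a minimal sketch: the pattern is copied unchanged from the response above, while the CSS snippet and variable names are made up for the test.

```python
import re

# The pattern from Claude's response above, unchanged.
HEX_COLOR = re.compile(r"#[a-fA-F0-9]{6}\b|#[a-fA-F0-9]{3}\b")

# Made-up CSS for illustration.
css = """
body { color: #fff; background: #ea445a; }
a { color: #12345; border-color: #ABCDEF80; }
"""

matches = HEX_COLOR.findall(css)
print(matches)  # ['#fff', '#ea445a']
```

Note what it skips: the malformed #12345, but also #ABCDEF80, an 8-digit code with an alpha channel that modern CSS accepts. That matches what I asked for, but it's exactly the kind of detail you only catch if you already know what you're doing.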

A whiteboard covered in a chaotic web of arrows, boxes, and handwritten notes.

The Real Gap: Where They Fall Apart

The difference between a great tool and a real intelligence lies in a few key areas where today's models consistently fail.

First, they have no persistent memory or capacity for genuine learning. Each chat is a self-contained universe. The model doesn't "remember" you from one conversation to the next. The massive context window is a clever workaround, not a solution. It's like handing someone a perfect transcript of your last conversation rather than that person actually remembering it. A human learns and integrates knowledge permanently; an LLM just has more temporary data to reference.
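Here is a rough sketch of that transcript idea. The `fake_model` function is a hypothetical stand-in, not a real API; the only point is that whatever the "model" knows about you has to be passed in with each call.

```python
def fake_model(messages):
    """Hypothetical stand-in for an LLM call: it sees only `messages`."""
    for message in messages:
        if message.startswith("My name is "):
            name = message[len("My name is "):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


# Conversation 1: the name sits in the transcript, so the "model" can use it.
history = ["My name is Ada."]
history.append("What is my name?")
reply_same_chat = fake_model(history)

# Conversation 2: a fresh transcript, nothing carried over.
reply_new_chat = fake_model(["What is my name?"])

print(reply_same_chat)  # Your name is Ada.
print(reply_new_chat)   # I don't know your name.
```

Nothing inside `fake_model` changed between the two calls. The only "memory" is the list you hand it.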

Second, they lack true causal reasoning. They can tell you that flicking a light switch causes the light to turn on, because they've read that sentence a million times. But they can't reason from first principles about electricity, circuits, or filaments. They mimic reasoning by finding patterns in data. Ask one a novel logic puzzle that isn’t in its training set, and you’ll see it stumble, often in hilarious ways.

Finally, and most importantly, they have zero understanding of the physical world. They have no embodiment, no senses, no common sense grounded in reality. An LLM can write a beautiful paragraph about the taste of a fresh strawberry, but it has no idea what a strawberry actually is. It doesn't understand gravity, friction, or that you can't fit a watermelon in a teacup. This grounding is the foundation of human intelligence.

A Simple Litmus Test You Can Run

Don't take my word for it. You can see these limitations yourself. The key is to give the model a task that requires planning, maintaining constraints, and integrating different types of information—things a human does easily.

Try this prompt:

My sister is a vegetarian who loves sci-fi movies and hates cilantro. Plan a simple, low-key birthday dinner for her. Give me a three-course menu (no cilantro!), a list of 5 sci-fi movies we could watch, and draft a casual text message inviting her over. Make sure the text mentions the movie part.

Now, look at the output carefully. Did it keep the vegetarian and "no cilantro" constraints across the entire menu? Did it suggest a movie that is famously not sci-fi? Did the text message it drafted sound natural, or did it hallucinate details you never provided? Often, one part of the output will forget the constraints set in another. It's executing a series of text predictions, not performing a single, cohesive planning task.
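If you run the test more than a few times, reading every menu by hand gets old. Below is a naive sketch of an automated first pass. The word list and function name are assumptions for illustration, and a plain keyword match will miss plenty (and will wrongly flag a line like "no cilantro"), so treat it as a filter in front of your own eyes, not a replacement for them.

```python
import re

# Hypothetical word list for illustration; extend it for real use.
FORBIDDEN = {"cilantro", "chicken", "beef", "pork", "bacon", "fish", "shrimp"}


def find_violations(menu_text):
    """Return forbidden words that appear in the menu, sorted alphabetically."""
    words = set(re.findall(r"[a-z]+", menu_text.lower()))
    return sorted(words & FORBIDDEN)


# Made-up model outputs for illustration.
bad_menu = "Starter: tomato soup. Main: mushroom risotto with bacon bits. Dessert: lemon tart."
good_menu = "Starter: tomato soup. Main: mushroom risotto. Dessert: lemon tart."

print(find_violations(bad_menu))   # ['bacon']
print(find_violations(good_menu))  # []
```

Paste only the menu portion of the model's answer into `find_violations`, since the rest of the reply may legitimately mention the constraints.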

A simple calculator sitting on a desk next to an open, dense textbook.

So, What Are We Working With?

Thinking of these models as budding AGIs is a trap. It leads to misaligned expectations and, frankly, bad products. If you build a system assuming the AI "understands," you're going to have a bad time when it confidently hallucinates a legal precedent or a medical diagnosis.

The real danger isn't Skynet. It's a hospital administrator who trusts a buggy, non-AGI system to manage patient schedules, or a programmer who blindly copy-pastes flawed code without understanding it.

We have a powerful new kind of tool. It's a text generator, a summarizer, a code suggester, a brainstormer. It's a way to manipulate and generate symbolic information at an unprecedented scale.

Forget waiting for AGI. Focus on what's real today. Build smart workflows that use these models for what they're good at, with a human always in the loop to handle the parts that require actual intelligence.