Is AGI Here? A Reality Check on GPT-4 and Gemini

Everyone's talking about AGI, but the latest models still fail basic reasoning tests. Here's a practical look at what AI can actually do today, and why 'capability' is a much better frame than 'intelligence'.

June 1, 2026 · 5 min read · SuperThinking team

No. AGI is not here.

It’s not even close. The breathless demos and CEO keynotes are selling a sci-fi dream that the underlying tech doesn't support. The models are getting shockingly fluent, multimodal, and fast. But they are not thinking. They are not reasoning. They are not conscious.

They are autocomplete on steroids. And that’s fine! Autocomplete on steroids is an incredibly useful tool. But let’s be clear about what we’re actually working with.

What Models Actually Do

Under the hood, models like GPT-4o and Gemini 1.5 Pro are statistical pattern-matching machines. They are trained on a monstrous amount of text and images from the internet, and their core training objective is to predict the most likely next token (roughly, a piece of a word) in a sequence. When you ask a question, the model isn't "understanding" your intent. It isn't looking anything up in its training data either. It's using the statistical patterns it absorbed during training to generate a plausible-sounding response, one token at a time.
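To make "predict the next word" concrete, here is a deliberately tiny sketch: a bigram counter that always picks the most frequent follower of the current word. It is a toy, not how GPT-4o or Gemini are built (those are large neural networks, not lookup tables), and the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny invented corpus, purely illustrative.
CORPUS = "the cat sat on the mat . the cat sat on the rug . the dog ate the fish ."


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, length):
    """Greedily extend `start` by repeatedly picking the likeliest next word."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


model = train_bigrams(CORPUS)
print(predict_next(model, "the"))   # cat
print(generate(model, "the", 4))    # the cat sat on the
```

Nothing in this loop knows what a cat is. It only knows which word tended to come next in the text it was shown, which is the intuition behind the "autocomplete on steroids" framing.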

This is why they are amazing at tasks that rely on remixing existing information.

For example, you can give a model messy notes and ask for structured JSON output. This is a classic pattern-matching task.

Take these meeting notes and turn them into a JSON object with keys: 'project_name', 'participants' (an array of strings), and 'action_items' (an array of strings).

Notes:
Project is codenamed Titan. Sarah, Ben, and Chloe were on the call. Ben needs to follow up with the design team by Friday. Sarah is going to draft the initial spec. Chloe will set up the next meeting.

The model will nail this almost every time because it's seen countless examples of meeting notes and JSON. It's a glorified copy-paste-reformat job, and machines are great at that.
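Even on an easy task like this, it is worth checking the reply in code rather than by eye. The sketch below assumes the reply arrives as a raw string. The reply shown is hand-written to match the prompt's schema, not captured from a real model, and `validate_meeting_json` is a hypothetical helper name.

```python
import json

NOTES = (
    "Project is codenamed Titan. Sarah, Ben, and Chloe were on the call. "
    "Ben needs to follow up with the design team by Friday. "
    "Sarah is going to draft the initial spec. "
    "Chloe will set up the next meeting."
)

# Hypothetical model reply, written by hand for illustration.
# A real reply could differ in wording or fail to parse at all.
HYPOTHETICAL_REPLY = """
{
  "project_name": "Titan",
  "participants": ["Sarah", "Ben", "Chloe"],
  "action_items": [
    "Ben: follow up with the design team by Friday",
    "Sarah: draft the initial spec",
    "Chloe: set up the next meeting"
  ]
}
"""


def validate_meeting_json(raw, notes):
    """Return a list of problems; an empty list means the reply passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not an object"]

    problems = []
    if not isinstance(data.get("project_name"), str):
        problems.append("project_name must be a string")
    for key in ("participants", "action_items"):
        value = data.get(key)
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{key} must be an array of strings")

    # Grounding check: every participant name should appear in the notes.
    participants = data.get("participants")
    if isinstance(participants, list):
        for name in participants:
            if isinstance(name, str) and name not in notes:
                problems.append(f"participant not found in notes: {name}")
    return problems


print(validate_meeting_json(HYPOTHETICAL_REPLY, NOTES))  # []
```

The schema check catches malformed output. The substring check is a crude guard against invented names, and it would need loosening if the notes used nicknames or initials.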

But they fall apart on tasks that require true reasoning or understanding of the world. Consider this classic logic puzzle:

I have a bag of 5 red balls and 3 blue balls. I draw one ball out, and it's red. I do not put it back in. What is the probability that the next ball I draw is red?

A modern LLM will likely get this right (4 red balls left out of 7 total, so 4/7). It's a common enough problem that the solution pattern is probably in its training data.
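The arithmetic is simple enough to check directly. This small sketch uses exact fractions; the function name and parameters are illustrative.

```python
from fractions import Fraction


def prob_next_red(red, blue, drawn_red=0, drawn_blue=0):
    """Probability the next draw is red, given balls already removed."""
    red_left = red - drawn_red
    blue_left = blue - drawn_blue
    total_left = red_left + blue_left
    if red_left < 0 or blue_left < 0 or total_left == 0:
        raise ValueError("cannot draw from this bag")
    return Fraction(red_left, total_left)


# 5 red and 3 blue to start, one red already drawn and not replaced.
answer = prob_next_red(5, 3, drawn_red=1)
print(answer)  # 4/7
```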

Now let's tweak it to something that requires a small step of real-world inference:

I'm in a room with my mother, my mother's brother, my brother's son, and my son's sister. Everyone is wearing a hat. I ask everyone to take their hat off. How many hats are removed?

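Models are much shakier here. They may fold two relatives into one person, or lose track of who is actually in the room. Work through it: "my son's sister" is my daughter, "my brother's son" is my nephew, and "my mother's brother" is my uncle, so the four descriptions pick out four different people. Four hats come off at my request, or five if I take mine off too. Getting there means resolving each phrase to an actual person and building a mental model of the scene. A model that is just matching words has no concept of "self" or of family relationships beyond statistical text associations.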

This is the core difference. Fluency is not intelligence. A parrot can mimic human speech perfectly, but it doesn't understand the conversation. LLMs are just very, very large parrots.

The Moving Goalposts of Intelligence

Part of the problem is that our definition of "intelligence" is a slippery target. For decades, we held up games as the ultimate benchmark.

First, it was chess. We said if a machine could beat a grandmaster, that would be a sign of true intelligence. Then Deep Blue beat Garry Kasparov in 1997. We learned it wasn't intelligence, but brute-force calculation of millions of moves. The goalposts moved.

Then, it was Go, a game with more possible board positions than there are atoms in the observable universe, making brute force impractical. We said that was the true test. Then AlphaGo beat Lee Sedol in 2016 using deep neural networks. We realized it was advanced pattern recognition, not general reasoning. The goalposts moved again.

Now the goalpost is the Turing Test, the idea that a machine is intelligent if it can fool a human into thinking it's also human. Today's chatbots can pass a casual Turing Test with ease. But does that mean they think? Or does it just mean they've gotten extremely good at imitating the patterns of human conversation they were trained on?

Chasing AGI is a distraction. It anthropomorphizes a tool and leads us down philosophical rabbit holes about consciousness that have no bearing on how we use this technology today.

A Better Frame: From Intelligence to Capability

Instead of asking if a model is "intelligent," we should be asking what it's capable of.

This is a much more practical and useful way to think. It frames the AI as a tool, not a creature. You don't ask if your hammer is intelligent; you ask if it can drive a nail. We should treat LLMs the same way.

Here are some concrete capabilities of today's best models:

Code Scaffolding: Generating boilerplate for an API endpoint, writing unit tests, or converting a data structure from Python to JavaScript.
Data Structuring: Extracting names, dates, and summaries from a wall of unstructured text and formatting it as CSV, JSON, or Markdown.
Summarization and Synthesis: Condensing a 10,000-word article or a long email thread into bullet points.
Brainstorming: Generating 50 blog post titles or a list of potential names for a new product.

And here are their current, very real limitations:

Factual Accuracy: They confidently hallucinate facts, citations, and statistics. Never trust them for anything that needs to be correct without verification.
Causal Reasoning: They can't reliably determine cause and effect. They find correlations, not causation.
Long-Term Planning: They can't hold a complex, multi-step goal in mind and execute on it without significant human guidance and correction at each step.
Common Sense: They have no grounding in the physical world. They don't know that you can't push a rope or that a glass will break if you drop it, unless it's explicitly stated in their training data.

Stop worrying about whether AI is going to become your conscious overlord. Start getting really good at using your incredibly powerful, but very dumb, new hammer. The real skill in the next decade isn't philosophy, it's writing prompts that get the machine to do useful work.