Is AGI Here? A Sober Look at GPT-4o and Beyond

Everyone is asking if the latest models like GPT-4o are 'sparks of AGI.' The short answer is no. The long answer is about what these tools actually are and where they fall apart completely.

June 9, 2026 · 3 min read · SuperThinking team

A futuristic robot tilts its head in confusion while examining a simple kitchen toaster.

Is AGI here? No. But it's suddenly a serious question, and that's new.

For years, Artificial General Intelligence was pure sci-fi. Now, you see a demo of GPT-4o having a real-time, translated conversation, and you can't help but wonder. It feels different. It feels... general.

But it's not. What we have are incredibly powerful, versatile tools that are brilliant pattern-matchers. They are not thinking machines. Confusing the two is a fast way to build unreliable systems and misunderstand what you're actually working with.

The 'Sparks of AGI' Argument

We're even having this conversation because today's frontier models are genuinely stunning. They aren't just text-in, text-out anymore. They're multimodal, meaning they can take in text, images, audio, and even video, and respond in several of those forms in real time.

Look at a model like GPT-4o. You can point your phone's camera at a math problem on a piece of paper, and it will walk you through solving it, verbally. You can show it your messy room, and it can suggest organizational strategies. It can act as a real-time translator between two people speaking different languages.

This feels like general intelligence because it crosses domains. It's not a chess-playing AI or a protein-folding AI. It does many things, just like a human.

This versatility is where the 'sparks of AGI' idea comes from. When a single system can summarize a report, write Python code to analyze the data from it, and then create a presentation slide visualizing the result, it's easy to see why people get excited. It's an incredible force multiplier for knowledge work.

But capability isn't the same as cognition.

A flowchart or system diagram with several connected nodes and one obvious empty space.

Where It All Falls Apart

For all their power, these models have fundamental, architectural blind spots. They fail at things a five-year-old finds trivial, because they don't have a world model. They don't understand reality; they model statistical relationships in data.

Here are the big gaps:

Physical Common Sense: An LLM can tell you the boiling point of water, but it has no intuitive grasp of what 'wet' means. It can't look at a table and know not to push a glass off the edge. It lacks a grounded understanding of physics, space, and object permanence. This is why robotics is still so hard.

Long-Term Agency: You can't give a model a goal like "get my startup to product-market fit" and have it work autonomously for six months. It works within a finite context window. It responds to prompts. It doesn't have desires, drives, or the ability to formulate and execute complex, long-range plans without constant human intervention.

Causal Reasoning: Models are great at correlation and terrible at causation. They know that 'smoke' and 'fire' appear together in text, but they don't truly understand that fire causes smoke. Ask a model a novel reasoning puzzle that isn't in its training data, and it often defaults to a plausible-sounding but logically flawed answer. At bottom, it is predicting the next most likely word.

Embodiment: The human brain is not a disembodied text predictor. Our intelligence is shaped by our physical bodies, our senses, and our interaction with the environment. Models lack this. They have never felt the sun, lifted a heavy object, or tasted salt. This lack of embodied experience creates an unbridgeable gap in understanding the world as we do.
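To make "predicting the next most likely word" concrete, here is a deliberately tiny sketch. It is not how GPT-4o or any real LLM works internally (those are large neural networks operating on tokens with far more context); it is a toy bigram counter over a made-up four-sentence corpus. The corpus and the function names train_bigrams and predict_next are assumptions invented for illustration.

```python
from collections import Counter, defaultdict

# Made-up toy corpus, purely illustrative (an assumption, not real training data).
CORPUS = [
    "smoke means fire",
    "smoke means fire nearby",
    "smoke rises slowly",
    "fire makes smoke",
]


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(CORPUS)
print(predict_next(model, "smoke"))    # "means" follows "smoke" twice, "rises" once
print(predict_next(model, "means"))    # "fire" is the only word that follows "means"
print(predict_next(model, "toaster"))  # never seen in the corpus, so None
```

The toy links 'smoke' to 'fire' only because the words sit near each other in its data. Nothing in those counts says which one causes the other, which is the correlation-versus-causation gap described above in miniature.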

This isn't just nitpicking. These failures are why you can't trust an LLM to manage your power grid or perform surgery. The stakes are too high for a system that hallucinates facts and doesn't understand cause and effect.

A human designer and a robotic arm collaborate on a workbench, building something together.

So, What Are We Actually Building?

If it's not AGI, what is it? A better framing is that we're building the world's first universal cognitive prosthetics.

A calculator is a prosthetic for arithmetic. A compiler is a prosthetic for turning human logic into machine code. An LLM is a prosthetic for a huge range of cognitive tasks: summarizing, translating, brainstorming, coding, and more.

It's a tool that augments human intelligence, not a replacement for it. Your job isn't to ask it for the 'right answer.' Your job is to use it as an infinitely patient, slightly unreliable intern. You hand it the first draft to write, you check its work, and you provide the critical thinking and real-world judgment it lacks.
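In code, the "unreliable intern" habit might look like the rough sketch below: nothing the model drafts is accepted until it passes checks you wrote yourself. Everything here is hypothetical. fake_model is a stand-in for whatever LLM call you actually use, and the two checks are placeholders for real review steps, including a human reading the draft.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A check returns None when the draft passes, or a short reason when it fails.
Check = Callable[[str], Optional[str]]


@dataclass
class Review:
    draft: str
    accepted: bool
    reason: str


def review_draft(draft: str, checks: List[Check]) -> Review:
    """Run each check in order and reject the draft at the first failure."""
    for check in checks:
        problem = check(draft)
        if problem is not None:
            return Review(draft, False, problem)
    return Review(draft, True, "all checks passed")


def not_empty(draft: str) -> Optional[str]:
    return None if draft.strip() else "draft is empty"


def within_length(draft: str) -> Optional[str]:
    return None if len(draft.split()) <= 50 else "draft exceeds 50 words"


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; it returns a canned string.
    return "Draft summary of the report."


result = review_draft(fake_model("Summarize the report"), [not_empty, within_length])
print(result.accepted, result.reason)
```

The design choice is that the checks live outside the model. The model proposes, and something you control decides whether the proposal is good enough to keep.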

The real revolution isn't the imminent arrival of a new silicon consciousness. It's the distribution of powerful cognitive tools that let us all think faster, write better, and build more complex things than ever before.

Forget waiting for AGI. The interesting stuff is happening right now, with the tools we already have.