Is AGI Here? A Hard Look at What AI Can't Do Yet

Everyone's debating whether the latest models are AGI. Here's a practical breakdown of what they can do, where they fail spectacularly, and why that matters if you're building real things.

May 6, 2026 · 4 min read · SuperThinking team

No, AGI is not here. It's not even close.

Every time a new flagship model drops, the discourse cycle repeats. Someone posts a cherry-picked example of complex reasoning, the tweet goes viral, and for two weeks everyone debates whether we've accidentally created a conscious, self-aware intelligence in a GPU cluster.

We haven't. What we have are incredibly sophisticated pattern-matching engines. They are phenomenal tools that can accelerate work in surprising ways. But they are not general intelligences. Confusing the two leads to bad products and misplaced fears.

What Models Can Do (And Why It Looks Like Magic)

Let's be clear: today's best models are legitimately amazing. If you'd shown me GPT-4 or Claude 3 Opus five years ago, I would have thought it was science fiction. Their core strength is fluency and synthesis. They can ingest vast amounts of text and code, identify the patterns, and generate new text that fits those patterns.

This is why they excel at tasks like:

Summarization: Condensing a 5,000-word article into 200 words is a pattern-matching task. The model identifies the key concepts (statistical keywords and phrases) and reassembles them into a shorter form.
Code Generation: Writing boilerplate for a React component or a Python script is also pattern matching. The model has seen millions of examples on GitHub and knows what a function signature in Flask usually looks like.
Translation: Converting English to Spanish is mapping patterns in one language to their equivalent patterns in another.
Style Mimicry: Rewriting an email to be more formal is about applying a stylistic pattern to existing content.

When you ask a model to write a Python script to hit a weather API and display the temperature, it's not reasoning about meteorology or HTTP requests. It's generating a highly probable sequence of tokens, shaped by the thousands of API-client scripts it was trained on. It feels like magic, but it's just very, very good autocomplete.
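To make "autocomplete" concrete, here is a toy next-token predictor. It is a bigram counter, a drastic simplification of a real transformer, and the five-line corpus is made up for illustration. The only point is that picking the most frequent continuation can produce plausible-looking API-client code with no notion of what an HTTP request is.

```python
from collections import Counter, defaultdict

# Tiny made-up "training set"; real models train on vastly more text.
CORPUS = [
    "import requests",
    "response = requests.get ( url )",
    "data = response.json ( )",
    "print ( data )",
    "response = requests.get ( url )",
]


def train_bigrams(lines):
    """Count, for each token, which tokens followed it in the corpus."""
    follows = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def autocomplete(follows, start, max_tokens=6):
    """Greedily append the most frequent next token until none is known."""
    out = [start]
    while len(out) < max_tokens and follows[out[-1]]:
        out.append(follows[out[-1]].most_common(1)[0][0])
    return " ".join(out)


model = train_bigrams(CORPUS)
print(autocomplete(model, "response"))  # response = requests.get ( url )
print(autocomplete(model, "weather"))   # weather (never seen, nothing to continue)
```

The counter reproduces the line it saw most often and has nothing to offer for a token it never saw. Production models generalize far better than this, but the training objective is the same kind of thing: predict what usually comes next.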

The Cracks in the Facade

The illusion of intelligence shatters when you move beyond tasks that can be solved with pattern matching and into areas that require genuine reasoning or a model of the world. This is where you see the systems fail in simple, almost comical ways.

One classic weak spot is physical reasoning. Ask a model a simple riddle: "I have a box that holds 12 eggs. I put 9 eggs in it, then I trip and drop the box. Three eggs break. How many whole eggs are left?" The answer is 6: nine eggs went in, three broke, and the box's capacity of 12 is a distractor. A model can reach "6" by spotting 9 - 3 without any grasp of the scene, and the same model can be thrown by the distractor and answer "9", or change its answer when the riddle is lightly reworded. Either way, it is matching numbers to a familiar word-problem template. It doesn't have a mental model of an egg, a box, or gravity.

Causal reasoning is another major failure point. Models can identify correlations in data but struggle to understand cause and effect. If training data shows that ice cream sales and shark attacks both increase in the summer, the model might infer a direct relationship between the two. It takes real-world understanding to know that the confounding variable is the summer heat, which drives people to both the beach and the ice cream truck. The model doesn't know things; it only knows statistical relationships between words.
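The confounder is easy to show with a small simulation. Everything here is synthetic: the temperature range, the coefficients, and the noise levels are invented for illustration, not real sales or attack data. Both series depend only on temperature, yet they correlate strongly with each other until the temperature effect is subtracted out.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Remove the part of ys explained by a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.uniform(10, 35) for _ in range(2000)]

# Neither series depends on the other; both depend only on temperature.
ice_cream = [5.0 * t + rng.gauss(0, 10) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream, shark_attacks)
adjusted = pearson(
    residuals(ice_cream, temperature), residuals(shark_attacks, temperature)
)
print(f"raw correlation:           {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

The raw correlation comes out strongly positive and the adjusted one lands near zero. Knowing to control for temperature in the first place is the part that requires a model of the world.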

We also see a lack of persistent agency. You can't hand a bare LLM a goal like "find me the best flight to Tokyo for next month" and have it work on the problem for a week, checking prices, monitoring for deals, and updating you. It's a one-shot tool: you give it a prompt, it gives a response. On its own it has no memory of past interactions beyond the context window, no ability to act autonomously, and no capacity for long-term planning.
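The "no memory beyond the context window" point can be sketched in a few lines. `fake_model` is a hypothetical stand-in for an LLM call, not any real API; the assumption it encodes is that the response depends only on the prompt text it is handed, and that the caller decides how much history fits in that prompt.

```python
from typing import List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: a pure function of its prompt."""
    if "Tokyo" in prompt:
        return "You are flying to Tokyo."
    return "I don't know your destination."


def build_prompt(history: List[str], question: str, window: int) -> str:
    """Keep only the last `window` history lines, then append the question."""
    return "\n".join(history[-window:] + [question])


history = [
    "I am flying to Tokyo.",
    "I like window seats.",
    "I prefer morning departures.",
]
question = "Where am I flying?"

no_context = fake_model(question)
full_window = fake_model(build_prompt(history, question, window=3))
short_window = fake_model(build_prompt(history, question, window=2))

print(no_context)    # I don't know your destination.
print(full_window)   # You are flying to Tokyo.
print(short_window)  # I don't know your destination.
```

Whatever looks like memory lives in the `history` list maintained by the calling code. Once a fact falls outside the window, the model behaves as if it was never said.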

They also can't reliably know what they don't know. A human expert, when faced with a novel problem, will say, "I need more information" or "That's outside my area of expertise." An LLM will often just start making things up, a process we politely call 'hallucination.' It will invent API endpoints, cite non-existent legal cases, and generate code that looks plausible but is fundamentally broken. It's trained to produce a likely next token, not to be correct.

So, What Are We Actually Building?

If these models aren't AGI, what are they? The most useful framing is to think of them as universal tools, or 'reasoning engines' in the loose sense that they turn instructions into output, not in the sense that they reason about the world. They are incredibly powerful, flexible systems for manipulating text, code, and ideas. They are a new kind of compiler, one that works on natural language instructions.

This is not a lesser achievement. It's a massive one. We've created a tool that can be shaped to perform an almost infinite variety of knowledge-work tasks. It's an amplifier for human intelligence, not a replacement for it.

The trick is to use it for what it's good at. Don't ask it to be your autonomous CEO. Do ask it to refactor your legacy codebase, draft three versions of a marketing email, or summarize a dense research paper so you can get the gist in 60 seconds.

Building with these models means architecting systems that play to their strengths (fluency, speed, pattern matching) while mitigating their weaknesses (lack of real-world grounding, hallucinations, poor reasoning). This often means putting a human in the loop or building validation and verification steps around the model's output.
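As one example of a verification step, here is a minimal sketch that checks model-generated Python before anyone runs it. It uses the standard library's `ast` module to confirm the code parses and that it only imports modules from an allowlist. The allowlist contents and the function name `check_generated_code` are assumptions made for this sketch; a real pipeline would add tests, sandboxing, and review on top.

```python
import ast
from typing import List, Set

# Hypothetical allowlist; pick whatever your project actually permits.
ALLOWED_IMPORTS = {"json", "math", "requests"}


def check_generated_code(source: str, allowed: Set[str] = ALLOWED_IMPORTS) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            if name.split(".")[0] not in allowed:
                problems.append(f"unexpected import: {name}")
    return problems


print(check_generated_code("import requests\nprint(1)"))  # []
print(check_generated_code("import weatherlib_pro"))      # ['unexpected import: weatherlib_pro']
```

Passing these checks does not mean the code is correct. It only filters out two cheap-to-detect failures, unparseable output and invented dependencies, before a human or a test suite spends time on it.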

For example, instead of an autonomous coding agent, build a system where the LLM suggests five different implementations and a human developer makes the final choice. Instead of an AI doctor, build a tool that summarizes patient notes and highlights potential areas for the human doctor to investigate.
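The suggest-then-choose pattern might look like the sketch below. The `generate` and `choose` callables are placeholders: in a real tool, `generate` would wrap a model call and `choose` would show the options to a developer and return their pick. Both are stubbed here so the example runs on its own.

```python
from typing import Callable, List, Optional


def propose(generate: Callable[[str, int], str], task: str, n: int = 5) -> List[str]:
    """Ask the (hypothetical) generator for n candidates, dropping exact duplicates."""
    seen = set()
    candidates = []
    for i in range(n):
        text = generate(task, i)
        if text not in seen:
            seen.add(text)
            candidates.append(text)
    return candidates


def review(
    candidates: List[str], choose: Callable[[List[str]], Optional[int]]
) -> Optional[str]:
    """Return the candidate the reviewer picked, or None if all were rejected."""
    index = choose(candidates)
    if index is None:
        return None
    if not 0 <= index < len(candidates):
        raise IndexError("reviewer picked an option that does not exist")
    return candidates[index]


def stub_generate(task: str, i: int) -> str:
    """Stub generator: deliberately repeats itself to show de-duplication."""
    return f"# option {i % 3} for {task}"


candidates = propose(stub_generate, "sort users")
accepted = review(candidates, lambda options: 1)      # reviewer picks the second option
rejected = review(candidates, lambda options: None)   # reviewer rejects everything

print(len(candidates))  # 3
print(accepted)         # # option 1 for sort users
print(rejected)         # None
```

The design choice that matters is that nothing is accepted by default. The model widens the set of options, and the decision, including the decision to reject all of them, stays with a person.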

Stop chasing the ghost of AGI. The real opportunity is in building practical, reliable tools with the powerful, flawed, and fundamentally non-sentient models we have today.