OpenAI’s Most Advanced AI Release Stumped by New York Times Word Game

Connections claims another victim.

Awful General Intelligence

While OpenAI CEO Sam Altman claims that the company already has the building blocks for artificial general intelligence, its most advanced publicly available AI system was caught majorly lacking by a simple test: a puzzle that countless people do every day.

As Walter Bradley Center for Natural and Artificial Intelligence senior fellow Gary Smith writes for Mind Matters, OpenAI's o1 "reasoning" model failed spectacularly when tasked with solving the New York Times' notoriously tricky Connections word game.

The game's rules are deceptively simple. Players are given 16 terms and must sort them into four groups of four based on what each group has in common — but because the connecting themes can be as obvious as "book subtitles" or as esoteric as "words that start with fire," it can be quite challenging.

As Smith explained, he had o1 and other large language models (LLMs) from Google, Anthropic, and Microsoft (which is powered by OpenAI’s tech) try to solve the Connections puzzle of the day.

Surprisingly — if you buy into AI hype, at least — they all failed. That’s especially true of o1, which has been immensely hyped as the company’s next-level system, but which apparently can’t reason its way through an NYT word game.

Connect Four

When he fed that day’s Connections challenge into the model, o1 did, to its credit, get some of the groupings right. But its other “purported combinations verge[d] on the bizarre,” Smith found.

In one instance, o1 grouped the words “boot,” “umbrella,” “blanket,” and “pant” and said the relating theme was “clothing or accessories.” Three out of four ain’t bad, of course, but who’s wearing a blanket, except as some sort of out-there fashion statement?

After redoing the entire exercise with the same set of words, the LLM confidently said that "breeze," "puff," "broad," and "picnic" were "types of movement or air." Points for the first two, but we're as puzzled as Smith by the latter two.

Overall, Smith rightfully assessed o1 as proffering “many puzzling groupings” alongside its “few valid connections.” It’s also a telling demonstration of some familiar AI shortfalls: that it can often impress when regurgitating information that’s already well documented in its training data, but frequently struggles with novel queries.

Our semi-professional take: if OpenAI is indeed reaching the precipice of AGI — or has already achieved the start of it, as one of its employees claimed at the end of last year — the company is clearly keeping it under wraps, because this simply ain't it.

More on OpenAI: Mother of OpenAI Whistleblower Alleges He Was Murdered, Says There Were Signs of Struggle
