Don’t miss OpenAI, Chevron, Nvidia, Kaiser Permanente, and Capital One leaders only at VentureBeat Transform 2024. Gain essential insights about GenAI and expand your network at this exclusive three day event. Learn More
A new large language model (LLM) has apparently taken the performance crown from OpenAI’s GPT-4o about a month after its release: the new Claude 3.5 Sonnet chatbot and LLM from rival AI firm Anthropic, released today, bests all others in the world on key third-party benchmark tests, according to the company. And it does so while being faster and cheaper than prior Claude 3 models.
But it’s one thing to drop a new model and claim dominance, and yet another for users to truly experience and leverage the performance gains (Google Gemini family — I’m looking at you: supposedly better than OpenAI’s prior flagship GPT-4 on some metrics, but who is really using you?).
Anthropic’s latest release of Claude 3.5 Sonnet doesn’t seem to have this problem. Many AI influencers and power users have taken to the web in the few hours since its release to share their largely positive impressions about Anthropic’s new model, and show off what the new, “most intelligent” LLM in the world is able to accomplish.
Advancing coding skills and product creation
As enterprise AI influencer and expert Allie K. Miller wrote on X, Claude 3.5 Sonnet was able to create an entire playable game for her based on just a screenshot, in less than half a minute:
Countdown to VB Transform 2024
Join enterprise leaders in San Francisco from July 9 to 11 for our flagship AI event. Connect with peers, explore the opportunities and challenges of Generative AI, and learn how to integrate AI applications into your industry. Register Now
Similarly, the informative and timely X account @TestingCatalog News showed how the newly launched “Artifacts” playground — which debuted alongside Claude 3.5 Sonnet, quite literally, showing a view of interactive outputs beside the chatbot interface — can execute code for real, working web form that Claude 3.5 Sonnet built.
It even was able to recreate imagery from the seminal 1995 movie Hackers:
Pietro Schirano, founder of enterprise AI image generation startup EverArt, wrote on X that combining Claude 3.5 Sonnet with another tool, Maestro, showed “sparks of AGI?”
Anthropic staffers go to bat for Claude 3.5 Sonnet
Though obviously biased, Anthropic developer relations team leader Alex Albert posted a thread on X highlighting how Claude 3.5 Sonnet is “starting to get really good at coding and autonomously fixing pull requests” and even went so far as to state: “It’s becoming clear that in a year’s time, a large percentage of code will be written by LLMs.”
Similarly, Anthropic technical staffer Maggie Vo posted on X that Claude 3.5 Sonnet can now do “half my job…and I couldn’t be happier.”
Putting pressure on OpenAI
Others observed that now that Claude 3.5 Sonnet has eclipsed GPT-4o from OpenAI and is available at similar pricing, the latter company is under renewed pressure to continue making the case for its models as the right choice.
Pennsylvania University Wharton School of Business professor and AI booster Ethan Mollick compared the Artifacts feature to a “simpler version of Code Interpreter” from OpenAI’s GPT-4.
X user @kimmonismus went even further, saying OpenAI will “sleep through AGI” or artificial general intelligence, the company’s stated goal of an AI model that outperforms humans in most economically valuable work. They blasted the company for announcing additional features with GPT-4o that have yet to ship, including new voice modalities.
Still not human level
Despite the lofty praise around X, others noted that Claude 3.5 Sonnett still struggled with some of the seemingly basic cognitive tasks that humans can perform with relative ease, such as playing “tic tac toe.”
Similarly, tech journalist Timothy B. Lee, known from his handle @binarybits on X, noted that it “still makes goofy errors sometimes,” posting a screenshot asking it for the answer to a simple math word problem: which is worth more: 100 pennies or three quarters? to which it answered Three quarters, initially.
Still, even with these so-far minor issues, Claude 3.5 Sonnet appears to be a tremendous leap for Anthropic and LLMs generally, and shows that the performance gains of individual AI model makers are certainly not slowing down with current levels of available compute resources (i.e. GPUs).
Source link lol