Claude 3.5 suggests AI’s looming ubiquity could be a good thing

The frontier of AI just got pushed a little further forward. On Thursday, Anthropic, the AI lab set up by a team of disgruntled OpenAI staffers, released the latest version of its Claude LLM. From Bloomberg:

The company said Thursday that the new model – the technology that underpins its popular chatbot Claude – is twice as fast as its most powerful previous version. Anthropic said in its evaluations, the model outperforms leading competitors like OpenAI on several key intelligence capabilities, such as coding and text-based reasoning.

Anthropic only released the previous version of Claude, 3.0, in March. This latest model has been called 3.5, and currently only exists in the company’s mid-sized “Sonnet” iteration. Its faster, cheaper, and dumber “Haiku” version will arrive shortly, it says – as will its slower, more expensive, but most capable “Opus”.

But even before Opus arrives, Anthropic says that it’s got the best AI on the market. In a series of head-to-head comparisons posted on its blog, 3.5 Sonnet outperformed OpenAI’s latest model, GPT-4o, on tasks including maths quizzes, text comprehension, and undergraduate knowledge. It wasn’t a clean sweep, with GPT retaining the lead in some benchmarks, but it was enough to justify the company’s claim to be at the frontier of what is possible.

In more qualitative terms, the AI seems like a step forward too. Anthropic says:

It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone.

They’re marking their own homework, but the description matches the changes I’ve noticed. Wherever it falls on the technical benchmarks, a conversation with the latest version of Claude feels more pleasant than any other AI system I’ve used so far.

The company isn’t simply selling the update on power, though. Instead, in a move favoured by underdog competitors everywhere, Anthropic is focusing as much on cost as on capability. Claude 3.5 isn’t just smarter than the old state of the art, the company says – it’s also cheaper.

For consumers, the chatbot market is shaking out as a “freemium” model: for free, you can access a (sometimes second-tier) chatbot for a limited amount of time, while a monthly subscription nets you the best models and higher or unlimited use. For businesses, though, there’s a stricter pricing structure based on both questions and answers, and Anthropic has undercut OpenAI on the cost of inputs, and matched it on outputs. It’s also five times cheaper than its own previous best.

If you don’t like seeing AI chatbots pop up in more and more places, then that’s possibly bad news for you. It’s getting cheaper and cheaper to build your own business on top of a company like Anthropic, and more firms will do so as prices fall. The good news is, each update also improves the capability of those businesses.

The last year of AI progress has been odd, in hindsight. After the leap in capabilities brought on by GPT-4 last spring, the frontier has moved on in fits and starts: Claude 3 and 3.5, and GPT-4o, all represented definite improvements, but none the great leap that the AI community has been implying is shortly to come.

At the same time, the presence of any improvement at all should be heartening. The fact that meaningful changes can be made beyond simply throwing insane money at whole new training runs suggests that some of the mystery about how these systems actually work is being cleared up, and AI development is turning from an art into a science. That, in turn, should mean that the products of the massive training runs – which are assuredly happening – can be hammered into useful and safe tools sooner rather than later.

Safety, made in Britain

Rishi Sunak speaks at the second day of the UK Artificial Intelligence (AI) Safety Summit at Bletchley Park in November. Photograph: Toby Melville/AP

There is a coda to the Claude 3.5 release: it’s been vetted for safety by the UK government. Anthropic says:

As part of our commitment to safety and transparency, we’ve engaged with external experts to test and refine the safety mechanisms within this latest model. We recently provided Claude 3.5 Sonnet to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. The UK AISI completed tests of 3.5 Sonnet and shared their results with the US AI Safety Institute (US AISI) as part of a Memorandum of Understanding, made possible by the partnership between the US and UK AISIs announced earlier this year.

As with the Bletchley and Seoul AI summits, the UK government has managed to turn what could have been a technophilic quirk of Rishi Sunak’s into something apparently lasting and successful. The fact that the public-sector AI safety institute is so world-leading that the US government is outsourcing its own work to us is genuinely something to be proud of.

The next question, of course, is what good can come of it. It’s easy to get hold of an AI model to test if the company involved thinks it’s going to pass with flying colours; the question will be whether AISI can change the AI labs, rather than merely prod them and see what happens.


EU can’t fire us – we quit

Margrethe Vestager gives a press conference on the EU’s antitrust case with Apple App Store in Brussels, Belgium, 4 March 2024. Photograph: Olivier Hoslet/EPA

Apple’s war with the EU is getting hotter. On Friday, the company confirmed it wouldn’t be shipping a raft of new features to users in the EU, citing “regulatory uncertainties brought about by the Digital Markets Act (DMA)”. From its statement:

We do not believe that we will be able to roll out three of these features – iPhone Mirroring, SharePlay Screen Sharing enhancements, and Apple Intelligence – to our EU users this year.

Specifically, we are concerned that the interoperability requirements of the DMA could force us to compromise the integrity of our products in ways that risk user privacy and data security. We are committed to collaborating with the European Commission in an attempt to find a solution that would enable us to deliver these features to our EU customers without compromising their safety.

It’s a Rorschach test of a statement. If you think the EU’s regulation is overbearing, protectionist and incoherent, then Apple is taking the only sensible action, limiting its product launches to the most uncontroversial features in order to avoid a potential multibillion-euro fine.

If, on the other hand, you think that Apple’s response to the EU has been one of malicious compliance and outrage at the thought of an authority more legitimate than its own, then this is just another attempt at discouraging governments from following in the bloc’s footsteps.

The EU, it seems, is not deterred. On Monday, it announced plans to sue over Apple’s noncompliance:

In preliminary findings, against which Apple can appeal, the European Commission said it believed its rules of engagement did not comply with the Digital Markets Act (DMA) “as they prevent app developers from freely steering consumers to alternative channels for offers and content”.

In addition, the commission has opened a new non-compliance procedure against Apple over concerns its new contract terms for third-party app developers also fall short of the DMA’s requirements.

For the EU, the principle is clear: if a European customer wants to do business with a European business, it should not be in the power of a third country, company, or person to prevent that market from operating. It’s as close to the founding ideal of the bloc as one can get, really.

But it’s also not exactly what the DMA says. Hence the conflict. Apple wants to follow the letter of the law while retaining as much control over its platforms as possible; the EU wants to interpret that same law to give as much freedom for smooth commerce as it can. I don’t know which interpretation will win this time, but I’m confident in my prediction that the appeals have only just begun.


By stp2y
