Elon Musk is in charge of at least six innovative companies: Tesla, SpaceX, Starlink, X (formerly Twitter), Neuralink and xAI. But it’s not enough for him.
The multi-company leader announced today on X that xAI, which offers its Grok large language models (LLMs) and a chatbot of the same name to paid X subscribers, has begun training on the “most powerful AI training cluster in the world,” the so-called Memphis Supercluster in Memphis, Tennessee.
According to local news outlet WREG, the supercluster is located in the southwestern part of the city and “will be the largest capital investment by a new-to-market company in the city’s history.” Yet xAI does not yet have a contract in place with local utility the Tennessee Valley Authority, which requires one before it will supply electricity to projects drawing more than 100 megawatts.
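To see why a cluster this size would cross the 100-megawatt threshold, here is a rough back-of-envelope sketch. It assumes Musk’s stated figure of 100,000 H100 GPUs and Nvidia’s published 700 W thermal design power for the H100 SXM; it deliberately excludes CPUs, networking, storage and cooling, which add substantially more draw on top.

```python
# Rough power estimate for a 100,000-GPU cluster.
# Assumes Nvidia's published 700 W TDP per H100 SXM module;
# excludes host CPUs, networking, storage, and cooling overhead.

GPUS = 100_000
TDP_WATTS = 700                               # per-GPU thermal design power

gpu_power_mw = GPUS * TDP_WATTS / 1_000_000   # watts -> megawatts

print(f"GPUs alone: ~{gpu_power_mw:.0f} MW")  # ~70 MW before any overhead
```

Even before facility overhead, the GPUs alone would draw roughly 70 MW, so total demand comfortably exceeding 100 MW is plausible.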
Chock full of Nvidia H100s
Nonetheless, Musk further detailed that the cluster consists of 100,000 liquid-cooled H100 graphics processing units (GPUs), the chips Nvidia began offering last year that are in high demand among AI model providers, including Musk’s rivals (and former friends) at OpenAI.
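For a sense of scale, the aggregate compute can be sketched with simple arithmetic. This assumes Nvidia’s published peak of roughly 989 TFLOPS (dense BF16) per H100 SXM; real-world training throughput is typically only a fraction of theoretical peak.

```python
# Back-of-envelope: aggregate peak compute of 100,000 H100 GPUs.
# Assumes ~989 TFLOPS dense BF16 per H100 SXM (Nvidia spec-sheet peak);
# sustained training throughput is well below this in practice.

GPUS = 100_000
PEAK_TFLOPS_PER_GPU = 989

total_tflops = GPUS * PEAK_TFLOPS_PER_GPU
total_exaflops = total_tflops / 1_000_000     # 1 exaFLOPS = 1e6 TFLOPS

print(f"Aggregate peak: ~{total_exaflops:.0f} exaFLOPS (BF16, dense)")
# -> roughly 99 exaFLOPS at theoretical peak
```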
Musk also noted that the cluster is operating on a single RDMA (Remote Direct Memory Access) fabric, which, as Cisco helpfully notes, provides more efficient, lower-latency data transfer between compute nodes without burdening the central processing unit (CPU).
xAI aims to offer the ‘most powerful AI by every metric’ by Dec. 2024
Obviously, xAI aims to train its own LLMs on the supercluster. But more than that, Musk posted in a reply that the company is aiming to train “the world’s most powerful AI by every metric” and to do so “by December this year.”
He also posted that the Memphis Supercluster would provide a “significant advantage” to this end.
Not holding my breath on the timing
For all of his many ambitions and successes, Musk is notorious for publicly setting and then missing deadlines on projects such as full self-driving cars, robotaxis and missions to Mars, so I won’t be holding my breath for a December 2024 reveal of the new Grok LLM. But it would be a surprise, and a big boost to xAI’s efforts, if a model did arrive in that time frame.
Especially with OpenAI, Anthropic, Google, Microsoft and Meta all pursuing more powerful and affordable large and small language models (LLMs and SLMs), xAI will need a new and useful model if it is to remain competitive in the ongoing gen AI race for customers, users, and attention.
Indeed, OpenAI backer Microsoft is itself reportedly working with OpenAI CEO Sam Altman on a $100 billion AI training supercomputer codenamed Stargate, according to The Information. Depending on how that develops, xAI’s Memphis Supercluster may not be the most powerful in the world for long.