Elon Musk says all human data for AI training ‘exhausted’

Artificial intelligence companies have run out of data for training their models and have “exhausted” the sum of human knowledge, Elon Musk has said.

The world’s richest person suggested technology firms would have to turn to “synthetic” data – or material created by AI models – to build and fine-tune new systems, a practice already under way in the fast-developing field.

“The cumulative sum of human knowledge has been exhausted in AI training. That happened basically last year,” said Musk in an interview livestreamed on his social media platform, X.

AI models such as the GPT-4o model powering the ChatGPT chatbot are “trained” on a vast array of data taken from the internet, from which they in effect learn to spot patterns, allowing them to predict, for instance, the next word in a sentence.

Musk said the “only way” to counter the lack of source material for training new models was to move to synthetic data created by AI.

Referring to the exhaustion of data troves, he said: “The only way to then supplement that is with synthetic data where … it will sort of write an essay or come up with a thesis and then will grade itself and … go through this process of self-learning.”

Meta, the owner of Facebook and Instagram, has used synthetic data to fine-tune its biggest Llama AI model, while Microsoft has also used AI-made content for its Phi-4 model. Google and OpenAI, the company behind ChatGPT, have also used synthetic data in their AI work.

However, Musk also warned that AI models’ habit of generating “hallucinations”, a term for inaccurate or nonsensical output, posed a danger to the synthetic data process.

Speaking in the livestreamed interview with Mark Penn, the chair of the advertising group Stagwell, he said hallucinations had made the process of using artificial material “challenging” because “how do you know if it … hallucinated the answer or it’s a real answer”.

High-quality data, and control over it, is one of the legal battlegrounds in the AI boom. OpenAI admitted last year it would be impossible to create tools such as ChatGPT without access to copyrighted material, while the creative industries and publishers are demanding compensation for use of their output in the model training process.
