Recent AI models are surprisingly humanlike in their ability to generate text, audio, and video when prompted. So far, however, these algorithms have largely remained confined to the digital world rather than the physical, three-dimensional world we live in. In fact, whenever we attempt to apply these models to the real world, even the most sophisticated struggle to perform adequately—just think, for instance, of how challenging it has been to develop safe and reliable self-driving cars. For all their artificial intelligence, these models not only have no grasp of physics but also often hallucinate, which leads them to make inexplicable mistakes.
This is the year, however, when AI will finally make the leap from the digital world to the real world we inhabit. Expanding AI beyond its digital boundary demands reworking how machines think, fusing the digital intelligence of AI with the mechanical prowess of robotics. This is what I call “physical intelligence”: a new form of intelligent machine that can understand dynamic environments, cope with unpredictability, and make decisions in real time. Unlike the models used by standard AI, physical intelligence is rooted in physics, in an understanding of the fundamental principles of the real world, such as cause and effect.
Such features allow physical intelligence models to interact with and adapt to different environments. In my research group at MIT, we are developing models of physical intelligence that we call liquid networks. In one experiment, for instance, we trained two drones—one operated by a standard AI model and another by a liquid network—to locate objects in a forest during the summer, using data captured by human pilots. Both drones performed equally well when asked to do exactly what they had been trained to do, but when they were asked to locate objects in different circumstances—during the winter or in an urban setting—only the liquid network drone successfully completed its task. This experiment showed us that, unlike traditional AI systems, which stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
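For readers who want a feel for how a liquid network differs from a conventional one, here is a minimal, illustrative sketch of a liquid time-constant cell, the kind of unit liquid networks are built from, written in plain Python with NumPy. It is a toy simplification, not the code we use in the lab; the weights, sizes, and solver step are placeholder choices. The point it illustrates is that each neuron's effective time constant depends on the current input, so the network's dynamics keep adjusting as conditions change rather than staying frozen after training.

```python
# Illustrative sketch of a liquid time-constant (LTC) cell, the kind of
# unit behind liquid networks. Simplified toy version for intuition only.
#
# Each neuron's state x follows an ODE of the form
#     dx/dt = -(1/tau + f(x, u)) * x + f(x, u) * A,
# so the effective time constant changes with the input u -- the dynamics
# themselves adapt to what the neuron currently sees.

import numpy as np

class LTCCell:
    def __init__(self, n_inputs, n_units, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, (n_units, n_inputs))   # input weights
        self.W_rec = rng.normal(0.0, 0.5, (n_units, n_units))   # recurrent weights
        self.b = np.zeros(n_units)                              # bias
        self.tau = np.ones(n_units)                             # base time constants
        self.A = rng.normal(0.0, 1.0, n_units)                  # equilibrium targets

    def step(self, x, u, dt=0.1):
        # Input-dependent gate: bounded and positive, which keeps the
        # update below numerically stable.
        f = 1.0 / (1.0 + np.exp(-(self.W_rec @ x + self.W_in @ u + self.b)))
        # Semi-implicit (fused) Euler step of the ODE above.
        return (x + dt * f * self.A) / (1.0 + dt * (1.0 / self.tau + f))

# Toy usage: drive the cell with a changing "sensor" signal and watch
# the hidden state adapt to it over time.
cell = LTCCell(n_inputs=3, n_units=8)
x = np.zeros(8)
for t in range(100):
    u = np.array([np.sin(0.1 * t), np.cos(0.05 * t), 1.0])  # stand-in sensor readings
    x = cell.step(x, u)
```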
Physical intelligence is also able to interpret and physically execute complex commands derived from text or images, bridging the gap between digital instructions and real-world execution. For example, in my lab, we’ve developed a physically intelligent system that, in less than a minute, can iteratively design and then 3D-print small robots based on prompts like “robot that can walk forward” or “robot that can grip objects”.
Other labs are also making significant breakthroughs. For example, robotics startup Covariant, founded by UC Berkeley researcher Pieter Abbeel, is developing chatbots—akin to ChatGPT—that can control robotic arms when prompted. The company has already secured over $222 million to develop and deploy sorting robots in warehouses globally. A team at Carnegie Mellon University has also recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic and complex parkour movements—including jumping onto obstacles twice its height and across gaps twice its length—using a single neural network trained via reinforcement learning.
If 2023 was the year of text-to-image and 2024 was text-to-video, then 2025 will mark the era of physical intelligence, with a new generation of devices—not only robots, but also anything from power grids to smart homes—that can interpret what we’re telling them and execute tasks in the real world.