AI Will Understand Humans Better Than Humans Do


Michal Kosinski is a Stanford research psychologist with a nose for timely subjects. He sees his work as not only advancing knowledge but also alerting the world to the potential dangers of computer systems. His best-known projects involved analyzing the ways in which Facebook (now Meta) gained a shockingly deep understanding of its users from all the times they clicked “like” on the platform. Now he’s shifted to the study of surprising things that AI can do. He’s conducted experiments, for example, that indicate that computers could predict a person’s sexuality by analyzing a digital photo of their face.

I’ve gotten to know Kosinski through my writing about Meta, and I reconnected with him to discuss his latest paper, published this week in the peer-reviewed Proceedings of the National Academy of Sciences. His conclusion is startling. Large language models like OpenAI’s, he claims, have crossed a border and are using techniques analogous to actual thought, once considered solely the realm of flesh-and-blood people (or at least mammals). Specifically, he tested OpenAI’s GPT-3.5 and GPT-4 to see if they had mastered what is known as “theory of mind.” This is the ability of humans, developed in the childhood years, to understand the thought processes of other humans. It’s an important skill. If a computer system can’t correctly interpret what people think, its world understanding will be impoverished and it will get lots of things wrong. If models do have theory of mind, they are one step closer to matching and exceeding human capabilities. Kosinski put LLMs to the test and now says his experiments show that in GPT-4 in particular, a theory of mind-like ability “may have emerged as an unintended by-product of LLMs’ improving language skills … They signify the advent of more powerful and socially skilled AI.”

Kosinski sees his work in AI as a natural outgrowth of his earlier dive into Facebook Likes. “I was not really studying social networks, I was studying humans,” he says. When OpenAI and Google started building their latest generative AI models, he says, they thought they were training them to primarily handle language. “But they actually trained a human mind model, because you cannot predict what word I’m going to say next without modeling my mind.”

Kosinski is careful not to claim that LLMs have utterly mastered theory of mind—yet. In his experiments he presented a few classic problems to the chatbots, some of which they handled very well. But even the most sophisticated model, GPT-4, failed a quarter of the time. The successes, he writes, put GPT-4 on a level with 6-year-old children. Not bad, given the early state of the field. “Observing AI’s rapid progress, many wonder whether and when AI could achieve ToM or consciousness,” he writes. Putting aside that radioactive c-word, that’s a lot to chew on.

“If theory of mind emerged spontaneously in those models, it also suggests that other abilities can emerge next,” he tells me. “They can be better at educating, influencing, and manipulating us thanks to those abilities.” He’s concerned that we’re not really prepared for LLMs that understand the way humans think. Especially if they get to the point where they understand humans better than humans do.

“We humans do not simulate personality—we have personality,” he says. “So I’m kind of stuck with my personality. These things model personality. There’s an advantage in that they can have any personality they want at any point of time.” When I mention to Kosinski that it sounds like he’s describing a sociopath, he lights up. “I use that in my talks!” he says. “A sociopath can put on a mask—they’re not really sad, but they can play a sad person.” This chameleon-like power could make AI a superior scammer. With zero remorse.




By stp2y
