A year ago today, Sam Altman returned to OpenAI after being fired just five days earlier. What really happened in the boardroom? Fable, a game and AI simulation company, built its AI Sim Francisco “war game” to find out why the behind-closed-doors board fight turned out the way it did.
It feels a bit weird to simulate a real-life event in this way, but Fable CEO Edward Saatchi is interested in whether a different set of decisions could have led to a different outcome for this company at the center of the generative AI revolution.
The simulation pits different board members and personalities against each other in a “multi-agent competition,” where each AI player is trying to come out on top. Here’s the war game research paper being released today that came from this experiment.
The SIM-1 framework for AI decision making is basically a simulation of the five days from when Sam Altman was removed as CEO of OpenAI to when he returned.
“Simulations offer a completely new way to explore AI decision making in rich environments — including in war game situations where predicting possible outcomes can be invaluable,” Joshua Johnson, CEO of Tree, an AI startup that partnered with Fable on the research paper, said in a statement. “These aren’t simply chatbots. These AIs need to sleep and eat, and to balance many different physical, mental and emotional goals.”
SIM-1, in part using the new reasoning model GPT-4o, gives its sense of what happened behind closed doors at OpenAI between Sam and Ilya, the hidden tactics of leading players such as Satya Nadella and Marc Andreessen, and what was said as those players grappled with an unprecedented crisis in the tech industry.
“It’s interesting to find out just how unlikely it was that Sam did return,” Saatchi said in an interview with GamesBeat. “That’s why people run war games in D.C. and beyond. How likely was it that a particular event happened? Then you can base decisions around that. This scenario showed that 16 out of 20 times, Sam did not return.”
Across 20 simulations, Sam Altman’s AI returned as CEO four times — showing just how unlikely this outcome was. In other outcomes, Mira Murati, the acting CEO, remained CEO, and in one, SIM-1 chose Elon Musk, Altman’s rival, to become the new CEO.
“Today, AI agents are defined by their personality. We wanted to show agents operating on decision making in a complex simulation,” said Saatchi, in a statement. “In the five days from November 17 to November 21, the world watched some of its most intelligent people — people like Satya Nadella, Sam Altman and Ilya Sutskever — forced to operate in a rapid Game of Thrones, high-pressure, short-timeframe scenario, where they had to use game theory and deception to come out on top. We felt this was a perfect scenario to test out SIM-1, GPT-4o and Sim Francisco.”
“For us, Sim Francisco has actual power and intelligence around a struggle and factions,” Saatchi added. “It gives us the ability to start to think about season-long arcs of stories that come out of San Francisco, instead of just little, tiny vignettes, which is what we showed last year. It gives us the ability to tell richer, more complex stories in San Francisco, or have the AI tell them for us. There are strong factional objectives, so you could plausibly start to make a Game of Thrones story.”
Fable has won a couple of Primetime Emmy Awards and has a rich history of experimental work across virtual reality, gaming and AI technologies. It built SIM-1 in an attempt to solve the mystery of what happened in the OpenAI boardroom fight.
How it works
Each of the 20 simulations starts with the announcement that Sam Altman has been removed as CEO. Across four turns a day, each agent can cajole, charm and manipulate its way into the top position — replacing Sam as CEO, funding his new venture, or hiring OpenAI’s staff away.
The different AI agents can choose a strategy, such as deception, to try to pull ahead of the others and be anointed the new CEO.
“AI characters today are ‘nice but dull.’ We wanted to show agents that were aggressive, intelligent, able to manipulate and deceive, but also confused about their own decisions and goals. Like real people, AI characters must be complex and contain what Jung called ‘The Shadow,’” Saatchi said. “The five days from when Sam Altman was removed and returned to OpenAI were game theory at lightspeed.”
He said it was like watching a season of Game of Thrones play out in five days. The world watched as highly intelligent players vied to become the most powerful person in Silicon Valley, whether by hiring the entire staff of OpenAI, becoming the new CEO of OpenAI or funding Sam and Greg in a new venture for a chance at outsize investment returns.
“It was Game of Thrones in real life, and using AI to find out both what happened behind closed doors and to project different outcomes was an amazing challenge,” Saatchi said.
In the Sim Francisco simulation, over the five days, agents representing tech luminaries like Sam Altman, Satya Nadella and Ilya Sutskever each have four turns a day, including one for sleep, and can react to each other’s behavior. An adjudicator agent — similar to a dungeon master — decides which agent wins each round, as well as the overall winner.
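To make that loop concrete, here is a minimal Python sketch of the turn structure described above: five days, four turns a day, and an adjudicator scoring each round. It is not Fable’s code; the agent names, action list and random adjudication are illustrative assumptions, since SIM-1’s implementation has not been published.

```python
# Illustrative sketch only -- not Fable's SIM-1 code. The agent names, actions
# and the random adjudicator stand in for what the paper describes in prose.
import random

AGENTS = ["Sam Altman", "Ilya Sutskever", "Satya Nadella", "Mira Murati"]
ACTIONS = ["build alliance", "confront", "gather information", "deceive", "sleep"]

def adjudicate(moves: dict) -> str:
    """Stand-in for the adjudicator ('dungeon master') agent that picks the
    most effective player each round; the real system would use an LLM."""
    return random.choice(list(moves))

scores = {agent: 0 for agent in AGENTS}
for day in range(5):           # the five days from Nov. 17 to Nov. 21
    for turn in range(4):      # four turns a day; sleep is one possible action
        moves = {agent: random.choice(ACTIONS) for agent in AGENTS}
        scores[adjudicate(moves)] += 1

print("Most effective agent in this run:", max(scores, key=scores.get))
```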
In the 20 simulations attempted, the Sam Altman agent returned just four times – the most of any outcome, but still only 20% of the time, showing just how unlikely his return was. Across different simulations, agents used different techniques to win, including alliance building, direct confrontation and more passive, pure information gathering. In some cases, agents only gathered information and avoided taking any aggressive actions. In one case, Mira Murati became the permanent CEO while allowing other agents to aggressively undermine each other.
Different agents were given different goals appropriate to their role. For example, Dario Amodei, the CEO of Anthropic, balanced a desire to recruit for Anthropic, an opportunity to fundraise, a push for his vision of AI safety, and a decision about whether to aim to become the new CEO of a combined entity.
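A rough way to picture those role-specific goals is as a weighted list per agent. The sketch below assumes a simple priority-weight representation; the goal format SIM-1 actually uses is not public, so the class, field names and weights here are hypothetical.

```python
# Hypothetical representation of role-specific goals; SIM-1's real format is unknown.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    goals: dict = field(default_factory=dict)  # goal description -> priority weight

    def top_goal(self) -> str:
        """Return the goal this agent currently weights most heavily."""
        return max(self.goals, key=self.goals.get)

dario = Agent(
    name="Dario Amodei",
    goals={
        "recruit OpenAI staff to Anthropic": 0.35,
        "raise a new funding round": 0.25,
        "advance his vision of AI safety": 0.25,
        "become CEO of a combined entity": 0.15,
    },
)
print(dario.top_goal())  # -> recruit OpenAI staff to Anthropic
```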
The interesting part of the simulation is that the LLM knows who the different players are, given that they’re all relatively famous people. It can guess how they will behave in a given situation, and what could unfold turn by turn as they try to outwit each other in a boardroom fight.
“It’s like a video game in that turn by turn, they’re making choices across different axes, and then they’re reacting to each other,” Saatchi said. “A choice that someone makes in turn seven can lead others to react in turn eight. There’s an adjudicator agent, who is like a dungeon master. That agent decides who won each round and who’s ahead, and then decides at the end who wins as the most effective agent in the war game.”
Humans have what Fable internally calls “the shadow,” the other side of themselves and their personalities. The characters can feature aggression, paranoia, ambition, deception and more. When you mix together a bunch of different personalities, you can get a variety of outcomes in the simulations.
“We noticed LLM design isn’t based on decision making, which is really important for gaming. It’s based more on personality. And if you want to have a strategy game, nobody really cares about your personality. They care about your decision making. How are you under pressure? What have you done over the last 20 years that would give you a feel for what they might do in the future?” Saatchi said.
Are simulations the future of gaming?
Saatchi thinks that AI agents acting within simulations are the future of gaming.
“We are building on the shoulders of giants with Demis’ work on Republic: The Revolution, Joon Park’s Generative Agents paper and the recent work of Altera in Minecraft,” Saatchi said.
“Our theory is that the future of games and storytelling is simulations. If you wanted to build both The Simpsons game and The Simpsons TV show, you would, in the future, build Springfield, and that would then generate for you episodes of The Simpsons that would generate for you games and places to explore within Springfield as a game.”
He added, “You can tell many different stories within simulations, once you get those simulations properly working. And we’ve got an alpha where people are uploading themselves to San Francisco as characters, telling stories, telling their own story.”
And he said, “You would build Springfield, and then you can guide what might happen in Springfield, or you could just let it generate itself. It’s a pretty big mind shift of how entertainment, games and shows will be made in the future.”
Saatchi noted that AI researcher Noam Brown did a fascinating experiment with the game Diplomacy. He and other researchers “obtained a dataset of 125,261 games of Diplomacy played online at webDiplomacy.net.” Of those, 40,408 games contained dialogue, with a total of 12,901,662 messages exchanged between players. Their aim was to train a human-level AI agent, capable of strategic reasoning, by playing games of Diplomacy.
“We were really inspired by how he did that. He had countries and we were adding into the mix different personalities with particular positions. We liked the idea of a very compressed timeline,” where the whole scenario would play out quickly and over and over again, Saatchi said.
There has been a rich history of work on simulations in both the games industry and beyond. Demis Hassabis, who founded DeepMind (acquired by Google) and who recently won the 2024 Nobel Prize in Chemistry for protein structure prediction, actually began as a video game AI designer. Hassabis worked extensively with Peter Molyneux on several games with simulation elements, such as Theme Park, Black & White and Syndicate.
Hassabis also started his own company to make Republic: The Revolution. It’s a political simulation game in which the player leads a political faction to overthrow the government of a fictional totalitarian country in Eastern Europe, using diplomacy, subterfuge, and violence. According to Hassabis, Republic: The Revolution charts the whole of a revolutionary power struggle from beginning to end.
In the game, your job is to take over a former Soviet republic as a union boss, a politician, a police officer or a journalist, and it has full day-night cycles. It raises the question of how to build a 3D world where agents live and whether proximity to each other plays a role.
For the Sim Francisco OpenAI project, it illustrated the potential for simulating a power struggle played out by AI agents.
Saatchi said the above examples show how game technology often serves as a breeding ground for radical new ideas and a jumping-off point for AI research. For example, one of the leading engineers on DeepMind’s AlphaFold started their career as an AI programmer on The Sims.
Richard Evans’ GDC talk, Modeling Individual Personalities in The Sims 3, remains highly influential. Evans went from programming AI for The Sims to joining DeepMind, mirroring Hassabis’ own journey from games into AI research. The gaming world and the AI world have significant overlap that is a potential area for further academic research, Saatchi said.
One of Saatchi’s options is to let players loose with the simulations, creating their own, and then uploading the stories that are told through the simulations.
Saatchi has done some other experiments with AI-generated South Park episodes and AI characters battling each other in a Westworld setting.
“It felt like six seasons of Game of Thrones in five days, because it was the most powerful position in the most powerful industry in the world,” Saatchi said. “There was also a lot of faith that this person would be guiding us into a new era of superintelligence. You could say it was the most important person in the history of the planet.”
President Trump and the Taiwan invasion
Next, Fable intends to run a Sim Washington, D.C. simulation around a future President Trump’s responses to a Chinese invasion of Taiwan. To further test SIM-1’s decision-making framework, it will simulate a one-week period of buildup and conflict between Taiwan, China and the United States under President Donald Trump.
Fable has interviewed several Pentagon war games organizers to get a feeling for the strengths and weaknesses of the current Taiwan scenario.
Fable is building agents representing Chinese leader Xi Jinping, Cai Qi (first-ranked secretary of the Secretariat of the Chinese Communist Party), Chinese defense minister Dong Jun, Chinese premier Li Qiang, Taiwan’s leader Lai Ching-te, Japan’s leader Shigeru Ishiba, UK prime minister Keir Starmer, French President Emmanuel Macron, Russia’s Vladimir Putin, North Korean leader Kim Jong Un and Elon Musk.
With this set of characters, the simulation would determine whether the war would happen and how each major player would act during such a crisis. All of these characters are known personalities.
“It allows you to see how powerful AI has become at projecting outcomes,” Saatchi said. “It moves us out of this boring world of dumping an LLM into an NPC. You can talk to the tavern keeper for 40 hours. Nobody wants to do that. What we want is highly sophisticated, aggressive agents that we could play against, but also that we can watch and understand what’s going on in that world.”
Many of the war game simulations are aimed at how to avoid a war, perhaps through forming alliances or other maneuvers that drive up the cost of war.
“We think the more realistic we can make our AIs, the more entertaining they will be,” Saatchi said.