Beyond the virtual dollhouse: simulating life in games



Our guest contributor Ran Mo talks about using AI to simulate life in video games. As a former product lead at EA, he worked on a classic in the field: The Sims. Now he wants to push the boundaries.

The simulation of life, friendships, and companionship has been a holy grail in video games. From the simple implementations of Tamagotchi and Pokémon to the complex lives of The Sims, virtual companionship has deeply touched millions of gamers and formed the backbone of some of the most enduring franchises.

At its core, the process of creating digital companions is also a quest to better understand the nature of sentience. And as we shall see, the techniques used will also have wide-ranging applications beyond gaming.

As technology, in particular AI, becomes more powerful, new opportunities open up to reimagine digital life and companionship. This essay is divided into two parts. Part 1 traces some of the most important historical milestones in simulating life digitally. Part 2 explores our efforts at Proxima to further this pursuit. Let’s get started!


The starting point: Scripting “life” in video games

The starting point of modern video game programming is scripting. Scripting is a broad umbrella term that encapsulates many concepts, from very simple programs to complex decision trees and state machines. Yet at its heart, scripting is less about “true intelligence” and more about deterministic responses that follow a set of predefined rules: essentially digital versions of choose-your-own-adventure books.
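
To make the choose-your-own-adventure comparison concrete, here is a toy sketch of scripted dialogue as a decision tree; the NPC lines and options are invented for illustration:

```python
# A toy decision tree for scripted NPC dialogue: every branch is hand-authored,
# so the NPC can only ever say what a designer wrote in advance.
dialogue_tree = {
    "greet": {
        "text": "Well met, traveler. What brings you here?",
        "options": {"I seek work.": "quest", "Just passing through.": "farewell"},
    },
    "quest": {
        "text": "Wolves plague the northern road. Clear them out.",
        "options": {"Consider it done.": "farewell"},
    },
    "farewell": {"text": "Safe travels.", "options": {}},
}

def run_dialogue(node: str = "greet") -> None:
    while True:
        print("NPC:", dialogue_tree[node]["text"])
        options = dialogue_tree[node]["options"]
        if not options:
            break
        # A real game would let the player choose; here we just take the first option.
        reply, node = next(iter(options.items()))
        print("Player:", reply)

run_dialogue()
```

Every node in this tree is a designer decision, which is exactly where the scalability problem below comes from.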

Despite its mechanical nature, scripting can be incredibly powerful in creating immersion. Mass Effect and Dragon Age, two popular franchises from BioWare, use scripting to create deep relationship opportunities with player companions. Depending on their choices, players can unlock backstories, affect game outcomes, and even form romantic relationships with their digital companions. The popularity of the two franchises is a testament to the power of human-created immersive storytelling.

The challenge with scripting is ultimately one of scalability. Designers not only need to design each interaction by hand, but also to account for every possible permutation of player choice. This means content cost scales exponentially with the depth of the player experience. Consider the following: a player chooses from three different options for a particular interaction. Based on that choice, three new options open up, and so on, for a total of 30 choices throughout the game. This decision sequence (assuming no overlapping paths) would require over 200 trillion pre-programmed scenarios! Clearly, a different approach is needed to build immersion at scale.
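
The arithmetic behind that example, with the numbers above (purely illustrative):

```python
# Branching arithmetic for the hypothetical example: 3 options at each of
# 30 decision points, with no overlapping paths.
options_per_choice = 3
decision_points = 30

scenarios = options_per_choice ** decision_points
print(f"{scenarios:,} unique story paths")
# -> 205,891,132,094,649 unique story paths (about 2 x 10**14)
```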

The Sims and utility-based AI

I had the opportunity to work on The Sims franchise at EA, and it was incredible to see the passion the franchise instills. Today, more than 70 million people play The Sims. The fourth installment of the game has grossed over $2 billion, and it’s still growing in popularity.

At the heart of the franchise are the Sims: autonomous digital companions with their own needs, preferences, and desires. Players can control them from time to time, or build out the environments around them. But these agents are also perfectly capable of running their own lives. Under the hood, this autonomy comes from utility-based AI: at any moment, a Sim scores the actions available around it by how well each would satisfy its current needs, such as hunger, energy, or fun, and picks a high-scoring one. In contrast to the pre-planned and scripted stories of Mass Effect, The Sims emphasizes the emergent narratives that form through these autonomous companions. In simpler terms, the Sims are a simulation of life.
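
A toy version of that scoring loop might look like the following sketch; the needs, actions, and weights are invented for illustration and are not The Sims’ actual code:

```python
# Utility-based action selection in the spirit of The Sims: each action
# "advertises" how much it satisfies each need, and the agent weights those
# advertisements by how pressing each need currently is.
needs = {"hunger": 0.8, "energy": 0.3, "fun": 0.5}   # 1.0 = most urgent

actions = {
    "eat_snack": {"hunger": 0.6},
    "take_nap":  {"energy": 0.7},
    "watch_tv":  {"fun": 0.4, "energy": 0.1},
}

def utility(adverts: dict[str, float]) -> float:
    # Weight each advertised satisfaction by the urgency of that need.
    return sum(needs.get(need, 0.0) * amount for need, amount in adverts.items())

best = max(actions, key=lambda a: utility(actions[a]))
print(best)  # -> "eat_snack", since hunger is currently the most urgent need
```

Because the scores change as needs rise and fall, behavior emerges from the simulation rather than from hand-written branches.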


A more recent wave of experiments plugs large language models directly into existing games, as in the LLM-powered Elder Scrolls mod. This approach is appealing because it’s relatively easy to envision and implement: hook up a chatbot to a game avatar, integrate speech recognition and text-to-speech, add a healthy dose of game lore, et voilà, you have a bona fide talking NPC!

But such implementations are fairly shallow and not true simulations of life. The game simply acts as set dressing for the chatbot, and the novelty of such experiences can quickly wear off.

In contrast, a deeper implementation is the Minecraft Voyager project, in which an LLM-powered agent explores the Minecraft world and learns skills without human intervention. The agent proposed its own tasks, built its own knowledge library, and used those learnings to further its discoveries. Without human guidance, Voyager made sense of the Minecraft world, built its own house, and eventually mined diamonds.

Two things stood out to us: the agent’s ability to make sense of its world, and its ability to form long-term memories through experience. What if we could harness those abilities not for an autonomous game agent, but rather to better simulate life and companionship?

Lumari prototype

As a starting point for what we aim to achieve, consider a very small moment with a dog named Nemo.

  • Perception: Nemo sees an unknown, scary-looking person approaching his owner.
  • Input: The owner shouts loudly and waves her arms around.
  • Memory and personality: Nemo remembers that he is very protective of his owner, and that he’s fearless when the owner is under threat.

In an instant, Nemo interprets all of this and makes his decision. He springs into action, jumping between his owner and the interloper, and growls menacingly, ready to attack. In the aftermath, Nemo is appreciated for his bravery and rewarded with a treat, reinforcing his behavior.

But what if Nemo were not fearless but cowardly? Would he choose to bark from a distance instead? What if the interloper were actually a friend the owner was excited to see? Would Nemo be scolded for growling at a friend, and if so, would he remember next time? Such emergent moments highlight the nuances of real-life relationships that can’t be pre-programmed. Yet these moments are also what make companions feel real and authentic. We believe modern technology has advanced to the point where we can begin tackling such nuanced relationships.

Many modern AI models rely on a neural network architecture known as the transformer. Through their attention mechanism, transformers excel at making sense of context and dependencies across large and disparate data sources. In simulating life in games, these data sources could represent memory, perception, user commands, and more. To better understand this, let’s recast Nemo from a real dog into a virtual companion.

  • Perception: We built a system that converts the 3D game world into natural language in real time, so that Nemo can “perceive” the world around him at any given moment.
  • Memory, personality, intention: These are stored and interpreted digitally (as vector embeddings) and continuously evolve through new experiences, just as in real life.
  • User input: We added speech recognition for player voice commands, but these could just as easily be controller inputs or any other form of input.
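
A minimal sketch of how these three streams might be assembled into a single model context; the class and field names here are illustrative, not the prototype’s actual code:

```python
from dataclasses import dataclass

@dataclass
class CompanionContext:
    perception: str        # the 3D scene, already rendered as natural language
    memories: list[str]    # experiences retrieved from Nemo's vector memory
    user_input: str        # transcribed voice command, controller input, etc.
    personality: str       # stable traits, e.g. "protective, fearless"

    def to_prompt(self) -> str:
        # Concatenate the disparate sources into one context window; the
        # transformer's attention mechanism weighs them against one another.
        return (
            f"Personality: {self.personality}\n"
            f"Relevant memories: {'; '.join(self.memories)}\n"
            f"You perceive: {self.perception}\n"
            f"Your owner: {self.user_input}\n"
            "What do you intend to do?"
        )
```

Calling to_prompt() on a populated CompanionContext yields the single context string that the first model layer, described next, consumes.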

We included below a demonstration of the prototype.

To enable the aforementioned scenario, we apply a first large language model layer that translates perception to intention, taking inputs across perception, memory, user commands, and other cues. In the case of Nemo, the output would look something like: “Oh no, my owner is in danger. I need to protect my owner!”
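
Here is a minimal sketch of that first layer, assuming an OpenAI-style chat-completions API; the model name and prompts are placeholders rather than our actual setup:

```python
# Layer 1: perception-to-intention. The context_prompt would be the assembled
# string from the CompanionContext sketch above.
from openai import OpenAI

client = OpenAI()

def perception_to_intention(context_prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "You are Nemo, a loyal dog. State your intention in one sentence."},
            {"role": "user", "content": context_prompt},
        ],
    )
    # e.g. "Oh no, my owner is in danger. I need to protect my owner!"
    return response.choices[0].message.content
```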

But this intention is not yet a game action. To get there, we introduce a second LLM layer that translates intention to action: converting the intention into executable game commands in real time. This second layer is particularly difficult because it needs to understand the range of executable actions in the context of the intention; any incorrect command could crash the game. So we also added a third AI layer that self-corrects failures in logic and game-state changes in real time.
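
A simplified sketch of layers two and three together, with a retry loop standing in for the self-correction system; the action vocabulary and helper names are hypothetical:

```python
from collections.abc import Callable

# A real integration would enumerate whatever commands the engine exposes.
VALID_ACTIONS = {"bark", "growl", "jump_between", "fetch", "sit", "flee"}

def intention_to_action(intention: str,
                        llm_call: Callable[[str], str],
                        max_retries: int = 3) -> str:
    """Layer 2 maps an intention onto an executable command; layer 3's
    self-correction is folded in as a loop that feeds failures back."""
    prompt = (f"Intention: {intention}\n"
              f"Reply with exactly one action from: {sorted(VALID_ACTIONS)}")
    for _ in range(max_retries):
        action = llm_call(prompt).strip().lower()
        if action in VALID_ACTIONS:
            return action  # never pass an unknown command to the engine
        prompt += f"\n'{action}' is not executable. Choose a valid action."
    return "sit"  # safe fallback if the model never converges
```

Constraining the output to a fixed vocabulary is what keeps a free-form language model from issuing commands the game cannot execute.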

Finally, we added a “real-time learning by association” system that commits observations and outcomes to memory, so that each action shapes Nemo’s long-term memory and influences future decisions. We believe this ability to continuously learn will be a central part of future life simulations.
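
A minimal sketch of such an associative memory, assuming an embedding function is available; the structure is illustrative rather than the prototype’s actual implementation:

```python
from collections.abc import Callable
import numpy as np

class AssociativeMemory:
    """Learning by association: observations and their outcomes are embedded
    and stored, so similar future situations recall past lessons."""

    def __init__(self) -> None:
        self.entries: list[tuple[np.ndarray, str]] = []

    def commit(self, observation: str, outcome: str,
               embed: Callable[[str], np.ndarray]) -> None:
        # e.g. commit("growled at an approaching stranger", "owner gave me a treat")
        self.entries.append((embed(observation), f"{observation} -> {outcome}"))

    def recall(self, situation: str, embed: Callable[[str], np.ndarray],
               k: int = 3) -> list[str]:
        # Return the k most similar past experiences by cosine similarity.
        query = embed(situation)
        def score(entry: tuple[np.ndarray, str]) -> float:
            vec, _ = entry
            return float(np.dot(query, vec) /
                         (np.linalg.norm(query) * np.linalg.norm(vec)))
        return [text for _, text in sorted(self.entries, key=score, reverse=True)[:k]]
```

Recalled entries would then flow back into the memory field of the next decision’s context, closing the loop between action and learning.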

One more note: we built Nemo separately from the world. Nemo perceives, interprets, and learns from the world around him in real time, just as we do as players. This is distinct from the traditional approach to NPCs, which are built as part of the world. Nemo’s architecture “frees” him from his environment and lets him traverse new experiences alongside players, opening up opportunities for myriad first-party and player-created adventures in the future.

Implications and the future

The simulation of life and companionship within games has important implications. Commercially, it has led to some of the most enduring and profitable franchises, like The Sims. For players, these companions have the capacity to deepen engagement within games. Beyond gaming, these pursuits also represent a deeper approximation of human relationships and experiences.

To be clear, there are still many challenges and unsolved elements, and many pieces of the puzzle are not yet built. At the same time, the pace of technical innovation has been breathtaking to see: within weeks of the release of Meta’s open-source foundation model, researchers had trained lightweight, application-specific models that perform at the highest levels.

Frontier models and technology are only part of the answer. To create truly emergent and immersive experiences, game makers need to marry innovative technology with deep artistry. At Proxima, we’re excited to push those frontiers in building the next generation of interactive experiences. We’re still early in that journey, and there’s a lot more we aim to build. We believe it’s better to learn together than alone, so if you’re also researching or building in this space, we’d love to hear from you. Please reach out!
