I came to design organically and without training or skills. As a millennial child growing up in the era of the tumultuous teens of the web, my adolescent explosion into the world of creation occurred at a time when the tools themselves were discovering new form and function. They were certainly not teaching application design in high school - at best, there was an “Information Programming & Technology” class, but I’d been programming since primary school.
I could make programs and web sites, but the user logs I was seeing for my creations were baffling. I did not understand why people clicked the things they did, typed the things they did. I tried things, watched what happened, guessed why, but consistently made fundamental attribution errors. I would add features but was often wrong about what the results would be and could not understand why. By the time I was 20, it became obvious to me that I needed to understand how people worked before I could ever hope to understand why they were doing the things that they were doing.
The more I studied psychology and learned about the brain, the more I came to understand that despite the differences in individual people, there are commonalities in how people work. By seeking to understand how people experience the things we make, we give ourselves formal tools to both predict how people will react, and to fix it when our design intent doesn’t match the effect.
Today we’re going to go through the State Machine Model — one such tool for understanding how people experience things and react — and learn how to use it in practice to solve game design problems both pre-emptively and reactively. With any luck, you’ll be ready when you hear someone in Insights mutter the most cursed sentence in game development:
“Why are players DOING that?”
Let’s start with a real world example we can learn from.
Elden Ring
Elden Ring was the first FromSoft game for many players. There is a tutorial that you get to by jumping in a hole early in the game. However, in the initial network test, they found that many players were missing the tutorial and immediately struggling to play because they never learned how to do basic combat. They knew FromSoft games were “difficult”, so they just assumed their struggles with combat were intended, and they were absolutely not. It was intended that new players would jump in the hole, but the hole was very deep and with no sense of how much fall damage they take, many people just didn’t think that was even an option.
This is a video of the first closed network test. From 4:59, you can see the player reads a message “The Cave Of Knowledge lies below”, has no idea what it means, can’t see the hole, and walks straight past the tutorial.
This is the version that was released on launch day with 1.0, and you can see how much this area has changed, and how much more it’s been flagged.
The basic work that was done here:
Added an NPC ghost that said “Brave tarnished. Take the plunge. Of learning, and remembrance. Recall the arts of war. And your warrior's blood.”
Additional lighting to let you see the lip of the hole.
(not visible in this pic) More rocks were added below the drop to imply that the jump could be made without dying.
This helped the problem, but didn’t fix it, and players took to social media to complain about not being taught the basic mechanics. When they found out they’d missed a tutorial, many were annoyed.
Some people read that text and didn’t think ‘tutorial in hole’. Other new people read it and they’d heard that FromSoft NPCs lied to you, and assumed it was a trick. Between vagueness and knowledge obtained outside of the game, they did not understand the choice being offered, and subsequently missed it.
Whenever there is a gap between the intent of a design and the effect of a design, we need to deconstruct the problem in a way that gives us tools to map both of those things so we can find what the gap was. The State Machine Model gives us tools to identify these gaps by modeling players as a kind of state machine.
What Is A State Machine?
A state machine is a representation of a process that is “stateful”. A stateful system has memory - past actions of the system can affect future actions of the system. Games can be considered to be state machines, as can the individual actors within them, with their various AIs, health pools, and so on. This is in contrast to a “stateless” system like a static web site, where every click on a link returns a fixed web page and it does not matter which order I click on them - the same input will always return the same output.
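To make the distinction concrete, here is a minimal Python sketch. The `static_site` function and the `Enemy` class are hypothetical names invented for this illustration, not taken from any real game; the point is only that the stateless function ignores history while the stateful actor does not.

```python
def static_site(link: str) -> str:
    """Stateless: the same input always returns the same output."""
    pages = {"home": "Welcome!", "about": "About us"}
    return pages.get(link, "404")


class Enemy:
    """Stateful: past inputs change how future inputs are handled."""

    def __init__(self, health: int = 2) -> None:
        self.health = health
        self.state = "idle"

    def receive(self, event: str) -> str:
        if self.state == "dead":
            return "nothing happens"
        if event == "hit":
            self.health -= 1
            if self.health <= 0:
                self.state = "dead"
                return "dies"
            self.state = "alert"
            return "turns to face you"
        if event == "noise":
            if self.state == "idle":
                self.state = "alert"
                return "looks around"
            return "charges at you"
        return "ignores it"
```

Sending the same "noise" event to one `Enemy` twice gives two different reactions, because the first one changed its state. Clicking the same link on the static site never does.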
State machines are a common computer science concept, so to model player behavior as a state machine, we need to track their personal states, the inputs they receive (images and sounds), and the knowledge they hold, and we need to understand the processes that happen in order for them to produce output (pressing buttons). Most importantly, the thing people are currently experiencing can affect this process, and also affect future responses to things that they experience.
The entire process a person goes through when receiving the output of a game looks something like this. This is a simplified version:
The first step in understanding this model is one of the first things we learn as children: discovering that people are different from each other.
Theory Of Mind
Before we can get to the model itself, we need to start with Theory of Mind. Theory of mind is, at its core, the fundamental understanding that different people hold different knowledge. Children generally begin developing an understanding of Theory of Mind around age five, and there is a very simple test you can perform that will determine whether a child has gained this understanding.
Acquire two children, somehow. Set them five feet apart, then show the first child this box, and ask them what they think is in it.
They will answer “Smarties, of course”, and probably feel quite proud of themselves.
Then, open the box to reveal a surprise - it is not in fact Smarties. It is full of pencils. Then ask that child what they think the second child, who has not seen inside the box, thinks is inside it.
If they answer “Smarties”, they have passed the test, having gained the understanding that another human who has observed different things can hold different knowledge based on those observations. If they answer “pencils”, they have failed, because they have not yet come to separate their consciousness from that of others.
This test triggers a knowledge change in a child that grants them understanding they previously did not have, and asks them to remember their previous state, then project that onto a second person.
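The test can be sketched as two agents who each hold their own knowledge. This is only a toy illustration: the `Child` class, the `has_theory_of_mind` flag, and `run_smarties_test` are all hypothetical names, and a real child's development is obviously not a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    name: str
    has_theory_of_mind: bool
    beliefs: dict = field(default_factory=dict)

    def observe(self, thing: str, contents: str) -> None:
        # Observing something overwrites what this child believes about it.
        self.beliefs[thing] = contents

    def predict_belief_of(self, other: "Child", thing: str) -> str:
        if self.has_theory_of_mind:
            # Reason from what the OTHER child has observed.
            return other.beliefs[thing]
        # Otherwise assume everyone knows what I know.
        return self.beliefs[thing]


def run_smarties_test(first: Child, second: Child) -> bool:
    # Both children see the outside of the box and assume Smarties.
    for child in (first, second):
        child.observe("box", "Smarties")
    # Only the first child sees inside.
    first.observe("box", "pencils")
    # Passing means predicting the second child's outdated belief.
    return first.predict_belief_of(second, "box") == "Smarties"
```

The important part is that `beliefs` lives on each child separately. The same box produces different knowledge depending on who has observed what.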
But people are more complex than their knowledge, so Theory of Mind alone is insufficient for explaining player behavior.
While we like to think of ourselves as in control of our own minds, the classical conditioning of Pavlov’s Dog applies to people as well as it does animals - if you ring a bell every time before you feed a dog, its mouth will begin watering upon hearing the bell, because the brain has learned to associate the sound of the bell with receiving food.
In a past relationship, I lived in a small apartment with my now-ex, and our computer desks were behind the couch. She’d tap the back of my chair twice to let me know she needed to get past, and I’d pull my chair in. Years later, someone at work tapped the back of my chair twice to get my attention while I was coding, and I instinctively pulled my chair in without even thinking, because I’d unconsciously trained that reaction. I didn’t even realize I’d done it until they mentioned it later.
The most important thing here is that we don’t automatically know why we react the way we do. In the moment our brains and bodies simply react due to our past experiences. We can learn about ourselves, our patterns, our habits, and then apply that information to understand why we felt something, or did something, but it is a reverse-engineering process. And since we don’t inherently know why our brains respond the way they do, it means that we need to learn about ourselves through observing our own output, and the output of others, in order to understand why we react the way we do. And while everyone is different, there are commonalities to human experiences that we can use to help us make better designs.
Much like our Elden Ring new players in the network test, we need to watch how players act, react, and observe their output in order to understand why they reacted the way they did.
But in order to understand why people react, first we need to understand how they react.
Part 1: Stimulus and Perception
The Theory of Mind test has prerequisites - for example, knowing what Smarties are (or at least being able to read), being able to see, and not having done the test before.
When we think about perception, we tend to think of merely seeing something and skipping straight to understanding, but there are two parts to it, which are demonstrated through two problems.
Imagine that we show two adult humans a painting. One art.
Problem 1: Bodies are different. You can show two different people the exact same picture, but differences in their physiology can mean that the image they receive is not actually the same.
Problem 2: Even if their eyes are identical, and the image their brain receives is the same, people’s brains filter the stimulus they receive differently. We will all notice different things about the painting based on our experiences.
Finally, we react to the thing we’ve seen. We might have learned something, or felt something, and our state changes based on the result of that.
In its simplest terms, the brain interprets input while also being changed by it. Two people with different experiences of the world have had their brains trained differently and thus will extract different versions of what is important in the raw input they have just seen, and react differently, and learn different things from their reaction.
An artist may see a painting and may instinctively notice the framing, the construction of the characters, and the use of light to guide the eye. A narrative designer who saw the exact same thing may not notice those implementation details but have a greater understanding of the vibe and what it does to a person.
Part 2: Response and emotional state
What if the child is familiar with the test? They may experience a sense of excitement at a problem they are aware of, and feel a burst of satisfaction, dispensing dopamine and other happy chemicals. A child unfamiliar with the test may experience similar excitement thinking they’re getting Smarties, followed by disappointment when they learn that there are no Smarties waiting for them.
The child, having perceived the input, has an unconscious and immediate reaction that is different depending on their past experiences. We can see again that how we’ve been trained can dictate the emotional response, which in turn affects our state, and how all of our internal mental processes behave.
This can be tested on almost any gamer of a certain age by playing them the scream of the Poison Headcrabs in Ravenholm from Half-Life 2, on any engineer who’s been on-call by playing the PagerDuty alert sound, or on any developer by playing them the tak-tak sound of an incoming Slack message. Before they’ve had a chance to think about what they’ve heard, they’re already experiencing stress or fear, and experiencing it again strengthens the pathway in the brain that triggered that response, making it easier to trigger next time.
Thing is, this response can affect a person’s physical state as well, and our physical state affects how our brains work. In the famous Capilano Bridge Experiment, researchers found that men who encountered an attractive woman in a situation where their heart rates were elevated were more likely to call her afterwards. The response to the situation affected their physicality, which affected their emotional state, which affected their behavior.
We can see on the diagram of the state machine that our Response flavors the current action, and future actions.
Part 3: Accessing and adding to knowledge
You need to know something in order to consciously analyze it. The act of perceiving something may add new knowledge to a person’s total knowledge pool, which then can be used for Analysis.
However, as the Capilano Bridge Experiment shows, the Response step can give rise to emotional stimulation which affects the state of the brain, and again, colors what Knowledge can and will be accessed, and can change a person’s behavior. Stress often causes reduced access to memory, and thus the accessible knowledge may be reduced. In some people it can cause more acute memories. We always arrive at Knowledge, but how we arrive there is colored by our Response and what we already Know.
If I watch a movie I’ve already seen, I may find myself noticing things I didn’t notice the first time, because my Knowledge component isn’t as fixated on absorbing the core of what’s being presented, and may let me focus on different details.
Part 4: Analysis
Now, having received input, translated that into brain input, felt a certain way about it, and had it collide with things we know, finally we arrive at Analysis. This too is not always a conscious process - we are not rational actors nor are we perfect information analyzing machines.
We’ve been trained by the things we’ve experienced in the past, and our state has been flavored by the thing we’re currently experiencing with the context of those past experiences. When making games, we have players who usually have experience with other games that they will bring to ours. We have to be aware of conventions, because if you suddenly make a red traffic light mean “go” instead of “stop”, you’re asking people to unlearn a lifetime of “red light means stop”.
Part 5+: Action
Finally, our stimulus gets converted to action. If you’re playing Half-Life 2 and you see a red dot on the ground near you, you know there’s a rocket launcher being trained on you, and might immediately begin moving the mouse to preemptively return fire or run for cover.
All of this happens automatically, and near-instantly. Depending on our level of immersion in the game (or whatever we’re experiencing), we’re taking action in sub-second intervals. In League of Legends you may be clicking six times per second, but the instant you hear the telltale sound effect of Morgana firing a Dark Binding (crowd control skillshot) at you, your next click is likely perpendicular, moving horizontally away to avoid it. You may even instinctively flash if your experience tells you that the binding is unavoidable, and that being hit by it will result in your death, so you need to use a high-cooldown blink to escape.
Here’s a hypothetical example in which we understand that some CC is coming at us, and we need to dodge it immediately.
Intent
What happens: My brain has output a course of action that I will now attempt to take. This is based on all of the previous steps and modifiers.
Outcome: “I need to flash to dodge this!”
Unconscious Actions
What happens: My body reacts to the input without me consciously planning to do so.
Outcome: Tensing my shoulders in response to the danger, perhaps gasping from surprise.
Instruction To Act
What happens: Conscious actions to convert Intent into Physicality.
Outcome: Attempting to move the mouse perpendicular to the incoming CC, and attempting to move my finger from hovering E to pressing F. This may happen very quickly because I’ve flashed a binding a hundred times, or it may happen very slowly because I’m new, this isn’t muscle memory yet, and I need to think about it.
Action Taken
What happens: The combination of Unconscious Actions and Instruction To Act; what the body actually did. Because I’ve now moved my muscles, this can also affect the body’s physical ability to respond by tiring or training them, so I may respond differently in future intentionally, or strain my wrist and slow future movement. Or my hands might just not do what I told them because I’ve been tensing for 20 minutes because it’s a stressful game, i.e. affected by past Responses.
Outcome: My right hand moved up and to the left. My left pointer finger moved from E to F and pressed the key to Flash out of the way.
Our final physical reaction is a byproduct of all of the states and processes up to and during the current thing that’s happening.
The Final States
If you’re still reading, good for you. This is an insane amount of detail into something most people simply do not ever think about, and if you’re still here you’re either really into understanding people, or really committed to making better video games. Good for you, probably.
We’ve now learned about the processes people go through:
Perception
Response
Analysis
Action
And that there are three states that affect those processes:
Physical
Emotional
Knowledge
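Putting the four processes and three states together, a player can be sketched as a small state machine. This is a deliberately crude toy, not a psychological model: the `Player` class, its numeric `stress` and `fatigue` fields, and every threshold below are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    knowledge: set = field(default_factory=set)  # Knowledge state
    stress: float = 0.0                          # Emotional state, 0..1
    fatigue: float = 0.0                         # Physical state, 0..1

    def perceive(self, stimulus: str) -> str:
        # Perception: for simplicity, the stimulus arrives unfiltered.
        return stimulus

    def respond(self, percept: str) -> None:
        # Response: an immediate reaction; a cue we have been trained
        # on raises stress before any thinking happens.
        if percept in self.knowledge:
            self.stress = min(1.0, self.stress + 0.3)

    def analyze(self, percept: str) -> str:
        # Analysis: combines the percept with Knowledge and current stress.
        if percept not in self.knowledge:
            self.knowledge.add(percept)  # learn it for next time
            return "hesitate"
        return "panic" if self.stress > 0.8 else "dodge"

    def act(self, intent: str) -> str:
        # Action: acting has a physical cost that carries forward.
        self.fatigue = min(1.0, self.fatigue + 0.1)
        return intent

    def step(self, stimulus: str) -> str:
        percept = self.perceive(stimulus)
        self.respond(percept)
        return self.act(self.analyze(percept))
```

Feeding the same stimulus to one `Player` repeatedly produces "hesitate", then "dodge", and eventually "panic" as stress accumulates. The input never changed; the player did.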
Game Design For The Human State Machine
By focusing on which of these we plan to affect and when, we can design games that impact people. As a case study, I like to use a very simple game of mine: An Easy Fix.
An Easy Fix is a simple Twine game about a very bad day that I had trying to fix a time-critical bug, and engineers all over the world have cursed at me for raising their blood pressure.
It has one input: choosing what to do or say in any given passage.
It has two outputs:
What happened based on my last choice
What time it is currently (i.e. how much time is left before everything is ruined)
When a choice makes something happen, new knowledge is added to the player’s Knowledge pool. When a choice causes time to pass, the clock ticks up, closer to the deadline. This creates an Emotional reaction of time pressure, which affects Physicality via stress.
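The shape of that loop is small enough to sketch in a few lines. This is not the game's actual Twine source; `Choice`, `Game`, the 60-minute deadline, and the sample passage text are all hypothetical stand-ins for the one-input, two-output structure described above.

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    text: str      # what the player clicked
    result: str    # what happened (added to the player's Knowledge)
    minutes: int   # how much time the choice cost


@dataclass
class Game:
    clock: int = 0       # minutes elapsed so far
    deadline: int = 60   # hypothetical: minutes until everything is ruined
    revealed: list = field(default_factory=list)

    @property
    def minutes_left(self) -> int:
        return self.deadline - self.clock

    def choose(self, choice: Choice) -> tuple:
        # One input (the choice), two outputs (what happened, the time).
        self.revealed.append(choice.result)
        self.clock += choice.minutes
        return choice.result, self.clock
```

Without the `clock` field, `choose` only ever grows the Knowledge pool. The second output is what turns each new piece of knowledge into pressure.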
On my first pass making the game, it was frustrating (intentional) but kind of boring (unintentional). It didn’t reproduce the actual tension of that day. I realized that this wasn’t the fault of the writing; it was simply missing the component that actually made the day tense: a countdown to failure against a task of unknowable complexity.
When I added the timer, everything changed, and suddenly it was doing exactly what I’d intended. I had only been adding to the Knowledge pool, but I needed to be incrementally adjusting the Knowledge pool to affect the Emotional state. When retroing this, I realized I could have added the Slack message sound to stress people out further, which was my intended experience. It was a valuable lesson.
Training, Onboarding, and the First Time User Experience
The first time that you play Dance Dance Revolution, you are probably standing in the center of the mat, like the tutorial showed you.
When you see an arrow coming up the screen, you might notice that, realize you have to stomp on an arrow, look down to see where the pad is in relation to your foot, then move your foot to it, then move it back to the middle.
The tenth time, you might have realized it’s not tenable to make three movements per arrow, and instead start with your feet on the pads. You might be used to alternating which foot moves so it’s more like dancing, left right left right, rather than simply moving the same foot over as new players do.
The hundredth time you’ve played it, you might recognize entire patterns of moves which you know to be Ladders or Crossovers. You might be able to glance at this screenshot and know that Player 2 (right) is hitting the down arrow with their left foot, because their right needs to hit the next Up arrow in order to set up the walk for the rest of the pattern on-screen, otherwise they’d need to either hit two successive notes with their left foot and break their rhythm, or spin around backwards to continue alternating, which looks very cool but is harder.
Even the default ordering of songs in DDR is intentional - each subsequent song at each difficulty adds new Knowledge about these patterns and what they mean for us. There’s also an obvious Physical component - a new player who hasn’t had these muscles trained isn’t going to be able to jump into higher difficulty songs because they might just not be fit enough to keep up with the stomping requirements. On an individual song design, you need to think about which sections are uptime (high intensity), which are downtime (space to recover), how we’ll manage our player fitness, and when and where to test it.
An easy-to-read but high-step-count set of Ladders might push our player Physicality while groups of skipping with high variance in rhythm might test their Perception in mapping those to actions.
Helldivers 2
You might then load up Helldivers 2 and despite a lifetime of Dance Dance Revolution, struggle with reading lists of arrows that are horizontal instead of vertical. You might also make that conversion quickly, and find yourself calling down Stratagems faster than anyone else on your second session.
You can see pretty quickly that when we’re dealing with games, we’re almost always dealing with precedent, because most people who buy a game are not buying their first game, so they come with the knowledge they learned playing other games. You have to design for players with different sets of pre-existing knowledge.
An Imaginary ARPG I Just Made Up: SwordJammers
Say you’re working on the initial design for an upcoming ARPG called “SwordJammers”. You want to use combos in a particularly new and innovative way, and you’re trying to figure out if it’s worth it. A player new to the genre may pick up every version of how combos work at roughly the same speed because they have to learn something new anyway, but an existing player may have transferable knowledge that speeds up the process.
That same existing player may experience friction with their existing knowledge if they’re used to combos working differently, as now they have to adjust the muscle memory they’ve already built up in other games. This friction may be increased if SwordJammers is the third in a series, and you’ve spent two whole games training fans to play a certain way then pulled the rug out from under them. Figuring out what knowledge exists in the player and what their expectations are is part of the challenge of teaching people your game.
This becomes increasingly complex when you realize that sometimes, training people to know how to do what they need to before they need to do it may not actually be the goal.
Say you have particularly complex combos that need to be performed quickly in certain situations. If those skills and responses haven’t been trained and ingrained in the player, they may not be able to perform those complex moves, and you might find out that in the moments they really NEED powerful combos, most players are simply not using them because they don’t have the muscle memory built up. You may only discover this through telemetry when you realize they’re not leveling up those abilities because they just don’t use them.
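As a rough sketch of what that telemetry check might look like, the snippet below computes what fraction of players have ever used each ability. The event format, the ability names, and the 50% threshold are all assumptions invented for this example; a real pipeline would depend on whatever your game actually logs.

```python
from collections import defaultdict


def adoption_rates(events, abilities, total_players):
    """Fraction of players who used each ability at least once.

    events: iterable of (player_id, ability_name) pairs (hypothetical format).
    """
    users = defaultdict(set)
    for player_id, ability in events:
        users[ability].add(player_id)
    return {a: len(users[a]) / total_players for a in abilities}


def underused(rates, threshold=0.5):
    """Abilities that fewer than `threshold` of players have ever used."""
    return sorted(a for a, rate in rates.items() if rate < threshold)


events = [
    ("p1", "basic_attack"), ("p2", "basic_attack"),
    ("p3", "basic_attack"), ("p4", "basic_attack"),
    ("p1", "jam_combo"), ("p1", "jam_combo"),
]
rates = adoption_rates(
    events, ["basic_attack", "jam_combo", "riff_finisher"], total_players=4
)
```

A number like this tells you that players are not using a combo, not why. The why still has to come from mapping it back to Knowledge, Physicality, or whichever state the design failed to prepare.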
So what do you do? You may decide that in the area before these become mandatory, you will put in a handful of enemies that encourage and repeat the action of inputting complex combos in low-pressure situations such that the brain and hands are ready to go when the moment calls for it. However, you might also decide the opposite - that it’s the concept of combos that needs to be taught, and that the challenge is supposed to be looking up, planning out, and mastering the right ones.
Will I add Knowledge? Will I try to increase recognition during Perception? Will I train their Physicality to have muscle memory? Do I need to give them a rest so they’re relaxed Physically and not tested Analytically right before the big fight? Will I try to increase Analysis? Which of these systems are being tested, how much, and why?
There is no Right Answer that universally applies, because every answer produces a different experience. Deciding what experiences to create and figuring out how to create them IS the job.
Tracking The Human State Machine
Wherever you go, the sky is the sky, and people are people. All people are different, which means that creating a universal experience is impossible. This is part of the reason that I say writing well is more difficult than programming well, because programming has a predictable deterministic compiler, but writing must be passed through the ad-hoc conceptual maps of reality created for themselves by an infinite number of different people with different versions of the core parts of life training their brains while equally infinite numbers of good and bad things randomly occur.
If you are making a game for everyone, you are making a game for no-one. Your game has target demographics and it has people it’s designed to appeal to. While serving each of these cohorts you will make assumptions about their Knowledge Pool, their Emotional state, and their Physicality. If you are releasing a new MOBA, you can assume that some portion of players will be bringing their Knowledge and Physicality over from League of Legends and Dota 2, but you’ll also have new players unfamiliar with the genre. But for all of them, you can determine that by 40 minutes into the game when people are getting one-shot by assassins, they will be emotionally tense. You must onboard people from whatever states you decide are worth your effort to invest in.
Tracking this in an absolute, objective manner is impossible. You’ll never have the exact same response from any two people, but by speaking in generalities about the kinds of people, the demographics, the cohorts, you can begin to figure out how to create what you want to create. Games are a special case for this, because unlike a film or a song where you can at least be assured that everyone is experiencing roughly the same stimulus, player actions change what they see and hear. For most games, it’s impossible to guarantee that any two players will even experience the same stimulus.
Everything we get our players to experience affects their knowledge pool, their emotional state, and often their physical state. We are training their brains to respond in the ways we want, to deliver the experience we want to create. The challenge, then, is to design our games and systems such that we induce the results we want in the people that we want them from.
And that, dear reader, is on you to solve.
// for those we have lost
// for those we can yet save