Building Believable NPCs: An Architecture for Emotion, Intent, and Reaction

Matthias Gall

May 22, 2026

Imagine standing in VR, face to face with a character. She's not a companion following quest markers or a vendor cycling through dialogue options. She makes eye contact. Her gaze tracks yours naturally. When you step closer, her expression shifts subtly, acknowledging the proximity. When you speak, she doesn't just wait for her turn to reply. She reacts.

That character is Olivia Rhodes from Lone Echo (2017) and Lone Echo 2 (2021), created by Ready at Dawn back when the studio, which closed in 2024, still had a dedicated AAA team and the budget to match.

Video Preview — Short by @MyPerseverance

We're building foundational NPC systems for reuse across immersive XR products, so we studied what studios like Ready at Dawn achieved and asked:

What's the minimum architecture needed to create that sense of presence within our own means?

Our answer is several independent subsystems feeding into a shared priority pipeline, so no single controller has to own the whole character.

The Challenge

Most NPC systems in games are built around one thing: a behavior tree that decides what to do next. The NPC patrols. It fights. It takes cover. Maybe it says a line of dialogue when you walk past.

What makes Olivia Rhodes feel alive is what the opening scene illustrates: macro decisions matter less than micro-behaviors such as gaze, expression, and the sense that she reacts.

In a flat-screen game, you can hide a lot. Stiff animations, dead eyes, robotic gaze: camera distance forgives all of it. In VR, you're standing next to these characters. Every missing micro-behavior isn't just noticeable. It's uncanny.

So the question became: how do you build an NPC that doesn't just do things, but feels like it's experiencing them? Proximity raises the bar, and that's the constraint the architecture below is built around.

The Architecture

Three layers:

Layer 1: The Behavior Tree (Dispatcher)

A shared NPC state (idle, combat, follow, dialogue, and so on) drives a behavior tree that delegates to subtrees. Idle triggers affordance scanning or wandering. Combat triggers cover-seeking and shooting. Follow keeps the NPC near a target with distance thresholds that resist flickering when the player moves.

Here's the important bit: the behavior tree decides what to do. It does not decide how to look while doing it. It doesn't control the face. It doesn't control the eyes. It doesn't control the idle glances. Those settle later, once behavior and world context have fed their inputs in.

Layer 2: Smart Objects (Affordances)

Between strategy and expression sits the world itself. Scene-placed SmartObject components carry tags ("PointOfInterest", "Cover", "Heal", "Comfort"), priority values, and action sequences (Enter, Run, Exit). The behavior tree's current state determines which tags get searched. An idle NPC looks for "Comfort" objects. A combat NPC looks for "Cover."

The scoring formula is simple: priority minus distance. One priority point equals roughly one meter of walking effort. A high-priority object that's far away competes with a low-priority object that's nearby. The NPC makes a trade-off, implicitly, every time it scans.
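As a rough illustration of that trade-off, here is a minimal Python sketch (the framework itself targets Unity, so this is not its actual code). The `SmartObject` fields, the 2D positions, and the `best_affordance` helper are assumptions made for the example, and straight-line distance stands in for real path length.

```python
import math
from dataclasses import dataclass


@dataclass
class SmartObject:
    name: str
    tag: str            # e.g. "Comfort", "Cover"
    priority: float     # one point is worth roughly one meter of walking
    position: tuple     # (x, z) on the ground plane; assumed for this sketch


def score(obj, npc_pos):
    """Priority minus distance: far objects need more priority to win."""
    return obj.priority - math.dist(obj.position, npc_pos)


def best_affordance(objects, tag, npc_pos):
    """Return the best-scoring object carrying the tag, or None."""
    candidates = [o for o in objects if o.tag == tag]
    return max(candidates, key=lambda o: score(o, npc_pos), default=None)


objects = [
    SmartObject("sofa", "Comfort", priority=8.0, position=(10.0, 0.0)),
    SmartObject("stool", "Comfort", priority=3.0, position=(2.0, 0.0)),
    SmartObject("crate", "Cover", priority=5.0, position=(1.0, 0.0)),
]

# From the origin the sofa scores 8 - 10 = -2, the stool 3 - 2 = 1.
near_choice = best_affordance(objects, "Comfort", npc_pos=(0.0, 0.0))
# From x = 8 the sofa scores 8 - 2 = 6, the stool 3 - 6 = -3.
far_choice = best_affordance(objects, "Comfort", npc_pos=(8.0, 0.0))
```

The same NPC picks the modest stool when it is close and the better sofa once the walk is short enough, without any explicit rule saying so.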

The interaction flow is async: navigate to the object, run Enter actions, run Run actions (interruptible if something more important comes up), then run Exit actions. A navigation timeout prevents NPCs from getting permanently stuck trying to reach something they can't path to.
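One way to picture that flow is the following Python `asyncio` sketch. It is only an approximation under stated assumptions: the `interact` function, its parameters, and the use of an `asyncio.Event` as the interruption signal are invented for illustration, and a Unity implementation would use its own async primitives.

```python
import asyncio


async def interact(navigate, enter, run, exit_, interrupt, nav_timeout=5.0):
    """Navigate, then Enter, Run (interruptible), Exit. Returns the phases hit."""
    try:
        # Give up if the object cannot be reached in time.
        await asyncio.wait_for(navigate(), timeout=nav_timeout)
    except asyncio.TimeoutError:
        return ["nav_timeout"]
    phases = ["arrived"]

    await enter()
    phases.append("enter")

    # Race the Run actions against the interrupt signal.
    run_task = asyncio.ensure_future(run())
    stop_task = asyncio.ensure_future(interrupt.wait())
    done, pending = await asyncio.wait(
        {run_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    phases.append("run" if run_task in done else "run_interrupted")

    # Exit actions run even after an interruption, so the NPC stands up cleanly.
    await exit_()
    phases.append("exit")
    return phases


def takes(seconds):
    """Stand-in for an action that simply takes some time."""
    return lambda: asyncio.sleep(seconds)


async def demo():
    interrupt = asyncio.Event()
    normal = await interact(takes(0.01), takes(0), takes(0.01), takes(0), interrupt)
    stuck = await interact(takes(1.0), takes(0), takes(0), takes(0), interrupt,
                           nav_timeout=0.05)
    interrupt.set()  # something more important came up
    cut_short = await interact(takes(0.01), takes(0), takes(1.0), takes(0), interrupt)
    return normal, stuck, cut_short


normal, stuck, cut_short = asyncio.run(demo())
```

Running Exit after an interrupted Run is a design choice of this sketch; the article only states that Run is interruptible.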

This is ecological psychologist James J. Gibson's affordance theory applied to game development. Objects don't have behavior. They have affordances. What the NPC does with them depends on what it needs right now.

Choosing a chair to sit in, a cover point to duck behind, or a point of interest to glance at all register as inputs downstream. The intent pipeline doesn't ignore them, but it doesn't privilege them either: they compete with every other provider for the same channels.

Layer 3: The Intent Pipeline (Arbiter)

The intent pipeline is where everything converges. Behavior, affordances, emotion, dialogue, damage: each subsystem writes intents to the channels it needs, and priority resolves the result into what the player actually sees. AnimationIntents is a struct-based bus with channels like UpperBody, FullBody, Face, Look, and Eyes. The look, emotion, and blink controllers each own a channel; others fill the rest.

Each channel resolves conflicts by priority. If two providers want the upper body in the same frame, the higher-priority one wins. The bus handles the merge.
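A minimal sketch of such a bus, in Python for brevity, might look like this. The channel names come from the article; the `Intent` fields, the `write` and `resolve` methods, and the rule that a tie goes to the later writer are assumptions of this example.

```python
from dataclasses import dataclass, field

CHANNELS = ("UpperBody", "FullBody", "Face", "Look", "Eyes")


@dataclass(frozen=True)
class Intent:
    provider: str   # who asked, useful for debugging
    priority: int
    value: str      # what to play or look at; a plain string in this sketch


@dataclass
class AnimationIntents:
    slots: dict = field(default_factory=dict)

    def write(self, channel, intent):
        """Keep the intent only if it beats what is already in the slot."""
        if channel not in CHANNELS:
            raise KeyError(channel)
        current = self.slots.get(channel)
        # Assumption: on equal priority, the later writer wins.
        if current is None or intent.priority >= current.priority:
            self.slots[channel] = intent

    def resolve(self):
        """Return the winning value per channel and clear for the next frame."""
        resolved = {ch: intent.value for ch, intent in self.slots.items()}
        self.slots.clear()
        return resolved


bus = AnimationIntents()
bus.write("UpperBody", Intent("idle", 10, "fidget"))
bus.write("UpperBody", Intent("damage", 90, "clutch_wound"))
bus.write("UpperBody", Intent("dialogue", 50, "gesture"))
bus.write("Face", Intent("emotion", 20, "worried"))
frame = bus.resolve()
```

Providers never reference each other here. Each one writes what it wants, and the order of the calls only matters when priorities tie.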

When dialogue starts, the player registers as a high-priority look target: gaze locks on, idle scanning stops. When the NPC walks, the path ahead becomes the look target until arrival, then attention returns to the room. Unregistering a target is enough to release control, with no hand-off code between systems.

Deep Dive: Emotion

The emotion system borrows heavily from Russell's Circumplex Model of Affect, which places emotional states on two axes: arousal (calm to activated) and valence (unpleasant to pleasant). It's a useful simplification, not a literal implementation. Nervousness (0 to 1) stands in for arousal and Happiness (-1 to 1) for valence; these are our game-friendly versions of those axes.

They drive facial blendshapes or animator controller parameters through a data-driven EmotionProfile. The mapping is completely designer-tunable: positive happiness might drive a smile, high nervousness might drive raised brows, and so on. Different NPC types can have entirely different profiles. Same code, different data.
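To make "same code, different data" concrete, here is a small Python sketch of what a data-driven profile could look like. The `Mapping` structure, the blendshape names, and the linear remap are all hypothetical; the article does not describe the real EmotionProfile layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    axis: str         # "happiness" or "nervousness"
    blendshape: str   # hypothetical blendshape or animator parameter name
    in_min: float     # axis value that maps to weight 0
    in_max: float     # axis value that maps to weight 1

    def weight(self, value):
        """Linear remap of the axis value into a clamped 0..1 weight."""
        t = (value - self.in_min) / (self.in_max - self.in_min)
        return max(0.0, min(1.0, t))


def evaluate(profile, happiness, nervousness):
    """Turn the two mood axes into blendshape weights using profile data."""
    mood = {"happiness": happiness, "nervousness": nervousness}
    return {m.blendshape: m.weight(mood[m.axis]) for m in profile}


# Designer data: a different NPC type would ship a different list.
HUMAN_PROFILE = [
    Mapping("happiness", "smile", 0.0, 1.0),
    Mapping("happiness", "frown", 0.0, -1.0),      # negative happiness
    Mapping("nervousness", "brows_up", 0.5, 1.0),  # only when quite nervous
]

weights = evaluate(HUMAN_PROFILE, happiness=0.5, nervousness=0.75)
```

Swapping `HUMAN_PROFILE` for another list changes how a character emotes without touching `evaluate`.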

The system runs in two layers:

Mood is continuous. The two axes drift over time. When something happens (damage, dialogue events, scripted triggers), the values shift. Then an optional recovery mechanism gradually drifts them back toward configured baselines. An NPC that was shot stays nervous for a while. It doesn't snap back to calm the moment the combat state ends. The recovery is slow and feels natural.
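The drift back to baseline can be sketched as an exponential decay, shown here in Python. The half-life formulation, the default values, and the method names are assumptions for illustration; the article only says that recovery is gradual and optional.

```python
from dataclasses import dataclass


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


@dataclass
class Mood:
    nervousness: float = 0.1       # 0..1
    happiness: float = 0.2         # -1..1
    base_nervousness: float = 0.1  # configured baselines
    base_happiness: float = 0.2
    half_life: float = 3.0         # seconds to lose half the offset (assumed)

    def apply_event(self, d_nervousness, d_happiness):
        """Shift the axes in response to damage, dialogue, or a trigger."""
        self.nervousness = clamp(self.nervousness + d_nervousness, 0.0, 1.0)
        self.happiness = clamp(self.happiness + d_happiness, -1.0, 1.0)

    def recover(self, dt):
        """Drift toward the baselines; the result does not depend on frame rate."""
        keep = 0.5 ** (dt / self.half_life)
        self.nervousness = (self.base_nervousness
                            + (self.nervousness - self.base_nervousness) * keep)
        self.happiness = (self.base_happiness
                          + (self.happiness - self.base_happiness) * keep)


mood = Mood()
mood.apply_event(d_nervousness=0.8, d_happiness=-0.6)  # the NPC got shot
shaken = mood.nervousness
mood.recover(dt=3.0)                                   # one half-life later
calmer = mood.nervousness
```

With a three-second half-life, about six percent of the spike is left after twelve seconds, so the character is still visibly on edge well after combat ends.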

Transient Emotes are one-shot facial expressions triggered by name: "Surprise", "Pain", "Flinch". Each has an attack-sustain-decay envelope (a simplified cousin of the ADSR envelope in audio). Higher-priority emotes override current ones. A pain emote will interrupt a dialogue expression. When the pain emote decays, the underlying expression returns.
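A compact Python sketch of the envelope and the override rule follows. The linear ramps, the timing values, and the `EmotePlayer` class are assumptions of this example rather than the framework's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emote:
    name: str
    priority: int
    attack: float    # seconds to ramp from 0 to 1
    sustain: float   # seconds held at 1
    decay: float     # seconds to ramp back to 0

    @property
    def duration(self):
        return self.attack + self.sustain + self.decay

    def weight(self, t):
        """Envelope value at time t since the emote was triggered."""
        if t < 0.0 or t >= self.duration:
            return 0.0
        if t < self.attack:
            return t / self.attack
        if t < self.attack + self.sustain:
            return 1.0
        return 1.0 - (t - self.attack - self.sustain) / self.decay


class EmotePlayer:
    def __init__(self):
        self.current = None
        self.elapsed = 0.0

    def trigger(self, emote):
        """Start the emote unless an equal or higher-priority one is playing."""
        if self.current is None or emote.priority > self.current.priority:
            self.current, self.elapsed = emote, 0.0
            return True
        return False

    def tick(self, dt):
        """Advance time; returns (name, weight), or (None, 0.0) when idle."""
        if self.current is None:
            return None, 0.0
        self.elapsed += dt
        if self.elapsed >= self.current.duration:
            self.current = None
            return None, 0.0
        return self.current.name, self.current.weight(self.elapsed)


PAIN = Emote("Pain", priority=90, attack=0.1, sustain=0.4, decay=0.5)
TALK = Emote("Talk", priority=10, attack=0.2, sustain=1.0, decay=0.3)

player = EmotePlayer()
started_talk = player.trigger(TALK)
started_pain = player.trigger(PAIN)   # interrupts the dialogue expression
blocked_talk = player.trigger(TALK)   # lower priority, ignored while Pain plays
```

The returned weight is meant to blend the emote over the mood-driven face, so a weight of zero leaves the underlying expression fully visible again.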

Here's where it gets interesting for procedural tension. Blink rate is derived directly from arousal. A calm NPC blinks every few seconds. A terrified NPC blinks rapidly. Idle look frequency scales the same way: the nervous character scans the room constantly, checking corners and doorways. The calm one glances around occasionally, relaxed.
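The arousal-to-timing link could be as simple as a linear interpolation with some jitter, sketched below in Python. The interval values of 4.0 and 0.8 seconds, the jitter amount, and the function name are placeholders chosen for the example, not figures from the framework.

```python
import random


def next_interval(nervousness, calm=4.0, terrified=0.8, jitter=0.25, rng=random):
    """Seconds until the next blink (or idle glance) for a given arousal level.

    calm and terrified are the mean intervals at nervousness 0 and 1;
    jitter randomizes each interval so the rhythm never looks mechanical.
    """
    n = max(0.0, min(1.0, nervousness))
    mean = calm + (terrified - calm) * n
    return mean * rng.uniform(1.0 - jitter, 1.0 + jitter)


# Blinks: a calm NPC waits around four seconds, a terrified one under a second.
calm_blink = next_interval(0.0, jitter=0.0)
scared_blink = next_interval(1.0, jitter=0.0)

# Idle glances reuse the same curve with slower, assumed endpoints.
calm_glance = next_interval(0.0, calm=8.0, terrified=1.5, jitter=0.0)
```

Because both blinks and glances read the same Nervousness value, a single mood change shifts the whole rhythm of the character at once.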

Deep Dive: Look System

The look controller implements a multi-source priority cascade. Three tiers:

Override (highest): scripted look targets from cinematics or sequences
Registry targets: any registered object, which includes dialogue partners, navigation waypoints, and scene objects marked as interesting
Idle behavior: random look-around, frequency driven by the Nervousness axis described above
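The three tiers reduce to a short lookup, sketched here in Python. The class and method names are hypothetical, and targets are plain strings to keep the example small.

```python
class LookController:
    def __init__(self):
        self.override = None   # tier 1: scripted target, or None
        self.registry = {}     # tier 2: target -> priority

    def register(self, target, priority):
        self.registry[target] = priority

    def unregister(self, target):
        """Removing a target is all it takes to release control."""
        self.registry.pop(target, None)

    def current_target(self, idle_pick):
        """Override first, then the best registry entry, then idle behavior."""
        if self.override is not None:
            return self.override
        if self.registry:
            return max(self.registry, key=self.registry.get)
        return idle_pick()   # tier 3: random look-around


def idle():
    return "window"   # stand-in for a random glance target


look = LookController()
seen = [look.current_target(idle)]          # nothing registered yet

look.register("path_ahead", priority=30)    # NPC is walking
look.register("player_head", priority=80)   # dialogue starts
seen.append(look.current_target(idle))

look.unregister("player_head")              # dialogue ends
seen.append(look.current_target(idle))

look.override = "cinematic_mark"            # a sequence takes over
seen.append(look.current_target(idle))
```

Note that nothing in the dialogue code has to hand gaze back to navigation; unregistering the player's head lets the next-best target win on its own.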

A few implementation details that matter more than you'd expect:

View cone filtering. While navigating, the NPC only considers look targets within ±90° of its forward direction. It doesn't snap its head backward to look at something behind it. It has to turn first.
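A view cone test of this kind usually comes down to one dot product. The Python sketch below works on the ground plane with (x, z) pairs; the function name and the handling of degenerate inputs are assumptions of the example.

```python
import math


def in_view_cone(npc_pos, forward, target_pos, half_angle_deg=90.0):
    """True if the target lies within half_angle_deg of the forward direction."""
    to_x = target_pos[0] - npc_pos[0]
    to_z = target_pos[1] - npc_pos[1]
    dist = math.hypot(to_x, to_z)
    fwd_len = math.hypot(forward[0], forward[1])
    if dist == 0.0 or fwd_len == 0.0:
        return True   # assumption: a degenerate case is treated as visible
    cos_angle = (to_x * forward[0] + to_z * forward[1]) / (dist * fwd_len)
    return cos_angle >= math.cos(math.radians(half_angle_deg))


npc, facing = (0.0, 0.0), (0.0, 1.0)   # standing at the origin, facing +z
targets = {
    "door_ahead": (0.0, 5.0),
    "poster_front_right": (1.0, 1.0),    # 45 degrees off forward
    "noise_behind_right": (1.0, -1.0),   # 135 degrees off forward
    "crate_behind": (0.0, -5.0),
}
visible = sorted(name for name, pos in targets.items()
                 if in_view_cone(npc, facing, pos))
```

Targets that fail the test stay in the registry; they simply are not candidates until the body has turned far enough.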

Post-navigation cooldown. After arriving at a destination, there's a brief window where the look system doesn't switch targets. This prevents the jarring "arrive, immediately lock onto the nearest interesting thing" behavior that makes NPCs feel robotic.

Hysteresis in body orientation. The body orientation controller turns the character's torso toward high-priority look targets. But it uses two thresholds: engage at a wider angle, release at a narrower one. If the target is just barely to the side, the body doesn't twitch back and forth. It commits to the turn, then holds it until the target moves significantly away.
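The two-threshold idea is easy to show in a few lines of Python. The threshold values and the `BodyOrientation` class are illustrative assumptions; here `turning` means the torso is actively rotating toward the target.

```python
class BodyOrientation:
    def __init__(self, engage_deg=60.0, release_deg=15.0):
        self.engage_deg = engage_deg     # start turning beyond this offset
        self.release_deg = release_deg   # stop once the offset is this small
        self.turning = False

    def update(self, angle_to_target_deg):
        """Feed the current offset to the look target; returns the turning state."""
        angle = abs(angle_to_target_deg)
        if not self.turning and angle > self.engage_deg:
            self.turning = True
        elif self.turning and angle < self.release_deg:
            self.turning = False
        return self.turning


body = BodyOrientation()
offsets = [50, 65, 40, 20, 10, 30, 55, 61]
states = [body.update(a) for a in offsets]
```

The target crosses 50 to 55 degrees twice without any reaction, while a single reading above 60 commits the body to a turn that continues all the way down to 15. With one shared threshold, the same input would flip the state on every small fluctuation around it.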

Dialogue-aware gaze isn't special-cased. In VR, the player's head position is registered as a look target. That's it. The same priority cascade that handles combat targets and idle wandering also handles eye contact during dialogue. The VR player's head is just another entry in the registry.

Deep Dive: Damage Cascade

This is the hardest case to appreciate from text alone. You want to be in VR, arm's length from the NPC when it gets hit. Most of our B2B work never needs combat; the cascade is still the stress test for composition.

One damage event is broadcast to every listener. This is the same decoupling idea as the intent pipeline, but for events rather than per-frame channels. Seven providers react independently:

Mood shift. Happiness drops. Nervousness spikes. The facial expression changes, and the blink rate increases. The character now looks shaken, and will stay that way until mood recovery gradually pulls it back.
Pain emote. A transient facial expression fires at high priority, overriding whatever the face was doing before. The attack envelope is fast (immediate grimace), the sustain holds briefly, then the decay lets the underlying mood expression resurface.
Pain sound. A pain groan plays through the NPC's mouth audio source, the same one used for speech. The sound varies with damage intensity: a glancing hit gets a different vocalization than a heavy blow.
Ragdoll. A physics impulse is applied, derived from the impact direction and magnitude in the damage data.
Blood VFX. Particles spawn at the wound, plus a bone-parented decal that grows and fades with the body.
IK damage reaction. Hands reach toward the hit; a data-driven map ties wound position to IK targets, with elbow hints so the arm doesn't fold unnaturally.
Impact look target. A temporary look target is registered at the wound location. The NPC's gaze shifts to where it was hit. After a few seconds, the target is unregistered, and gaze returns to whatever was important before.
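The broadcast itself can be very small. This Python sketch wires up three of the seven providers as plain callbacks; the `DamageEvent` fields, the `DamageBroadcaster` class, and the numeric mood response are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DamageEvent:
    amount: float
    hit_point: tuple    # where the hit landed
    direction: tuple    # impact direction, used by physics-style listeners


class DamageBroadcaster:
    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def broadcast(self, event):
        """Every listener sees the same event; none of them know each other."""
        for listener in self._listeners:
            listener(event)


# Shared NPC state that the providers below write into.
state = {"nervousness": 0.1, "emote": None, "look_targets": {}}


def mood_shift(event):
    state["nervousness"] = min(1.0, state["nervousness"] + 0.1 * event.amount)


def pain_emote(event):
    state["emote"] = "Pain"


def impact_look_target(event):
    state["look_targets"]["wound"] = event.hit_point


damage = DamageBroadcaster()
for provider in (mood_shift, pain_emote, impact_look_target):
    damage.subscribe(provider)

damage.broadcast(DamageEvent(amount=5.0, hit_point=(0.2, 1.4, 0.1),
                             direction=(0.0, 0.0, -1.0)))
```

Adding the sound, ragdoll, blood, and IK reactions would mean four more `subscribe` calls and no changes to the existing listeners.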

The following video shows how these all play together.

Seven providers, one event. The NPC winces not as a single "hit reaction" clip, but as face, eyes, body, hands, and attention composing genuine pain. Mood recovery (already described above) does the rest over the next ten to fifteen seconds.

That's what "alive" feels like.

Where It Shines, Where It Stretches

New subsystems slot in cleanly: implement an interface, set a priority, and the bus handles the rest. Emotion profiles, hit-to-effector mappings, and SmartObject components are all data-driven, so designers can tune behavior in the editor without touching code.

The architecture assumes the assets underneath are good. Blendshapes, locomotion clips, and rig setup have to be done properly in DCC tools first. The framework can arbitrate well, but it can't rescue a face that wasn't sculpted for expression, clips that weren't authored for layering, or EmotionProfile tuning that only works on one mesh.

The remaining gaps are tuning, not design. Affordance scoring doesn't yet account for line-of-sight or social context. Occasionally the visual behavior tree in Game Creator and the code-driven subsystems need careful coordination to stay in sync.

Composition as a Feature

What the framework deliberately doesn't provide is meaning: no world model, no memory, no reasoning about the future. The NPCs don't need to understand why they're doing something; they need to look like they do. Body language, emotional texture, and reactivity live in the framework; purpose and story live in the application. That separation is what makes it reusable across the XR work we build.

Work with Us

Believable NPCs are not only for games. Trade show guides, training simulations, and product demos all gain something when a character holds eye contact, reacts to proximity, and recovers from events instead of snapping back to idle.

If you're planning an XR experience where presence matters as much as functionality, feel free to reach out. We're always happy to explore how this fits your project or build a prototype together.

Assets used

The NPC framework for Unity uses the following paid assets from the Unity Asset Store:

Character Customizer and Character Customizer Modern Clothes Pack by Jordbugg for the human character model
Guardians Bundle I by Dmitriy Dryzhak for the troll model
Game Creator 2 by Catsoft Works for general triggers and actions, behavior trees, and dialogue
FinalIK by RootMotion for full body biped IK
PuppetMaster by RootMotion or Ragdoll Animator 2 by FImpossible Creations for ragdoll physics
SALSA LipSync by Crazy Minnow Studio, LLC for facial animation
VR Interaction Framework by Bearded Ninja Games for VR input (optional)

For the demonstration, we're also using free assets from Sketchfab:

"Low-Poly Arrow v2.0" (https://skfb.ly/oLS6V) by Teslov is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/).