AI for Game Devs: Using LLMs to Generate Dynamic NPC Dialogue Trees
Discover how LLMs are revolutionizing NPC dialogue generation in games. Explore hybrid architectures, local model integration, and practical frameworks for dynamic, memory-aware conversation systems.
The End of Static Dialogue
You approach a shopkeeper in an RPG. You have spoken to her before—helped her find a lost shipment, asked about her family, even haggled over a rare sword. In a traditional game, she would repeat the same three lines regardless. Her memory resets every time you walk away.
But what if she remembered? What if her tone shifted based on your history? What if you could ask her anything, and she would respond in character, grounded in the game’s lore, without repeating herself?
This is the promise of LLM-driven NPC dialogue—and it is no longer experimental. From indie developers on Reddit building local chatbots in Unity to academic researchers presenting at IEEE conferences, game creators are actively integrating large language models into their dialogue systems.

But there is a catch. The developers who have spent months building these systems are also the first to warn: plugging a chatbot into your game is not a design solution. The magic is not in the LLM itself. It is in the architecture that surrounds it.
This guide explores how to build dynamic, emotionally intelligent NPC dialogue systems using LLMs—without losing narrative control.
The Three Approaches to LLM Dialogue
Academic research has formalized the spectrum of LLM integration into games. A 2025 study from the IEEE Conference on Games introduced Quest of Aivengarde, a custom RPG that implements three distinct LLM-driven dialogue systems alongside a traditional tree for direct comparison.
Approach 1: Rephrasing (Lowest Risk)
The LLM does not generate content. It rephrases pre-written lines. The game still uses a traditional dialogue tree for logic and branching, but the LLM adds variety to the delivery.
What it looks like: The developer writes “I need your help finding my lost ring.” The LLM generates variations: “I’ve misplaced my ring—could you assist?” or “My ring is gone. Please, help me find it.”
Best for: Reducing repetition without losing narrative control. Characters feel less robotic, but the designer retains complete authority over what information is conveyed.
Trade-offs: Risk is minimal. Hallucination and off-character responses are unlikely, and game logic is unchanged. The cost is that the substance of every line stays fixed: the LLM acts as a paraphrasing engine, not a decision-maker.
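A minimal sketch of the rephrasing pattern, assuming a hypothetical `call_llm` wrapper around whatever model you use (stubbed here with a canned reply). The key design point is the guardrail: the authored line is the ground truth, and the system falls back to it if the paraphrase fails a sanity check.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (cloud API or local model)."""
    return "I've misplaced my ring -- could you assist?"

def rephrase(authored_line: str) -> str:
    """Ask the LLM to paraphrase an authored line without changing its meaning."""
    prompt = (
        "Rephrase the following NPC line. Keep the same meaning, "
        "add no new information, stay under 25 words.\n"
        f"Line: {authored_line}"
    )
    variant = call_llm(prompt).strip()
    # Guardrail: on an empty or runaway response, fall back to the
    # authored line -- the designer keeps final authority.
    if not variant or len(variant) > 4 * len(authored_line):
        return authored_line
    return variant
```

Because the fallback always exists, a bad model response degrades to the original authored experience rather than breaking it.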
Approach 2: Branching with LLM Guidance (Balanced)
This is the sweet spot for most commercial games. The developer still defines the dialogue tree structure—the nodes, the branches, the possible outcomes. But the LLM decides which branch to follow based on the player’s input and the NPC’s personality.
How it works: The player types or selects a response. The LLM analyzes that input, consults the NPC’s personality parameters, and determines which branch of the pre-written tree to activate. The actual dialogue text is still authored, but the path through the tree is dynamic.
Example: A guard NPC has three possible responses to a player approaching: hostile (if reputation is low), neutral (if reputation is average), or friendly (if reputation is high). A traditional system checks a reputation variable. An LLM system infers the appropriate tone from the player’s previous interactions and the guard’s personality description.

Best for: Open-world RPGs, immersive sims, and any game where player choice should feel consequential.
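The guard example above can be sketched as a routing problem: the LLM returns only a branch label, and all player-facing text remains authored. The `call_llm` stub is hypothetical; in practice it would wrap your model of choice.

```python
# Authored text per branch stays under designer control; the LLM only routes.
BRANCHES = {
    "hostile": "State your business, stranger. Quickly.",
    "neutral": "Move along, citizen.",
    "friendly": "Good to see you again! The gate is open for you.",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; expected to return one branch label."""
    return "friendly"

def pick_branch(player_input: str, personality: str, history: list) -> str:
    """Let the LLM pick among pre-authored branches based on context."""
    prompt = (
        f"NPC personality: {personality}\n"
        f"Recent interactions: {'; '.join(history[-5:])}\n"
        f"Player says: {player_input}\n"
        f"Answer with exactly one word from: {', '.join(BRANCHES)}"
    )
    label = call_llm(prompt).strip().lower()
    # Guardrail: anything unrecognized falls back to the neutral branch.
    return label if label in BRANCHES else "neutral"

line = BRANCHES[pick_branch("Hello!", "stern but fair guard",
                            ["player returned the stolen goods"])]
```

Constraining the model to a closed label set is what keeps this approach low-risk: a hallucinated label simply routes to the neutral default.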
Approach 3: Fully Open-Ended (Highest Risk)
The LLM generates both the decision and the dialogue content in real time. There is no pre-authored tree. The NPC responds to any player input, within the bounds of a character prompt.
What it looks like: The player types “Tell me about the history of this town.” The LLM, given the NPC’s backstory and the game’s lore, generates a unique response. The player can ask follow-up questions. The conversation can go anywhere.
Best for: Experimental narrative games, AI-driven chatbots within games, or sandbox experiences where unpredictability is a feature.
Trade-offs: This is where things get dangerous. Without careful constraints, LLMs can hallucinate lore, break character, or produce responses that contradict game state. Developers experimenting with this approach consistently report one finding: you need structure around the model.
The Hybrid Architecture: Best of Both Worlds
The most successful implementations do not choose one approach. They combine them in a pipeline architecture.
The Modular Dialogue Pipeline
A 2025 thesis from the University of Tartu documented a modular system for LLM-augmented NPC dialogue that has since gained attention in game development communities. The system processes player input through a series of modules:
| Module | Function |
|---|---|
| Preprocessing | Normalizes and summarizes player input |
| Dialogue Flow | Determines the current dialogue state via a finite state machine (FSM) or goal-oriented action planning (GOAP) |
| Personality | Adjusts instructions with behavioral traits |
| World State | Adds factual context from game environment |
| Generation | Synthesizes final NPC response |
Each module enriches a shared Data Container passed through the pipeline. The LLM only touches the final stage—after the system has already decided what the NPC should communicate. The LLM decides how to say it.
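A rough sketch of this pipeline shape, with illustrative module bodies (the real system's internals differ; a production dialogue-flow module would consult an FSM or GOAP planner, and only the final stage would call the LLM):

```python
from dataclasses import dataclass, field

@dataclass
class DataContainer:
    """Shared state enriched by every pipeline module in turn."""
    player_input: str
    summary: str = ""
    dialogue_state: str = ""
    personality: str = ""
    world_facts: list = field(default_factory=list)
    response: str = ""

def preprocess(c):
    c.summary = c.player_input.strip().lower()  # normalize player input
    return c

def dialogue_flow(c):
    # Stand-in for an FSM/GOAP lookup deciding what the NPC should communicate.
    c.dialogue_state = "greeting" if "hello" in c.summary else "smalltalk"
    return c

def add_personality(c):
    c.personality = "gruff but fair blacksmith"
    return c

def world_state(c):
    c.world_facts.append("the forge is out of coal")
    return c

def generate(c):
    # Only this final stage would invoke the LLM; stubbed as string assembly.
    c.response = f"[{c.dialogue_state}] ({c.personality}) {'; '.join(c.world_facts)}"
    return c

PIPELINE = [preprocess, dialogue_flow, add_personality, world_state, generate]

def run(player_input: str) -> str:
    container = DataContainer(player_input=player_input)
    for module in PIPELINE:
        container = module(container)
    return container.response
```

Because each module only reads and writes the container, individual stages can be swapped (a different planner, a different model provider) without touching the rest.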
Tool Calls for Actions
One developer experimenting with LLM agents in a custom RPG framework discovered that tool calls are more reliable than asking the LLM to output JSON. Instead of hoping the model formats its response correctly, the system presents available actions as functions the LLM can invoke.
Example: The LLM does not generate “I will move to coordinates (15, 32) and pick up the herb.” Instead, it calls `move_to(entity_id=9)` and `pick_up(entity_id=9)` as separate tool invocations. The game server executes these actions deterministically. The LLM handles what to do; the game handles how.
This separation of concerns is critical. As one developer noted: “If you can already code something using normal logic and systems, then using an LLM for that is probably the wrong move.”
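The game-side half of this pattern can be sketched as a tool registry. The `call` dicts below are hard-coded for illustration; in a real system they would come from the model's function-calling output, and the tool names and signatures here are hypothetical.

```python
# The game exposes its actions as plain functions; the LLM may only
# pick one of these and fill in its arguments.
def move_to(entity_id: int) -> str:
    return f"moved to entity {entity_id}"

def pick_up(entity_id: int) -> str:
    return f"picked up entity {entity_id}"

TOOLS = {"move_to": move_to, "pick_up": pick_up}

def execute_tool_call(call: dict) -> str:
    """Run one tool call emitted by the model,
    e.g. {'name': 'move_to', 'args': {'entity_id': 9}}."""
    name, args = call["name"], call.get("args", {})
    if name not in TOOLS:
        # Unknown tool: refuse rather than guess, so the game stays deterministic.
        return f"unknown tool: {name}"
    return TOOLS[name](**args)
```

The registry is also a safety boundary: anything outside `TOOLS` simply cannot happen, no matter what the model says.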
Memory: The Missing Piece
The most common complaint about LLM-driven NPCs is forgetfulness. A character remembers the player’s name in one conversation, then asks for it again five minutes later.
The solution is a structured memory system with three layers:
| Memory Layer | Duration | Content | Storage |
|---|---|---|---|
| Short-term | Current conversation | Last 5-10 exchanges | In-memory sliding window |
| Medium-term | Current play session | Key events, player choices | Session storage |
| Long-term | Across save files | Relationship status, completed quests | Database (SQLite) |
When a player returns to a game after three months, the NPC can say: “Good to see you again. Last time you helped me repair the water pump.” That level of persistence requires intentional engineering—it does not emerge from the LLM alone.
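The three layers in the table above might be wired together as follows. This is a minimal sketch, not a production design: the class name and methods are invented for illustration, and a shipped game would point SQLite at a file on disk rather than the `:memory:` database used here.

```python
import sqlite3
from collections import deque

class NPCMemory:
    """Three-layer memory: sliding window, session dict, SQLite store."""

    def __init__(self, npc_id: str, db_path: str = ":memory:"):
        self.npc_id = npc_id
        self.short_term = deque(maxlen=10)  # last exchanges, current conversation
        self.session = {}                   # key events this play session
        self.db = sqlite3.connect(db_path)  # use a file path to persist across saves
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts (npc_id TEXT, key TEXT, value TEXT)"
        )

    def remember_exchange(self, line: str):
        self.short_term.append(line)        # old lines fall off automatically

    def remember_event(self, key: str, value: str):
        self.session[key] = value

    def remember_fact(self, key: str, value: str):
        self.db.execute("INSERT INTO facts VALUES (?, ?, ?)",
                        (self.npc_id, key, value))
        self.db.commit()

    def recall_facts(self):
        cur = self.db.execute("SELECT key, value FROM facts WHERE npc_id = ?",
                              (self.npc_id,))
        return cur.fetchall()

mem = NPCMemory("shopkeeper")
mem.remember_exchange("player: hello again")
mem.remember_event("haggled", "rare sword")
mem.remember_fact("quest_done", "repaired the water pump")
```

At prompt-assembly time, the relevant slices of all three layers are injected into the LLM's context; the model itself never has to carry continuity.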
Implementation Options for Game Devs
You do not need to build from scratch. Several frameworks and tools have emerged in 2025-2026.
For Unity Developers: ChatLab
ChatLab is a Unity add-on that integrates LLM-driven dialogue with branching logic. Key features include:
- Seven dialogue templates for rapid prototyping
- Dynamic branching conversations using LLM simulation
- Local model support (Phi-3 mini, DeepSeek R1 1.5B) for offline use
- OpenAI integration for cloud-based models
- Automatic translation to other languages
Pricing: $39.99 on the Unity Asset Store (one-time purchase)
Limitations: Local models are CPU-only and can be slow. The developer notes that using ChatGPT is significantly faster than running models locally. Local models also occasionally forget details.
For Custom Engines: The Modular Dialogue System
An open-source Modular Dialogue System on GitHub implements the pipeline architecture described above. It features:
- JSON/YAML configuration for dialogue flow, personality, and world facts
- Pluggable modules (swap out LLM providers or state machines)
- Support for LLaMA 3 and other open-weight models
- Fact-grounded responses that respect game state
License: CC BY-NC-ND 4.0 (non-commercial, no derivatives)
For Azure Users: OpenAI Integration
Microsoft’s training module on emotionally intelligent dialogue trees demonstrates how to use Azure OpenAI for NPC dialogue generation. The approach emphasizes:
- Structured prompts that define character role, emotion, and narrative context
- Personality parameters that influence tone and response
- Testing and iteration to ensure logical flow and consistency
Example prompt structure:
Write a dialogue tree for a grumpy shopkeeper NPC who becomes more helpful if the player compliments their wares. Include three branching responses based on player approach.
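The structured-prompt idea can be sketched as a small composition function. The parameter names here are illustrative, not part of any Azure OpenAI API; the returned string would be sent as the system message of a chat completion request.

```python
def build_npc_prompt(role: str, emotion: str, context: str, traits: list) -> str:
    """Compose a system prompt from discrete character parameters
    (hypothetical helper; names are not an Azure API)."""
    return (
        f"You are {role}. Your current emotional state is {emotion}.\n"
        f"Personality: {', '.join(traits)}.\n"
        f"Narrative context: {context}\n"
        "Stay in character. Never reveal information outside this context."
    )

prompt = build_npc_prompt(
    role="a grumpy shopkeeper",
    emotion="irritated but softening",
    context="the player just complimented your wares",
    traits=["blunt", "proud of craftsmanship", "secretly kind"],
)
```

Keeping role, emotion, and context as separate parameters means the same template serves every NPC; only the data changes.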
The Voice of Experience: Lessons from Builders
Developers who have spent months building LLM-driven NPC systems have hard-won insights.
The Problem with Free-Form Chat
One developer who built a system to run local LLMs directly inside Unity (no APIs, fully offline) concluded: “The more I understand how these models work, the more I realize they might not fit where people expect.”
His critique is sharp but fair:
“I also write short stories, and I like things to be intentional. Every line, every scene has a purpose. LLMs tend to drift or improvise. That can ruin the pacing or tone. It’s like making a movie: directors don’t ask actors to improvise every scene. They plan the shots, the dialogue, the mood. A story-driven game is the same.”
The real value, he argues, is emotional engagement. You can spend hours talking to a character shaped to your liking. The model can remember what you said and know how to push your buttons. That connection is something traditional systems cannot easily replicate.
The Repetition Problem
Another developer experimenting with LLM agents in a custom RPG framework found that models repeat themselves and fall into loops. If something is shown in “recent memories,” the LLM simply does the same thing over and over without variation.
Proposed solution: A planning stage to keep the model on track. The LLM should generate a plan, then execute it step by step, rather than reacting to each moment independently.
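One way to sketch that plan-then-execute idea: the model is consulted once to produce a plan, and a deterministic cursor walks the steps afterward, so per-turn re-prompting (and its loops) is avoided. The `call_llm` stub is hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a newline-separated plan."""
    return "greet the player\nask about the missing shipment\noffer a reward"

def make_plan(goal: str) -> list:
    """One planning call up front, instead of re-prompting every turn."""
    raw = call_llm(f"As this NPC, list the steps to achieve: {goal}")
    return [step.strip() for step in raw.splitlines() if step.strip()]

def next_step(plan, done):
    """Deterministically walk the plan; the model is not consulted again."""
    return plan[done] if done < len(plan) else None

plan = make_plan("recruit the player to find the shipment")
```

Since the plan is fixed once generated, the NPC cannot fall back into repeating the same "recent memory" action each tick; only a new goal triggers a new model call.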
The Personality Problem
The same developer noted that LLM-driven NPCs can feel “dry” and lack personality. This is likely a prompt issue—the character description is too shallow. A deeper character with a larger proportion of backstory in the prompt yields better results.
Designing for Intentionality
The consensus among experienced developers is clear: LLMs are not a replacement for authored content. They are an augmentation layer.
Use LLMs for:
- Paraphrasing to reduce repetition
- Dynamic branching based on player input
- Generating responses in character voice
- Creating emergent social interactions in sandbox games
Do not use LLMs for:
- Critical plot delivery (risk of hallucination)
- Puzzles with specific solutions (LLMs may invent impossible solutions)
- Any scenario where consistency across playthroughs matters
- Replacing well-coded game logic
As one developer put it: “If you can already code something using normal logic and systems, then using an LLM for that is probably the wrong move.”
Performance Considerations
Running LLMs locally in games introduces significant performance constraints.
| Factor | Impact |
|---|---|
| Model size | Larger models (7B+ parameters) require significant RAM and VRAM |
| Inference speed | Local CPU inference is slow (seconds per response) |
| Quantization | INT8 or INT4 quantization reduces memory by 60-80% with acceptable quality loss |
| Caching | Pre-loading responses for common player inputs reduces latency |
| Async loading | Loading models 0.5 seconds before dialogue begins masks startup time |
For mobile games, cloud-based LLM APIs are currently the only practical option. For PC/console, quantized local models (Phi-3 mini, Llama 3.2 3B) can work with optimization.
The Future: Emotionally Intelligent NPCs
Microsoft’s training materials emphasize that the goal is not just generating dialogue—it is creating emotionally intelligent responses.
Key techniques include:
- Internal reactions (pauses, hesitations, emotional language)
- Mirroring player emotion (responding to fear, anger, or joy with empathy or tension)
- Layering conflict and resolution (showing character growth across branches)
When a player has been kind to an NPC across multiple encounters, that NPC should remember. Their tone should warm. They might offer discounts, share secrets, or warn the player of danger. That arc cannot be scripted for every possible player—but an LLM, given the right memory architecture and personality parameters, can generate it dynamically.
Getting Started Today
Step 1: Start with Approach 2 (Branching with LLM Guidance). Do not jump straight to open-ended generation. Use the LLM to decide which pre-authored branch to follow based on player input and NPC personality.
Step 2: Implement structured memory. Short-term, medium-term, and long-term memory layers. The LLM should never be the only source of continuity.
Step 3: Use tool calls, not JSON parsing. Present available actions as functions the LLM can invoke. Let the game engine execute them deterministically.
Step 4: Test relentlessly. Simulate player inputs. Identify where the conversation breaks, confuses, or derails. Adjust prompts and add conditions.
Step 5: Start small. One NPC. One location. One quest. Prove the architecture works before scaling.
Frequently Asked Questions
Q: Do I need an internet connection for LLM-driven NPCs?
A: Not necessarily. Local models like Phi-3 mini or DeepSeek R1 1.5B can run entirely offline, though performance is slower than cloud APIs.
Q: How much does this cost?
A: Cloud APIs charge per token. A typical conversation might cost fractions of a cent. Local models have no per-use cost but require hardware capable of running them.
Q: Can LLMs handle multiple NPCs with different personalities?
A: Yes. The personality module injects different behavioral traits into the prompt. One developer’s system supports personality-aware responses that reflect each NPC’s identity.
Q: What about hallucinations?
A: Hallucination is a real risk. The hybrid architecture—using LLMs for rephrasing or branching rather than content generation—reduces this risk significantly.
Q: Is this ready for commercial games?
A: Yes. ChatLab is available on the Unity Asset Store for $39.99. The Modular Dialogue System is open-source. Major studios are experimenting with these techniques. But expect to spend time tuning prompts and building supporting infrastructure.