AI-Driven Development in game development: how to integrate AI agents into a game development team in 2025–2026

TL;DR: AI agents are no longer an experimental toy: they have become full-fledged “digital employees” that write code, conduct reviews, and help maintain architecture. But without proper management, documentation, and a culture shift, an agent turns into a generator of technical debt. This article is a practical guide to AI-Driven Development for game dev teams, from choosing tools to building a feedback loop.

What AI-Driven Development is and why it matters right now

AI-Driven Development (AIDD) is an approach to software development in which AI agents become full participants in the process: they write code, fix bugs, generate tests, maintain documentation, and conduct reviews. This is not autocomplete in the editor and not a chat with GPT — it is delegating full-fledged engineering tasks to autonomous systems with access to the codebase, terminal, and external services.

According to GitHub’s controlled experiment with Copilot (published in 2022), developers using the assistant completed a test task 55% faster. The real speedup depends on the type of task: on new greenfield projects the gain can reach 70–75%, while on legacy code it is closer to 20–30%. McKinsey’s study “The economic potential of generative AI” (2023) points in the same direction, estimating a productivity gain for software engineering in the 20–45% range.

Anton Kerp, CTO of Funzen and Games with nearly 10 years of experience in commercial development, put it this way: “Neural networks can already really be considered your junior employees. Earlier this might have sounded like loud clickbait, but today it is already reality, at least in development and code writing.” His company writes almost no code by hand: Claude, Codex, and MiniMax replace part of the team and are paid for with subscriptions rather than salaries.

Why exactly now? Technologies have reached the threshold where ignoring AI costs more than using it. The labor market is already reflecting this shift: employers are increasingly demanding AI tool skills, and the teams that manage to build processes today will gain a competitive advantage before the technology becomes the standard for everyone.

Basic concepts: LLM, agents, context, tokens

Before implementing AI-Driven Development, the team needs to understand the basic concepts. Without this foundation, it is impossible either to properly assign a task to an agent or to evaluate the quality of its work.

What an LLM is and why it is an “amplifier”

LLM (Large Language Model) is a probabilistic model that predicts the next token (a word or word fragment) from the text so far. It has no “understanding” in the human sense: every response is generated from scratch, and because the output is sampled, the same prompt can produce different answers on different runs. Two important consequences follow from this:

An LLM can hallucinate — confidently report incorrect facts, nonexistent APIs, invented libraries. Never trust the output without verification.
An LLM is an amplifier of your competencies: if you know the domain, the model will help you do more and faster. If you do not know it, it will amplify your ignorance as well. First, get at least a basic grasp of the topic, otherwise it will be impossible to distinguish fiction from reality.

Agent vs. chat: what is the fundamental difference

An agent is an LLM that has levers to act on the environment: it can read and create files, run commands in the terminal, call external APIs, and interact with the browser. A regular chat is only a text dialogue without access to real tools.

It is the agentic mode that makes AI-Driven Development possible: you set the task at the requirements level, the agent independently studies the codebase, writes code, runs tests, fixes errors, and reports the result.

Context and tokens: the economy of attention

The context window is everything the model “sees” at the moment of generating a response: your prompt, chat history, files you provided, logs, and screenshots. A token is a unit of text processing: roughly 4 characters of English text, and noticeably fewer characters of Russian text, so the same sentence costs more tokens in Russian. The larger the context, the more expensive the request and the slower the response.

Practical implication: keep the context clean. If the model gave a bad answer, do not continue the dialogue with corrections — this pollutes the history. Roll back to the previous message, edit the prompt, and repeat the request. Clean context means a fast and cheap agent.
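The rollback advice can be sketched in a few lines. This is a minimal illustration, assuming the chat history is a plain list of messages; the `Conversation` class and its methods are hypothetical and do not correspond to any particular SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Chat history as (role, text) pairs. Hypothetical helper, not a real SDK type."""
    messages: list[tuple[str, str]] = field(default_factory=list)

    def ask(self, prompt: str, reply: str) -> None:
        # In a real tool the reply would come from the model; here it is passed in.
        self.messages.append(("user", prompt))
        self.messages.append(("assistant", reply))

    def rollback_last_turn(self) -> str:
        """Drop the last user/assistant pair and return the old prompt for editing."""
        if len(self.messages) < 2:
            raise ValueError("nothing to roll back")
        self.messages.pop()                  # the bad assistant reply
        _, old_prompt = self.messages.pop()  # the prompt that produced it
        return old_prompt
```

After a rollback the history contains neither the weak prompt nor the bad answer, so the next request is sent with exactly the context that existed before the failed attempt.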

An important detail about the Russian language: LLMs are trained predominantly on English. Instructions in English are understood more precisely and consume about 1.5–2 times fewer tokens than equivalent text in Russian. This does not mean you should switch to English completely, but system prompts and rules for agents are better written in English.
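To get a feel for the numbers, a rough estimate is enough. The sketch below assumes average characters-per-token ratios and a price per million tokens; all three values are illustrative assumptions, and a provider's own tokenizer is the only reliable way to measure real usage.

```python
import math

# Assumed average characters per token; real values depend on the tokenizer.
CHARS_PER_TOKEN = {"en": 4.0, "ru": 2.5}


def estimate_tokens(text: str, language: str) -> int:
    """Rough token estimate from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[language])


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost of sending `tokens` input tokens at a given (hypothetical) price."""
    return tokens / 1_000_000 * price_per_million
```

With these assumed ratios, a system prompt of the same length in characters costs about 1.6 times more tokens in Russian than in English, which is why rules that are re-sent on every request are a good candidate for English.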

AI development tools: from chat to CLI agents

AI development tools can be divided into several levels based on how deeply they are integrated into the workflow.

Level 1: web chats (an outdated approach)

ChatGPT, Claude.ai, Gemini — you generate code in the browser and copy it back into the IDE. This is already an outdated approach, but many teams still work exactly this way. The main drawback is that the agent does not see your project context and starts “from scratch” every time.

Level 2: IDE plugins (Cursor, Windsurf, Rider AI)

The next step is an IDE with built-in AI. Cursor, Windsurf (formerly Codeium), and JetBrains Rider with AI Assistant add an agent panel directly into the editor. The key advantage is codebase indexing: the IDE converts the entire project into a vector representation and allows the agent to search by meaning rather than by character matches.

Cursor and Windsurf are VS Code forks with an added agent panel. Cursor has an Agent mode in which the right side shows a review area for the agent’s changes: you see every modification before it is accepted. Rider takes a different path: it is a standalone JetBrains IDE in which AI features come from the AI Assistant plugin, and external agents (OpenCode, Codex) can be connected through the Marketplace, although stability is still at beta level.

Level 3: CLI agents (a revolutionary shift)

CLI agents are the main thing that changed the industry in 2024–2025. Claude Code, OpenAI Codex, and OpenCode run directly in the terminal. Why this matters:

No intermediaries between your model subscription and the tool — no need to pay third-party aggregators
Runs everywhere: in any terminal, in a Docker container, on a server via SSH, in a CI/CD pipeline
Headless mode: the agent can be invoked from the command line with a prompt, you get the result, and integrate it into scripts or orchestrators
Sandbox: you can run the agent inside an isolated Docker container and give it full privileges there, without fear of damaging the host system
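Headless mode is what makes agents scriptable. The sketch below wraps a one-shot agent call in a Python function. The command list is a placeholder: each real CLI has its own flags for non-interactive use, so check the tool's documentation before substituting one. A stand-in "agent" built from the Python interpreter is used here so the example runs anywhere.

```python
import subprocess
import sys


def run_agent(command: list[str], prompt: str, timeout_s: int = 600) -> str:
    """Run an agent CLI once with the prompt as the last argument and return its stdout."""
    result = subprocess.run(
        command + [prompt],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(f"agent failed: {result.stderr.strip()}")
    return result.stdout.strip()


# Stand-in "agent": a Python one-liner that echoes the prompt back.
# Replace with the real CLI invocation of your agent (an assumption left to the reader).
FAKE_AGENT = [sys.executable, "-c", "import sys; print('done: ' + sys.argv[1])"]
```

Because the function only depends on a command list and a prompt, the same wrapper can be called from a CI job, a cron script, or an orchestrator that fans tasks out to several agents.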

OpenCode is a special case: it uses a client-server architecture (the server runs on your machine, the terminal is the client), which makes remote connections possible without additional tools. OpenCode supports virtually any model from any provider and has a GUI with project file browsing and built-in Git Worktrees support.

Level 4: CoWork solutions and GUI wrappers

In 2025, a movement emerged to create graphical interfaces for terminal agents. The first was Claude CoWork — essentially the same Claude Code in a pretty wrapper. Then OpenAI, MiniMax, and the open-source community created their own versions. IONUI is a universal GUI that works over the MCP protocol and supports any agent installed on the computer; it includes Telegram integration for working through the messenger.

Speech transcription: dictate tasks to the agent

Many IDEs support voice input natively. Claude Code recently added microphone support directly in the terminal. For other agents, you can use third-party tools: either cloud services (high quality, paid) or local models such as Whisper (free, able to run without a GPU, and taking from about 100 MB to 3+ GB of disk space depending on the model size). Local Whisper recognizes Russian, English, and mixed-language speech very well, and adds punctuation.

Models and pricing plans: how not to overpay

Choosing a model is one of the key decisions when implementing AI-Driven Development. There is no universal “best” model: each has its own strengths, speed, and cost.

Top models for development (2025–2026)

Claude (Anthropic) — strongest in context understanding, architectural decisions, and following instructions. The leader for planning and design tasks.
OpenAI Codex / GPT series — high-quality code, especially with the arrival of GPT-5, which became a “turning point” for the entire industry. Codex is specialized specifically in code.
MiniMax — the leader in price/performance among Chinese models. According to Anton Kerp: “We’ve actually bought MiniMax for the whole company: we distributed it to developers, game designers, testers — we installed agents for everyone”. For tasks based on specifications, it performs no worse than top models, while being significantly cheaper.
Qwen (Alibaba), Kimi (Moonshot AI) — strong alternatives, offering subscriptions with access to several models at once.

Coding plans vs. tokens: when each is more cost-effective

Traditional pricing plans sell a “basket of tokens” per month. Coding plans (popularized by Claude Code subscriptions in 2025) work differently: they provide a usage limit within time windows (5-hour, weekly, or monthly). If you use agents heavily (several sessions a day), a coding plan will be much more cost-effective. If you run an agent once a week, a traditional token-based plan is cheaper.
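The break-even point is simple arithmetic. The sketch below compares pay-per-token spending with a flat coding plan; the session counts, token volumes, and prices are hypothetical inputs, not real provider pricing.

```python
def monthly_token_cost(
    sessions_per_month: int,
    tokens_per_session: int,
    price_per_million: float,
) -> float:
    """Monthly spend when every token is billed individually."""
    return sessions_per_month * tokens_per_session / 1_000_000 * price_per_million


def cheaper_option(
    sessions_per_month: int,
    tokens_per_session: int,
    price_per_million: float,
    plan_price: float,
) -> str:
    """Compare pay-per-token spend with a flat coding plan price."""
    pay_per_token = monthly_token_cost(
        sessions_per_month, tokens_per_session, price_per_million
    )
    return "coding plan" if plan_price < pay_per_token else "token-based"
```

Plugging in your own usage from the last month is usually enough to make the decision; the function ignores plan limits, which would need a separate check for very heavy users.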

The leaders in coding plans are Anthropic (Claude) and OpenAI (Codex). Chinese providers offer higher limits at a lower price, but the top Western models still win on the quality of architectural decisions.

Rule for choosing a model by task

Use the most expensive, powerful model (Claude, GPT-5) for planning, architecture design, and complex debugging. For routine coding to a specification, writing tests, and refactoring, use cheaper models (MiniMax, Qwen): on such tasks they perform comparably and cost several times less.
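This rule can be written down as a small routing table so that scripts and orchestrators apply it consistently. The task type names and the two tiers below are assumptions made for illustration; map them to whatever models your team actually uses.

```python
from enum import Enum


class Tier(Enum):
    STRONG = "strong"  # most capable, most expensive model
    CHEAP = "cheap"    # budget model for well-specified work


# Hypothetical task categories, following the rule described in the text.
ROUTING = {
    "planning": Tier.STRONG,
    "architecture": Tier.STRONG,
    "complex_debugging": Tier.STRONG,
    "coding_to_spec": Tier.CHEAP,
    "unit_tests": Tier.CHEAP,
    "refactoring": Tier.CHEAP,
}


def pick_tier(task_type: str) -> Tier:
    """Unknown task types fall back to the strong tier: safer, though more expensive."""
    return ROUTING.get(task_type, Tier.STRONG)
```

Defaulting to the strong tier is a deliberate choice in this sketch: a misrouted complex task tends to cost more in rework than a misrouted simple task costs in tokens.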

Principles of working with agents: prompting and context engineering

Effectiveness when working with AI agents is determined above all by the ability to set tasks correctly. This is not about “typing the right words into a search bar” — it is a full-fledged engineering discipline.

Prompting as the foundation

Although prompting has become less hyped on social media, its relevance has not decreased. A high-quality prompt is the foundation of working instructions, system rules, and documentation for the agent. Key prompting techniques:

Chain-of-Thought — ask the model to think out loud before the final answer
Few-Shot — provide 2–3 examples of the desired result
Role Prompting — “You are a senior Unity developer with experience in mobile game design...”
Negative Instructions — clearly specify what should not be done
Structured Output — ask for output in a specified format (JSON, markdown, table)
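The techniques listed above combine naturally into one prompt template. The following is a minimal sketch, assuming a plain-text prompt; the function name, parameters, and wording of the instructions are illustrative and not a standard.

```python
import json


def build_prompt(
    role: str,
    task: str,
    examples: list[tuple[str, str]],
    forbidden: list[str],
    output_shape: dict,
) -> str:
    """Assemble a prompt from role, task, few-shot examples, prohibitions, and output format."""
    parts = [f"You are {role}.", "", "Task:", task]
    if examples:  # Few-Shot
        parts += ["", "Examples:"]
        for request, answer in examples:
            parts.append(f"- Input: {request}")
            parts.append(f"  Output: {answer}")
    if forbidden:  # Negative Instructions
        parts += ["", "Do not:"]
        parts += [f"- {rule}" for rule in forbidden]
    parts += [
        "",
        "Reason step by step before giving the final answer.",  # Chain-of-Thought
        "Return the final answer as JSON with this shape:",      # Structured Output
        json.dumps(output_shape, indent=2),
    ]
    return "\n".join(parts)
```

Keeping the template in code rather than retyping it makes prompts reviewable and versioned like any other project artifact.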

Context Engineering: managing the agent’s context

Context Engineering is a higher level than prompting. It is the discipline of managing what exactly enters the agent’s context window at any given moment. It includes:

Selective file transfer: do not dump the entire codebase into the context — pass only the files relevant to the task
Structured documentation: architectural decisions, team conventions, and the technical stack should be described so that the agent can read and apply them on its own
Controlling “clutter”: remove failed responses from the context (rollback, not continuation of the dialogue)
Search tools: connect vector search over the codebase to the agent so it can find the needed files itself
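Selective file transfer can be sketched as "rank files by relevance, then fill a token budget". The example below uses naive keyword counting as a stand-in for vector search, and an assumed ratio of 4 characters per token; both are simplifications for illustration only.

```python
def select_files(
    files: dict[str, str],
    keywords: list[str],
    token_budget: int,
    chars_per_token: float = 4.0,
) -> list[str]:
    """Pick the most relevant files (path -> content) that fit into the token budget."""
    scored = []
    for path, content in files.items():
        text = content.lower()
        score = sum(text.count(word.lower()) for word in keywords)
        if score > 0:
            scored.append((score, path))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected, used = [], 0
    for _, path in scored:
        cost = int(len(files[path]) / chars_per_token) + 1  # rough token estimate
        if used + cost <= token_budget:
            selected.append(path)
            used += cost
    return selected
```

Files that never mention the task's keywords are excluded entirely, and lower-ranked files are dropped first when the budget is tight.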

Managerial approach to agents

Anton Kerp formulated a key insight: specialists with experience in management and mentoring learn to work with agents faster than developers without such experience. A manager expects problems and thinks about how to prevent them. A developer without management experience expects magic from pressing one button — and gets disappointed.

The task of a good manager when working with an agent is to create an environment in which the agent has:

Clear, unambiguous, understandable requirements
Access to all necessary information
Checklists for verifying completeness and correctness
A system of constraints within which the agent cannot “act incorrectly”

This is exactly what a good manager does for human employees. The tools are different — the principles are the same.

Memory Bank: managing project knowledge

Memory Bank is a methodology for structured management of project documentation specifically for working with AI agents. Every time you start a new session with an agent, it “forgets” everything from the previous one (unless otherwise provided for). Memory Bank solves this problem.

What to store in Memory Bank

Create a separate directory in the project (for example, memory/ or .ai/) and collect there all the meta-information that is not available to the agent from the code:

Architectural decisions: why a particular structure was chosen, which alternatives were rejected
Team conventions: code style, naming conventions, and patterns adopted by the team
Technical stack: dependency versions, external services, infrastructure
User stories and tasks: feature descriptions with context and priorities
Workflows: step-by-step instructions for typical operations (deployment, review, creating a new feature)
Known issues: technical debt, known bugs, limitations

Memory Bank workflow

Session start: the agent reads the Memory Bank, updates the context
Working on a task
Session end: the agent updates the Memory Bank with new decisions and artifacts

This is a two-way process — the agent not only reads the Memory Bank, but also updates it. This is the key difference from static documentation. There are ready-made frameworks for implementing a Memory Bank (for example, Claude Code has the CLAUDE.md convention), but the structure can be built independently.
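The read-then-update cycle can be illustrated with two small functions. This is a sketch under assumptions: the Memory Bank is a directory of Markdown files, and decisions are appended to a file named `decisions.md`. Both the layout and the file name are hypothetical choices, not part of any framework.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def load_memory_bank(root: Path) -> str:
    """Session start: concatenate all Markdown notes under `root` into one context preamble."""
    sections = []
    for path in sorted(root.rglob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {path.relative_to(root).as_posix()}\n{body}")
    return "\n\n".join(sections)


def record_decision(root: Path, decision: str, today: Optional[date] = None) -> Path:
    """Session end: append a dated entry to decisions.md (file name is an assumption)."""
    log = root / "decisions.md"
    stamp = (today or date.today()).isoformat()
    with log.open("a", encoding="utf-8") as handle:
        handle.write(f"- {stamp}: {decision}\n")
    return log
```

In practice the agent itself performs both steps when instructed to by its rules; the functions only show what "reads" and "updates" mean on disk.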

Diagrams and Mermaid

Modern LLMs work excellently with diagrams and schematics. Mermaid is especially well supported — a text-based diagram markup language. Describe the architecture in Mermaid inside the Memory Bank: the agent will be able to “see” the structure of your project as a dependency graph, and IDE plugins will render the diagrams nicely for humans.
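A Mermaid description of the module tree does not have to be written by hand. The sketch below generates a top-down Mermaid graph from a parent-to-children mapping; the helper names and the sample module names are illustrative.

```python
def node_id(name: str) -> str:
    """Mermaid node ids should not contain spaces, so keep only letters and digits."""
    return "".join(ch for ch in name if ch.isalnum())


def to_mermaid(tree: dict[str, list[str]]) -> str:
    """Render a parent -> children mapping as a top-down Mermaid graph."""
    lines = ["graph TD"]
    for parent, children in tree.items():
        for child in children:
            lines.append(
                f"    {node_id(parent)}[{parent}] --> {node_id(child)}[{child}]"
            )
    return "\n".join(lines)
```

The resulting text can be saved next to the other Memory Bank notes, where it stays readable for the agent as plain text and renderable for humans as a diagram.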

Project architecture for AI development

The agent works well in a clear, isolated structure. It works poorly in tangled spaghetti code with unpredictable dependencies. If there is no architecture, the agent will only make things worse.

Tree architecture as the standard

The optimal architecture for AI-Driven Development in game projects is tree-like: all dependencies are arranged in a tree that grows from a single root node. The deeper you go down the tree, the closer the nodes are to concrete features, while the root holds the shared architectural core. Example for a Unity project:

EntryPoint
├── Core Architecture (State Machine, DI Container)
│   ├── Auth Module
│   ├── Core Gameplay
│   │   ├── Player Controller
│   │   ├── Level Manager
│   │   └── Physics System
│   ├── Meta Gameplay
│   │   ├── Shop
│   │   ├── Offers
│   │   └── Progression
│   └── Infrastructure
│       ├── Analytics
│       └── Remote Config

Key principles:

Each node-module is independent: the Shop module does not know about the Offers module directly — they interact only through the parent
Isolation makes it possible to work in one node without fear of breaking others
DI frameworks (Zenject, VContainer) build exactly this kind of dependency tree — an additional argument in favor of using them
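The isolation principle can be checked automatically. The sketch below assumes one strict reading of the rule: the only allowed dependency is from a parent to its direct child, and everything else (sibling to sibling, child to parent, skipping a level) is reported. The function and the way dependencies are collected are hypothetical; in a real project the pairs would come from assembly definitions or a static analyzer.

```python
def find_violations(
    parent_of: dict[str, str],
    dependencies: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source is not the direct parent of target."""
    violations = []
    for source, target in dependencies:
        if parent_of.get(target) != source:
            violations.append((source, target))
    return violations


# Part of the example tree, written as child -> parent.
PARENT_OF = {
    "Meta Gameplay": "Core Architecture",
    "Shop": "Meta Gameplay",
    "Offers": "Meta Gameplay",
    "Progression": "Meta Gameplay",
}
```

Run as part of the automated checks, such a validator gives the agent immediate feedback when it tries to wire Shop directly to Offers instead of going through the parent.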

Division of responsibility: architecture for developers, features for agents

This is a fundamental boundary in AI-Driven Development:

“The trunk of the tree” (architecture) — the developers' responsibility. The agent should not touch the architectural core without an explicit assignment.
“The leaves” (specific features) — the agents' responsibility. This is where they work autonomously.

Anton Kerp described a telling case: an agent worked for 3 hours on a new feature from a detailed specification and delivered a working result, but along the way it ran into technical debt in the architecture. Since fixing that debt was not part of its task, the agent worked around it with hacks and created new debt in the process. The result was working code that could not be integrated and had to be rewritten manually.

Technical debt in the “trunk” is a disaster in AI development. The agent will spread it throughout the entire project. Keep the architectural core strictly clean.

Feedback Loop: the key to an agent's autonomous work

Feedback Loop is one of the most important elements of AI-Driven Development. It is what determines how autonomously an agent can work without your involvement.

What a Feedback Loop is in development

The scheme is simple: the agent works → interacts with the environment → receives feedback → adjusts its actions. Without feedback, the agent is “blind”: it does not know whether the code is written correctly, whether it compiles, or whether the tests pass.

Sources of feedback for the agent:

Compiler and linters — immediate reaction to syntax and static errors
Unit tests — verification of logic correctness
Logs and run results — the application's dynamic behavior
Static analyzers — code quality, potential vulnerabilities
Screenshots and browser data — visual result for UI tasks

Why tests are mandatory

Agents can generate unit tests quickly and well. That is valuable in itself — but the main thing is that tests close the feedback loop: the agent writes code, runs tests, sees failures, and fixes them. Without tests, the agent does not know whether it broke something, so every change requires manual verification. In AI-Driven Development the presence of tests is not optional, but basic infrastructure.

Automating checks

The ideal feedback loop for agent-based development:

The agent writes code
The compiler / linter runs automatically → the agent sees compilation errors
Unit tests run → the agent sees failures
Integration tests run → the agent sees system issues
If everything is green — the agent reports the task as complete

Setting up a full-fledged feedback loop is a serious engineering task that depends heavily on your stack and project. But even a minimal setup (compiler + basic tests) multiplies the agent's autonomy.
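A minimal version of the staged loop can be sketched as "run checks in order, stop at the first failure, hand the output back to the agent". The check names and commands are placeholders: substitute your real compiler, linter, and test runner invocations.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: list[tuple[str, list[str]]]) -> list[CheckResult]:
    """Run (name, command) checks in order and stop at the first failure."""
    results = []
    for name, command in checks:
        proc = subprocess.run(command, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(CheckResult(name, passed, (proc.stdout + proc.stderr).strip()))
        if not passed:
            break  # later stages are pointless until this one is green
    return results


def feedback_for_agent(results: list[CheckResult]) -> str:
    """Turn check results into the message the agent sees next."""
    failed = [result for result in results if not result.passed]
    if not failed:
        return "All checks passed."
    return f"Check '{failed[0].name}' failed:\n{failed[0].output}"
```

Stopping at the first failing stage keeps the feedback short, which matters because every line of log output sent back to the agent consumes context.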

Working with agents as a team: adoption and resistance

The most difficult aspect of AI-Driven Development is not technical, but human. Introducing AI tools into a team inevitably meets resistance.

Typical resistance and how to handle it

Developers, testers, artists, and other line specialists often meet AI with distrust, skepticism, or outright refusal. Management, as a rule, is already positive and does not need the potential explained. But the real work is done by line specialists, so they are the ones who have to be won over.

What works:

Don't pressure people; let them try it — choose a task that the person themselves considers boring (merge conflicts, writing tests, documentation). After the first successful experience, questions disappear.
Show numbers — concrete comparisons of “with agent / without agent” on real tasks are more convincing than any presentation.
Normalize pilot failures — agree in advance that “failing” during the pilot rollout is normal and valuable. Otherwise, the team will hide problems so as not to look incompetent.

Cultural shift: engineering culture grows

The paradox of AI-Driven Development: agents do not just speed up work — they raise the engineering culture of the team. Agents force you to:

Spend more time on design before coding starts
Write detailed documentation (an agent works poorly without documentation)
Discuss implicit agreements that used to exist “in the heads” of senior developers
Interact more closely with game designers and other departments when defining requirements
Create more automation tools — agents do this quickly

Rule of code ownership

Critical rule: the commit author is responsible for the code, regardless of who wrote it, a human or an agent. Without this rule, developers stop reading the agent’s code before merging, which sooner or later leads to incidents in production.

Role distribution in an AI team

With AI-Driven Development, the developer’s focus shifts: less code writing → more design, management, and review. This requires rethinking the role: the developer becomes closer to an architect and technical manager than to a “coder.”

Parallel work by agents

One significant gain is that several agents can work in parallel on independent tasks. With a tree-like architecture, isolated modules can be developed simultaneously. This multiplies team throughput without hiring new people.

Agent settings: Rules, Skills, subagents, protocols

A mature AI-agent setup in a team includes:

Rules: system instructions that the agent always follows, such as code style, forbidden patterns, and mandatory checks
Skills: specialized instructions for specific types of tasks, such as “how to create a new UI component,” “how to write a unit test,” or “how to conduct a PR review”
Subagents: specialized agents for specific tasks (testing, documentation, review) that the orchestrator calls as needed
Interaction protocols: standards for connecting agents to tools, data sources, and external services (for example, MCP, the Model Context Protocol, an open standard introduced by Anthropic)

AI and Unity: the current state of integration

For game dev teams working with Unity, AI-agent integration has its own specifics. The biggest problem remains interaction with the editor.

Current state

Integrations exist, but they leave much to be desired. The MCP server for Unity (Unity MCP) allows an agent to interact with the editor via the MCP protocol: create GameObjects, enter Play Mode, call editor functions. But stability and capabilities are still limited. Many teams still build their own custom solutions for more convenient agent work with Unity.

What works well

Writing C# code from a specification — agents handle this excellently
Generating unit tests for game logic
Creating ScriptableObject configurations
Refactoring and optimizing code
Documenting systems
Writing editor tools (EditorWindow, PropertyDrawer)

What still requires manual involvement

Working with scenes — placing objects, configuring components in the Inspector
Debugging visual/physics issues (the agent does not “see” the scene)
Configuring animation state machines in Animator
Working with the shader system

Best practice for Unity + AI

Structure the project so that as much game logic as possible lives in plain C# (without MonoBehaviour wherever possible). The less code is tied to the Unity API, the better the agent handles it and the easier it is to write tests.

Risks and pitfalls of AI-Driven Development

AI-Driven Development is a powerful tool, but when used incorrectly, it becomes a source of serious problems.

Technical debt in architecture

The risk already described above: the agent does not fix technical debt that is not part of the current task — it works around it. By working around it, it creates new debt. Keep the architectural core clean.

“Neuroslop”: generating garbage without control

If you give the agent a task that is too broad or poorly formulated, without clear constraints, it can generate a huge amount of formally working but low-quality, duplicated, excessive code — “neuroslop.” If not caught in time, it buries the project.

Accumulation of contextual “garbage”

Long conversations with failed attempts, fixes, and clarifications reduce the agent’s performance: the context becomes polluted, and the model starts to get confused. Roll back failed answers; do not continue them through dialogue.

Confidentiality and security

Using cloud models raises concerns about IP leakage and confidential data. Review the provider’s privacy policy before handing proprietary code or data to the agent. For sensitive projects, consider local models or enterprise plans with guarantees of no training on your data.

Dependence on the agent reduces the team’s knowledge

Developers who stop writing code themselves gradually lose deep knowledge. This creates a risk: if the tool is unavailable or produces an error, no one can figure it out without it. Maintain basic manual development skills in the team.

FAQ

What is AI-Driven Development in simple terms?

AI-Driven Development is an approach in which AI agents become full participants in development: they write code, fix bugs, generate tests, and maintain documentation, working autonomously or under minimal guidance from a developer. The developer, in turn, shifts from writing code to design, task setting, and review.

Where should you start when introducing AI agents into a game dev team?

Start small: choose one type of routine task (writing unit tests, documenting code, creating editor tools) and run a pilot with volunteers from the team. Install Claude Code, Codex, or OpenCode, choose a suitable model, and give them a specific, well-described task with clear acceptance criteria. Iterate, document the results, and expand usage gradually.

Does a developer need to know prompting?

Yes. The quality of the agent’s output depends directly on the quality of the task. Prompting is an engineering discipline that includes writing clear requirements, knowing how to provide the right context, and formulating criteria for checking the result. A developer who cannot assign tasks to an agent gets a poor result — not because the agent is bad, but because the task is bad.

How do AI agents affect game development speed?

The impact is uneven. For tasks involving code writing from a ready specification, the acceleration is real and significant (on new projects — up to 2-3x). For tasks with a high share of design and architectural decisions, the acceleration is smaller. For tasks with poorly described requirements, the agent may slow the work down. The overall effect for a team that has set up its processes correctly is a substantial increase in throughput without a corresponding increase in headcount.

What is a Memory Bank and why is it needed in game dev?

A Memory Bank is a structured project documentation system that the agent reads at the beginning of each session and updates upon completion. In game dev, this is especially important because of the specifics of game projects: complex architectural decisions, nonstandard patterns, game mechanics with non-obvious dependencies. The Memory Bank allows the agent to work with your specific project rather than with an “average Unity project from the internet.”

How do you ensure the quality of code written by an agent?

Several levels of control: automated checks (compiler, linter, static analyzer) provide immediate feedback; unit tests verify logic; mandatory review by a developer before merging provides human oversight. The rule “the commit author is responsible for the code regardless of who wrote it” is fundamental. Never merge an agent’s code without reading it.

Which models are best suited for Unity (C#) development?

Claude 3.5+ and GPT-4o / Codex show the best quality for complex architectural tasks in C#. MiniMax is a good alternative for specification-driven tasks at a significantly lower cost. For simple repetitive tasks (test generation, creating DTO classes), any of the modern models will work.

Resources:

Documentation from the talk: funzen.slite.page — AI Notebook
Author's blog on Telegram: @aks2dio
McKinsey: The economic potential of generative AI
Anthropic Model Context Protocol: modelcontextprotocol.io
