Full-cycle agentic development: from brief to deployment
TL;DR — Key takeaways
- Agentic development requires a strict cycle: brief → spec → implementation → verification → deployment
- Memory Bank — a structured knowledge base about the project; without it, the model loses context
- Vertical slices + CLI testing replace brittle browser E2E tests
- Verification is not an optional stage, but mandatory hygiene at every step
- Real result: an HR system of 25,000 lines of code in 2 business days
Introduction: why agentic development is not just “asking ChatGPT to write code”
Full-cycle agentic development is radically different from the familiar “write a prompt — get code” workflow. It is a systematic process in which the AI agent becomes a full participant in development — from refining requirements to deployment and production verification.
Denis, an experienced developer and author of the Dexd Notes Telegram channel, built such a flow in practice and implemented a 360 HR assessment system in 2 days — 25,000 lines of code from scratch. This article provides a detailed breakdown of each stage of his approach.
This article is for those who have already tried working with Claude, Codex, or other agents and want to move from random experiments to a reproducible, predictable result.
Contents
1. Stage 1 — Brief: how to formulate the task correctly
2. Stage 2 — Memory Bank: the project's long-term memory
3. Stage 3 — Specification and vertical slices
4. Stage 4 — Verification and testing
5. Stage 5 — CLI interface instead of browser tests
6. Stage 6 — Staging and deployment without risk
7. Code review and simplification
8. Real-world case: HR 360 in 2 days
9. FAQ
Stage 1 — Brief: how to formulate the task correctly
A brief is the boundary between Problem Space (what is desired) and Solution Space (how to implement it). Most developers skip this stage or do it too quickly — and then wonder why the agent did “the wrong thing”.
Iterative brief refinement
A proper brief is born through several iterations. The algorithm is as follows:
1. Write the initial wish in free form
2. Ask the agent (or GPT in the web chat) to identify contradictions and gaps — what has not been specified sufficiently
3. Add the missing details to the original prompt
4. Repeat 5–10 times until the model starts asking about insignificant details
Important technical trick: instead of continuing the conversation (thereby growing the context), edit the original message. In Codex and Claude Code, double-press Esc to edit the previous message. In Obsidian, it is convenient to maintain a “living prompt” that grows with each iteration.
Why is this important? The beginning of the context is the most “valuable” area for language models. Every token there affects the entire session more strongly than tokens in the middle. Garbage at the start of the context = degraded quality throughout the whole workflow.
Structure of a good brief
A good brief contains two sections:
- Domain — what we are building, the subject area, users, business logic
- Technical — stack (language, framework, DB, hosting), architectural constraints, preferences
Tip: use a mainstream stack. TypeScript + Node, PostgreSQL, Next.js, Vercel — this is not pedantry, but pragmatism. The models are trained on a huge amount of precisely such code and work with it much better than with exotic solutions.
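One way to keep the brief as a "living prompt" is to store its two sections as data and regenerate a single message after each refinement. This is a minimal sketch: the `Brief` type, its field names, and `renderBrief` are hypothetical and not part of the author's tooling.

```typescript
// Hypothetical shape for a two-section brief (names are illustrative).
interface Brief {
  domain: string[];    // what we are building, users, business logic
  technical: string[]; // stack, architectural constraints, preferences
}

// Render the brief as one message. Regenerate it after every refinement
// instead of appending follow-up messages to the conversation.
function renderBrief(brief: Brief): string {
  const section = (title: string, items: string[]): string =>
    [title, ...items.map((item) => `- ${item}`)].join("\n");
  return [
    section("Domain", brief.domain),
    section("Technical", brief.technical),
  ].join("\n\n");
}

const brief: Brief = {
  domain: ["360 HR assessment: employees evaluate each other"],
  technical: ["TypeScript + Node", "PostgreSQL", "Next.js", "Vercel"],
};

const briefMessage = renderBrief(brief);
```

Each iteration then changes the `brief` object and replaces the first message of the session with the new `briefMessage`, which keeps the start of the context clean.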
Stage 2 — Memory Bank: the project's long-term memory
Memory Bank — a structured knowledge base about the project in the form of files in the repository. Without it, each new agent session starts “from scratch,” and it knows nothing about previous decisions, architecture, or contracts.
Why context warming is needed
Before starting work on a task, the agent must “warm up” — study the Memory Bank. Warm-up stages:
1. General warm-up — the agent reads the project index and understands the structure as a whole
2. Specialized warm-up — the agent studies the specific subsystem it will work with
If there is no Memory Bank yet, use several prompts to have the agent study the project from top to bottom, and wait until it reports back with a working understanding of the structure.
Memory Bank structure based on C4 principles
An effective Memory Bank is built on the principles of C4 documentation (Context → Container → Component → Code) and includes:
- index.md: entry point, system overview
- /specs/: subsystem technical specifications
- /plans/: epics, features, implementation plans, test reports
- memory-bank-bible.md: structuring principles (single source of truth, atomicity, progressive disclosure)
Key principles of Memory Bank:
- No duplication — every fact is stored in one place
- Cross-references between documents
- Large files are split into parts
- After each feature — documentation update
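The two warm-up stages can be expressed as a reading list derived from the Memory Bank layout. The sketch below assumes the files listed above plus a hypothetical `specs/<subsystem>.md` naming scheme; `planWarmUp` and the sample file names are illustrative.

```typescript
// Assumed layout: index.md, memory-bank-bible.md, specs/<subsystem>.md, plans/...
// The per-subsystem file naming is a hypothetical convention.
type WarmUpPlan = { general: string[]; specialized: string[] };

function planWarmUp(files: string[], subsystem: string): WarmUpPlan {
  // General warm-up: the entry point and the structuring principles.
  const general = files.filter(
    (f) => f === "index.md" || f === "memory-bank-bible.md",
  );
  // Specialized warm-up: only the specs of the subsystem being changed.
  const specialized = files.filter(
    (f) => f.startsWith("specs/") && f.includes(subsystem),
  );
  return { general, specialized };
}

const memoryBank = [
  "index.md",
  "memory-bank-bible.md",
  "specs/auth.md",
  "specs/questionnaires.md",
  "plans/epic-01-auth.md",
];

const warmUp = planWarmUp(memoryBank, "auth");
```

Keeping the specialized list narrow reflects the progressive disclosure principle: the agent loads only what the current task needs.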
Stage 3 — Specification and vertical slices
After the brief, the agent turns wishes into a set of epics and features. But the key architectural decision here is organizing the code around vertical slices.
What is a vertical slice
A vertical slice is a unit of value that cuts through all layers of the system from top to bottom: UI → API → business logic → database. For an online store, this could be “add a product to cart” or “get personalized recommendations”.
Advantages of the approach:
- Code is grouped around a function, not a technical layer
- The agent understands exactly what it is implementing and how to verify it
- Each slice can be tested independently
- Memory Bank stores the contract for each slice
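As an illustration, the "add a product to cart" slice could expose one typed entry point that covers validation, business logic, and storage. This is a sketch under assumptions: the type names, `addToCart`, and the in-memory `carts` store are hypothetical stand-ins for a real API handler and database.

```typescript
// Hypothetical contract for the "add a product to cart" slice.
interface AddToCartInput { cartId: string; productId: string; quantity: number }
interface AddToCartResult { cartId: string; totalItems: number }

// In-memory stand-in for the database layer (illustrative only).
const carts = new Map<string, Map<string, number>>();

// The slice: validation, business logic, and storage behind one entry point
// that both an API handler and an acceptance test can call.
function addToCart(input: AddToCartInput): AddToCartResult {
  if (!Number.isInteger(input.quantity) || input.quantity <= 0) {
    throw new Error("quantity must be a positive integer");
  }
  const cart = carts.get(input.cartId) ?? new Map<string, number>();
  cart.set(input.productId, (cart.get(input.productId) ?? 0) + input.quantity);
  carts.set(input.cartId, cart);

  let totalItems = 0;
  cart.forEach((qty) => { totalItems += qty; });
  return { cartId: input.cartId, totalItems };
}

const firstAdd = addToCart({ cartId: "c1", productId: "p1", quantity: 2 });
```

The input and result types are the kind of contract the Memory Bank would record for the slice.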
Planning without overengineering
One of the chronic problems of modern LLMs (Claude 4, GPT-5.2, GPT-5.3) is the tendency to overcomplicate. Models wrap simple things in unnecessary abstractions and build extra structures.
In the planning prompt, explicitly write: “use the simplest implementation method, without overengineering”. This is not a request, but a requirement — without it, the model will overcomplicate by default.
Stage 4 — Verification and testing
Verification is the central theme of the approach. It is built into every stage, not set aside as a separate final phase.
Testing hierarchy
Acceptance Test (vertical slice) ← most important
↑
Integration Tests
↑
Unit Tests
↑
Type checks + linter + build
Principle: unit tests are hygiene — mandatory, but auxiliary. The main test is the one that checks whether the vertical slice works end to end, in accordance with business requirements.
BDD scenarios as acceptance tests
For each vertical slice, a BDD scenario is defined in Given/When/Then format:
Given: the user exists in the company HR directory
When: the user attempts to sign in
Then: sign-in succeeds
Given: the user is absent from the directory
When: the user attempts to sign in
Then: the system returns a 403 error
Critically important: the scenario must cover the entire scope of the feature, including all edge cases and error handling. If one scenario is not enough, create several. An agent that is not given feedback through tests invents the system’s behavior in error situations — and invents it unpredictably.
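The two sign-in scenarios above can be turned into executable checks. The sketch below uses a tiny hand-written Given/When/Then helper rather than a real test framework; `signIn`, `scenario`, the result shape, and the sample addresses are all hypothetical.

```typescript
// Hypothetical sign-in slice: the directory and result shape are assumptions.
type SignInResult = { status: 200 } | { status: 403; error: string };

function signIn(email: string, hrDirectory: Set<string>): SignInResult {
  if (hrDirectory.has(email)) return { status: 200 };
  return { status: 403, error: "user is not in the HR directory" };
}

// Tiny Given/When/Then runner. A real project would use a test framework.
function scenario(
  title: string,
  given: () => Set<string>,
  when: (directory: Set<string>) => SignInResult,
  then: (result: SignInResult) => boolean,
): string {
  const passed = then(when(given()));
  return `${passed ? "PASS" : "FAIL"}: ${title}`;
}

const hrDirectory = () => new Set<string>(["anna@example.com"]);

const bddReport = [
  scenario("user in the directory signs in", hrDirectory,
    (d) => signIn("anna@example.com", d), (r) => r.status === 200),
  scenario("user absent from the directory gets 403", hrDirectory,
    (d) => signIn("bob@example.com", d), (r) => r.status === 403),
];
```

Each entry in `bddReport` is a line the agent can read as feedback, which is the point of the scenario: the error path is specified rather than invented.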
Agent test as the highest level of verification
For complex integration scenarios (when external systems, the browser, or multiple services are involved), an agent test is created: a separate agent session with a prompt describing:
- What the agent should do
- What behavior is expected
- What counts as a deviation and how to record it
Such an agent runs the scenario through the browser or CLI and reports the results. This automates what QA used to do manually.
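The three parts of an agent test prompt can be assembled from a small structure so that every agent test has the same shape. This is only a sketch: `AgentTestSpec`, `buildAgentTestPrompt`, and the wording of the prompt are assumptions, not the author's actual prompts.

```typescript
// Hypothetical structure for an agent test; field names are illustrative.
interface AgentTestSpec {
  steps: string[];       // what the agent should do
  expected: string[];    // what behavior is expected
  deviationRule: string; // what counts as a deviation and how to record it
}

function buildAgentTestPrompt(spec: AgentTestSpec): string {
  const numbered = (items: string[]): string =>
    items.map((item, i) => `${i + 1}. ${item}`).join("\n");
  return [
    "You are running an acceptance scenario. Do not modify the code.",
    `Steps:\n${numbered(spec.steps)}`,
    `Expected behavior:\n${numbered(spec.expected)}`,
    `Deviations:\n${spec.deviationRule}`,
  ].join("\n\n");
}

const agentPrompt = buildAgentTestPrompt({
  steps: ["Sign in as the test user via the CLI", "Create a questionnaire"],
  expected: ["Sign-in succeeds", "The questionnaire appears in the list"],
  deviationRule: "Record every mismatch as: step, expected, actual.",
});
```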
Stage 5 — CLI interface instead of browser tests
One of the most practical insights: browser automation via Playwright under agent control is unreliable. The agent scrolls slowly, gets lost in interfaces, and cannot reliably click elements.
Solution: shared client module + CLI
The architecture is built like this:
┌─────────────────────┐   ┌─────────────────────┐
│    Graphical UI     │   │    CLI interface    │
│  (Next.js / React)  │   │   (command line)    │
└──────────┬──────────┘   └──────────┬──────────┘
           │                         │
           └────────────┬────────────┘
                        │
             ┌──────────▼──────────┐
             │  Shared client.ts   │
             │ (entire API logic)  │
             └──────────┬──────────┘
                        │
             ┌──────────▼──────────┐
             │    Backend / API    │
             └─────────────────────┘
Key principle: the CLI is a thin shell over the shared client module. All business logic for interacting with the backend lives in the shared library. The CLI only translates commands from the terminal into calls to that library.
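A minimal sketch of this split might look as follows. Everything here is an assumption made for illustration: the `Backend` interface, the `/employees` endpoints, and the command names are hypothetical, and the calls are synchronous only to keep the example short (a real client would call the HTTP API asynchronously).

```typescript
// Hypothetical backend contract. Synchronous for brevity; a real client
// would perform asynchronous HTTP requests.
interface Backend {
  request(method: "GET" | "POST", path: string, body?: unknown): unknown;
}

// Shared client: the only place that knows API paths and payload shapes.
// Both the UI and the CLI import it.
function createClient(backend: Backend) {
  return {
    listEmployees: () => backend.request("GET", "/employees"),
    addEmployee: (name: string) =>
      backend.request("POST", "/employees", { name }),
  };
}

// Thin CLI: maps terminal arguments onto client calls, nothing more.
function runCli(
  argv: string[],
  client: ReturnType<typeof createClient>,
): unknown {
  const [command, ...args] = argv;
  switch (command) {
    case "employees:list":
      return client.listEmployees();
    case "employees:add": {
      const employeeName = args[0];
      if (!employeeName) throw new Error("usage: employees:add <name>");
      return client.addEmployee(employeeName);
    }
    default:
      throw new Error(`unknown command: ${command}`);
  }
}

// In-memory fake backend used only for this demonstration.
const employees: string[] = [];
const fakeBackend: Backend = {
  request(method, path, body) {
    if (method === "POST" && path === "/employees") {
      employees.push((body as { name: string }).name);
    }
    return [...employees];
  },
};

const client = createClient(fakeBackend);
runCli(["employees:add", "Anna"], client);
const listed = runCli(["employees:list"], client);
```

Because `runCli` contains no API knowledge of its own, a passing CLI test exercises the same client code the UI depends on.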
Advantages of CLI testing
| Criterion | Browser test | CLI test |
|---|---|---|
| Speed | Slow | Fast |
| Reliability | Fragile | Stable |
| Maintenance | High cost | Minimal |
| Logic coverage | UI only | Entire client layer |
The CLI supports two output modes: human-readable (for the developer) and JSON (for the agent). This allows the agent to receive a deterministic, machine-readable response about the result of each operation.
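The two output modes can share one result object and differ only in formatting. The sketch below is an assumption about how that might look; `CliResult`, `formatResult`, and the sample fields are hypothetical.

```typescript
// Hypothetical result shape returned by every CLI command.
type CliResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; error: string };

function formatResult(result: CliResult, mode: "human" | "json"): string {
  // JSON mode: one deterministic, machine-readable line for the agent.
  if (mode === "json") return JSON.stringify(result);
  // Human mode: short readable text for the developer.
  if (!result.ok) return `Error: ${result.error}`;
  const data = result.data;
  return Object.keys(data)
    .map((key) => `${key}: ${String(data[key])}`)
    .join("\n");
}

const created: CliResult = { ok: true, data: { id: 42, state: "created" } };
const humanOutput = formatResult(created, "human");
const jsonOutput = formatResult(created, "json");
```

The agent parses `jsonOutput` and branches on `ok`, so it never has to interpret free-form text to decide whether an operation succeeded.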
Stage 6 — Staging and risk-free deployment
Letting the agent loose on the production database with broad tasks is roulette. An agent can always make a mistake, for example wiping a production table along with the test data.
Three environments at minimum
Local (development) → Beta (staging) → Production
- Local — for rapid iterations without pushes
- Beta — for agent tests, a full copy of production
- Production — only through GitOps promotion from beta
Technically: separate subdomains (beta.yourdomain.com), separate physically isolated databases. The agent works only in the beta environment.
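These rules can be enforced in code rather than by convention. The sketch below assumes the session is started with an explicit environment name; `canPromote` and `assertAgentTestAllowed` are hypothetical helpers, not part of any GitOps tool.

```typescript
type Environment = "local" | "beta" | "production";

// Promotion path from the text: Local → Beta → Production.
const PROMOTION_ORDER: Environment[] = ["local", "beta", "production"];

// A release may only move one step forward, so production is reachable
// only from beta.
function canPromote(from: Environment, to: Environment): boolean {
  return PROMOTION_ORDER.indexOf(to) - PROMOTION_ORDER.indexOf(from) === 1;
}

// Hypothetical guard called before any agent test session starts.
function assertAgentTestAllowed(env: Environment): void {
  if (env !== "beta") {
    throw new Error(`agent tests may only run against beta, got "${env}"`);
  }
}
```

A guard like this is a second line of defense. The primary protection remains the physically separate databases, since the agent should not hold production credentials at all.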
Deployment verification
After each deployment — an automatic check: the agent opens the deployed version through a browser or CLI, runs a smoke test with a test user, and confirms it works.
This check exists because there have been cases where beta was fully tested and the deployment reported success, yet production had a bug caused by a database migration. Deployment verification is mandatory.
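A post-deployment smoke test can be as simple as a list of named checks that must all pass. In this sketch the checks are injected functions, so they could wrap CLI commands in practice; `SmokeCheck`, `runSmokeTest`, and the sample checks are hypothetical.

```typescript
// Hypothetical smoke-test structure; each check would wrap a CLI command
// executed against the freshly deployed environment with a test user.
interface SmokeCheck { title: string; run: () => boolean }
interface SmokeReport { passed: boolean; failures: string[] }

function runSmokeTest(checks: SmokeCheck[]): SmokeReport {
  const failures: string[] = [];
  for (const check of checks) {
    let ok = false;
    try {
      ok = check.run();
    } catch {
      ok = false; // an exception counts as a failed check
    }
    if (!ok) failures.push(check.title);
  }
  return { passed: failures.length === 0, failures };
}

// Simulated run: the second check fails the way a broken migration might.
const smokeReport = runSmokeTest([
  { title: "sign in as test user", run: () => true },
  {
    title: "open questionnaire list",
    run: () => { throw new Error("relation does not exist"); },
  },
]);
```

A non-empty `failures` list is the signal to stop and investigate before treating the deployment as successful.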
Code review and simplification
After implementation and verification comes the code review stage. For production code (not demos), this is mandatory.
Main focus: simplification
Models tend to make code more complex — this is a systemic trait, not a bug. Claude 4 and GPT-5.x add unnecessary abstractions, wrap simple things in complexity, and create long call chains.
Why simplification is critical: complex code is harder for the model to understand in future sessions. Models struggle to track the virtual state of a system once it goes deeper than two levels of nesting. Every extra abstraction layer is a potential source of errors during the next enhancement.
A good rule: “if this code can be written more simply without losing functionality — then it should be written more simply”.
Additional review aspects:
- Security: no unsecured endpoints, no SQL injection
- Typing: full use of TypeScript without `any`
- Best language practices: clean Biome/ESLint
Real-world case: HR 360 in 2 days
Task: HR 360-degree evaluation system — employees evaluate each other, the system generates questionnaires and manages the company structure.
Stack: TypeScript, Node.js, Next.js, PostgreSQL (Supabase), Vercel
Tools: OpenAI Codex (GPT-5.2 / GPT-5.3, high reasoning), ChatGPT for the brief
Result:
- Start: Tuesday afternoon
- Done: Wednesday evening
- Code volume: 25,000 lines
- Feature coverage: almost all epics including UI
Process by stages:
1. The brief was iteratively refined in ChatGPT (~10 iterations of editing the initial message)
2. The agent (Codex) received the brief + the Memory Bank Bible structure → generated epics and features
3. Each feature was implemented in a separate session (20–50 minutes per feature)
4. Tests via the CLI interface, screenshots as proof of scenario completion
5. UI references from Figma/Ticha for the visual part
Main takeaway: without an orchestrator, it’s a hassle. Each feature has to be launched manually. The next step is an autonomous orchestrator that runs all features overnight without developer involvement.
FAQ
What is full-cycle agentic development?
Full-cycle agentic development is an approach in which an AI agent (Claude, Codex, GPT) participates in all stages: from clarifying requirements and writing specifications to code implementation, testing, and deployment. The developer sets the direction and controls the results without performing routine operations manually.
How does Memory Bank differ from a regular README?
Memory Bank is a living, structured knowledge base with cross-links, document templates, and organizational principles (C4 model, progressive disclosure). It is updated after every feature. README is a static document for humans. The agent reads Memory Bank at the start of each session to restore project context.
Why use a CLI interface if there is Playwright?
Playwright under agent control is slow and unreliable: the agent gets lost in interfaces, scrolls slowly, and misses elements. The CLI interface tests the same client logic as the UI, but deterministically and quickly. A shared client module ensures that the CLI and UI use the same contracts.
How can you check that the agent did not cut corners in implementation?
Verification against the original specification is a separate mandatory stage. According to experiments with Claude 4 Opus and Sonnet, up to 15% of the original plan may not be fully implemented. After implementation, explicitly ask the agent: “compare the result with the feature plan and specify what was not implemented.”
Is staging needed for small projects?
As soon as the project has at least one external user, staging becomes mandatory. An agent with broad tasks (testing, deployment, database work) can make a mistake and delete data. Isolation of the beta environment is the only reliable protection.
Which model is best for agentic development?
For planning and briefs: GPT-5.2 Pro (if available) or GPT-5.3. For code implementation: GPT-5.3 with high reasoning in Codex. Claude is better for creative tasks and discussions, but requires more “guardrails” to follow the plan precisely.
Summary and checklist
Full-cycle agentic development is not magic and not a replacement for a developer. It is a system that allows one person to produce an amount of code that previously required a team.
Key conditions for the system to work:
- High-quality brief with iterative refinement
- Memory Bank as the project's long-term memory
- Vertical slices with BDD tests
- CLI interface for reliable testing
- Verification at every stage, not just at the end
- Isolated beta environment
Agentic flow checklist
- [ ] The brief has been refined through 5+ iterations, all gaps closed
- [ ] Memory Bank created and warmed up at the start of the session
- [ ] Each feature is a separate vertical slice with contracts
- [ ] BDD scenarios cover the happy path and all edge cases
- [ ] CLI interface implemented on top of the shared client module
- [ ] Tests run, screenshots/logs as evidence
- [ ] Verification against the original plan — explicit check
- [ ] The Beta environment is isolated from production
- [ ] After deployment — automatic smoke test
The material was prepared based on practical experience presented during a live stream on the AI Driven Development channel together with Denis (Telegram: Dexd Notes).