Full-cycle agentic development: from brief to deployment

TL;DR — Key takeaways

Agentic development requires a strict cycle: brief → spec → implementation → verification → deployment
Memory Bank — a structured knowledge base about the project; without it, the model loses context
Vertical slices + CLI testing replace brittle browser E2E tests
Verification is not an optional stage, but mandatory hygiene at every step
Real result: an HR system of 25,000 lines of code in 2 business days

Introduction: why agentic development is not just “asking ChatGPT to write code”

Full-cycle agentic development is radically different from the familiar “write a prompt — get code” workflow. It is a systematic process in which the AI agent becomes a full participant in development — from refining requirements to deployment and production verification.

Denis, an experienced developer and author of the Dexd Notes Telegram channel, built such a flow in practice and implemented a 360 HR assessment system in 2 days — 25,000 lines of code from scratch. This article provides a detailed breakdown of each stage of his approach.

This article is for those who have already tried working with Claude, Codex, or other agents and want to move from random experiments to a reproducible, predictable result.

1. Stage 1 — Brief: how to formulate the task correctly

2. Stage 2 — Memory Bank: the project's long-term memory

3. Stage 3 — Specification and vertical slices

4. Stage 4 — Verification and testing

5. Stage 5 — CLI interface instead of browser tests

6. Stage 6 — Staging and deployment without risk

7. Code review and simplification

8. Real-world case: HR 360 in 2 days

9. FAQ

10. Summary and checklist

Stage 1 — Brief: how to formulate the task correctly

A brief is the boundary between Problem Space (what is desired) and Solution Space (how to implement it). Most developers skip this stage or do it too quickly — and then wonder why the agent did “the wrong thing”.

Iterative brief refinement

A proper brief is born through several iterations. The algorithm is as follows:

1. Write the initial wish in free form

2. Ask the agent (or GPT in the web chat) to identify contradictions and gaps — what has not been specified sufficiently

3. Add the missing details to the original prompt

4. Repeat 5–10 times until the model starts asking about insignificant details

Important technical trick: instead of continuing the conversation (thereby growing the context), edit the original message. In Codex and Claude Code, double-press Esc to edit the previous message. In Obsidian, it is convenient to maintain a “living prompt” that grows with each iteration.

Why is this important? The beginning of the context is the most “valuable” area for language models. Every token there affects the entire session more strongly than tokens in the middle. Garbage at the start of the context degrades quality for the rest of the session, and a long back-and-forth of clarifications is exactly that kind of garbage.
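
The refinement loop can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the author's tooling: findGaps and resolve are hypothetical callbacks standing in for a model call and for the human's answer, and the significant flag is an assumed way to represent the moment when the model starts asking about insignificant details.

```typescript
// A gap is a question the model raises about the brief (hypothetical shape).
interface Gap {
  question: string;
  significant: boolean; // false once the model is only nitpicking
}

// Stand-ins: findGaps would call a model, resolve is the human's answer.
type FindGaps = (brief: string) => Gap[];
type Resolve = (question: string) => string;

function refineBrief(
  initial: string,
  findGaps: FindGaps,
  resolve: Resolve,
  maxIterations: number = 10,
): { brief: string; iterations: number } {
  let brief = initial;
  for (let i = 1; i <= maxIterations; i++) {
    // Only the current brief is sent on each pass, never the earlier dialogue.
    // This mirrors editing the original message instead of replying to it.
    const significant = findGaps(brief).filter((gap) => gap.significant);
    if (significant.length === 0) {
      return { brief, iterations: i };
    }
    const additions = significant.map(
      (gap) => `- ${gap.question} ${resolve(gap.question)}`,
    );
    brief = `${brief}\n${additions.join("\n")}`;
  }
  return { brief, iterations: maxIterations };
}
```

The point of the design is that the function carries no conversation history between passes: the brief itself is the only state, which keeps the start of the context clean.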

Structure of a good brief

A good brief contains two sections:

Domain — what we are building, the subject area, users, business logic
Technical — stack (language, framework, DB, hosting), architectural constraints, preferences

Tip: use a mainstream stack. TypeScript + Node, PostgreSQL, Next.js, Vercel — this is not pedantry, but pragmatism. The models are trained on a huge amount of precisely such code and work with it much better than with exotic solutions.

Stage 2 — Memory Bank: the project's long-term memory

A Memory Bank is a structured knowledge base about the project, stored as files in the repository. Without it, each new agent session starts “from scratch” and knows nothing about previous decisions, architecture, or contracts.

Why context warming is needed

Before starting work on a task, the agent must “warm up” — study the Memory Bank. Warm-up stages:

1. General warm-up — the agent reads the project index and understands the structure as a whole

2. Specialized warm-up — the agent studies the specific subsystem it will work with

If the project has no Memory Bank yet, ask the agent over several prompts to study the codebase from the top level down, and start the task only after it reports back with a coherent picture of the structure.

Memory Bank structure based on C4 principles

An effective Memory Bank is built on the principles of C4 documentation (Context → Container → Component → Code) and includes:

index.md — entry point, system overview
/specs/ — subsystem technical specifications
/plans/ — epics, features, implementation plans, test reports
memory-bank-bible.md — structuring principles (single source of truth, atomicity, progressive disclosure)

Key principles of Memory Bank:

No duplication — every fact is stored in one place
Cross-references between documents
Large files are split into parts
After each feature — documentation update
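
Cross-references are only useful while they resolve, so a small link check is worth running after each documentation update. The sketch below is an illustration, not part of the described flow: it assumes the Memory Bank files are markdown, that links use repository-relative paths, and that the files have already been loaded into a plain object (the MemoryBank type is hypothetical).

```typescript
// Hypothetical in-memory view of a Memory Bank: file path -> file contents.
type MemoryBank = Record<string, string>;

// Returns "source -> target" for every markdown link to a .md file that is
// not present in the bank. Assumes link targets are repository-relative paths.
function findBrokenLinks(bank: MemoryBank): string[] {
  const broken: string[] = [];
  for (const [path, content] of Object.entries(bank)) {
    // Matches [text](some/file.md) and [text](some/file.md#anchor).
    const link = /\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)/g;
    let match: RegExpExecArray | null;
    while ((match = link.exec(content)) !== null) {
      const target = match[1];
      if (!(target in bank)) {
        broken.push(`${path} -> ${target}`);
      }
    }
  }
  return broken;
}
```

An empty result means every cross-reference points at an existing document; anything else is a prompt for the agent to fix the documentation before the session ends.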

Stage 3 — Specification and vertical slices

After the brief, the agent turns wishes into a set of epics and features. But the key architectural decision here is organizing the code around vertical slices.

What is a vertical slice

A vertical slice is a unit of value that cuts through all layers of the system from top to bottom: UI → API → business logic → database. For an online store, this could be “add a product to cart” or “get personalized recommendations”.

Advantages of the approach:

Code is grouped around a function, not a technical layer
The agent understands exactly what it is implementing and how to verify it
Each slice can be tested independently
Memory Bank stores the contract for each slice
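
As a rough illustration, here is what the “add a product to cart” slice could look like when all its layers sit side by side. Every name in it is hypothetical, the database is replaced by an in-memory Map, and the UI layer is omitted; it is a sketch of the grouping idea, not a recommended production layout.

```typescript
// One vertical slice, "add a product to cart", kept in one place.
interface CartLine {
  productId: string;
  quantity: number;
}

// Data layer: an in-memory Map stands in for a real database table.
class CartStore {
  private carts = new Map<string, CartLine[]>();

  load(userId: string): CartLine[] {
    return this.carts.get(userId) ?? [];
  }

  save(userId: string, lines: CartLine[]): void {
    this.carts.set(userId, lines);
  }
}

// Business logic: pure function, no I/O, easy to unit test.
function addToCart(lines: CartLine[], productId: string, quantity: number): CartLine[] {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new Error("quantity must be a positive integer");
  }
  const exists = lines.some((line) => line.productId === productId);
  if (!exists) {
    return [...lines, { productId, quantity }];
  }
  return lines.map((line) =>
    line.productId === productId
      ? { ...line, quantity: line.quantity + quantity }
      : line,
  );
}

// API layer: the contract that the Memory Bank would record for this slice.
interface AddToCartRequest {
  userId: string;
  productId: string;
  quantity: number;
}

interface AddToCartResponse {
  status: 200 | 400;
  cart?: CartLine[];
  error?: string;
}

function handleAddToCart(store: CartStore, req: AddToCartRequest): AddToCartResponse {
  try {
    const cart = addToCart(store.load(req.userId), req.productId, req.quantity);
    store.save(req.userId, cart);
    return { status: 200, cart };
  } catch (e) {
    return { status: 400, error: e instanceof Error ? e.message : String(e) };
  }
}
```

Because the handler is the single entry point of the slice, an acceptance test only needs to call handleAddToCart and inspect the response.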

Planning without overengineering

One of the chronic problems of modern LLMs (Claude 4, GPT-5.2, GPT-5.3) is the tendency to overcomplicate. Models wrap simple things in unnecessary abstractions and build extra structures.

In the planning prompt, explicitly write: “use the simplest implementation method, without overengineering”. This is not a request, but a requirement — without it, the model will overcomplicate by default.

Stage 4 — Verification and testing

Verification is the central theme of the approach. It is built into every stage, not set aside as a separate final phase.

Testing hierarchy


Acceptance Test (vertical slice) ← most important
    ↑
Integration Tests
    ↑
Unit Tests
    ↑
Type checks + linter + build

Principle: unit tests are hygiene — mandatory, but auxiliary. The main test is the one that checks whether the vertical slice works end to end, in accordance with business requirements.
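
One way to wire the hierarchy into a single command is a runner that executes the gates bottom-up and stops at the first failure. The sketch below is an assumption about how this could be done: each gate's run function is a placeholder for invoking a real tool (type checker, linter, test runner), and the pass/fail results are hard-coded purely for illustration.

```typescript
// One verification gate. run() is a stand-in for invoking a real tool;
// here it simply returns pass or fail.
interface Gate {
  name: string;
  run: () => boolean;
}

interface GateReport {
  passed: string[];
  failedAt: string | null;
}

// Runs gates in order and stops at the first failure, so the cheap checks
// at the bottom of the hierarchy give feedback before the expensive ones.
function runGates(gates: Gate[]): GateReport {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!gate.run()) {
      return { passed, failedAt: gate.name };
    }
    passed.push(gate.name);
  }
  return { passed, failedAt: null };
}

// Bottom-up order taken from the hierarchy; results are hard-coded.
const exampleGates: Gate[] = [
  { name: "types + lint + build", run: () => true },
  { name: "unit tests", run: () => true },
  { name: "integration tests", run: () => false },
  { name: "acceptance test (vertical slice)", run: () => true },
];
```

A feature counts as done only when failedAt is null, that is, when the acceptance test at the top has passed as well.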

BDD scenarios as acceptance tests

For each vertical slice, a BDD scenario is defined in Given/When/Then format:


Given: the user exists in the company HR directory
When: the user attempts to sign in
Then: sign-in succeeds

Given: the user is absent from the directory
When: the user attempts to sign in
Then: the system returns a 403 error

Critically important: the scenario must cover the entire scope of the feature, including all edge cases and error handling. If one scenario is not enough, create several. An agent that is not given feedback through tests invents the system’s behavior in error situations — and invents it unpredictably.
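
The two sign-in scenarios can be turned into executable acceptance checks. The following is a minimal sketch, assuming a hypothetical signIn function and an HR directory modeled as a set of emails; a real slice would call the API or the CLI instead, and would likely use a test framework rather than this hand-rolled runner.

```typescript
// Hypothetical sign-in slice: the HR directory is modeled as a set of emails.
type SignInResult = { status: 200; email: string } | { status: 403 };

function signIn(directory: Set<string>, email: string): SignInResult {
  return directory.has(email) ? { status: 200, email } : { status: 403 };
}

// A scenario keeps the Given/When/Then parts separate so that a failure
// can be reported by scenario name.
interface Scenario {
  name: string;
  given: () => Set<string>;
  when: (directory: Set<string>) => SignInResult;
  then: (result: SignInResult) => boolean;
}

const signInScenarios: Scenario[] = [
  {
    name: "user in the directory signs in",
    given: () => new Set(["anna@example.com"]),
    when: (directory) => signIn(directory, "anna@example.com"),
    then: (result) => result.status === 200,
  },
  {
    name: "user absent from the directory gets 403",
    given: () => new Set(["anna@example.com"]),
    when: (directory) => signIn(directory, "bob@example.com"),
    then: (result) => result.status === 403,
  },
];

// Returns the names of failed scenarios; an empty array means all passed.
function runScenarios(scenarios: Scenario[]): string[] {
  return scenarios
    .filter((s) => !s.then(s.when(s.given())))
    .map((s) => s.name);
}
```

Each additional edge case becomes one more entry in the array, which makes gaps in coverage easy to spot during review.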

Agent test as the highest level of verification

For complex integration scenarios (when external systems, the browser, or multiple services are involved), an agent test is created: a separate agent session with a prompt describing:

What the agent should do
What behavior is expected
What counts as a deviation and how to record it

Such an agent runs the scenario through the browser or CLI and reports the results. This automates what QA used to do manually.
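
To keep agent tests reproducible, the three parts of the prompt can be stored as data and rendered into text on demand. This is only a sketch of one possible shape: the AgentTestSpec fields and the wording of the rendered prompt are assumptions, not a format required by any particular agent.

```typescript
// Hypothetical structure for an agent test: what to do, what to expect,
// what counts as a deviation, and how to record it.
interface AgentTestSpec {
  goal: string;
  steps: string[];
  expected: string[];
  deviations: string[];
  reportFormat: string;
}

// Renders the spec into a plain-text prompt for a separate agent session.
function renderAgentTestPrompt(spec: AgentTestSpec): string {
  const numbered = (items: string[]): string =>
    items.map((item, index) => `${index + 1}. ${item}`).join("\n");
  return [
    `Goal: ${spec.goal}`,
    `Steps to perform:\n${numbered(spec.steps)}`,
    `Expected behavior:\n${numbered(spec.expected)}`,
    `Treat as a deviation:\n${numbered(spec.deviations)}`,
    `How to record deviations: ${spec.reportFormat}`,
  ].join("\n\n");
}
```

Stored next to the feature plan, such a spec can be re-run after every change to the slice without rewriting the prompt by hand.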

Stage 5 — CLI interface instead of browser tests

One of the most practical insights: browser automation via Playwright under agent control is unreliable. The agent scrolls slowly, gets lost in interfaces, and cannot reliably click elements.

Solution: shared client module + CLI

The architecture is built like this:


┌─────────────────────┐  ┌─────────────────────┐
│  Graphical UI       │  │  CLI interface      │
│  (Next.js / React)  │  │  (command line)     │
└──────────┬──────────┘  └──────────┬──────────┘
           │                        │
           └──────────┬─────────────┘
                      │
           ┌──────────▼──────────┐
           │  Shared client.ts    │
           │  (entire API logic)  │
           └──────────┬──────────┘
                      │
           ┌──────────▼──────────┐
           │  Backend / API      │
           └─────────────────────┘

Key principle: the CLI is a thin shell over the shared client module. All business logic for interacting with the backend lives in the shared library. The CLI only translates commands from the terminal into calls to that library.
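
The layering can be sketched as follows. This is a simplified illustration under stated assumptions: the endpoint paths, command names, and the ApiClient class are hypothetical, and the transport is synchronous so the example runs without a server (a real one would be asynchronous, for example built on fetch).

```typescript
// Hypothetical transport. A real one would be async; it is synchronous
// here so the sketch runs without a server.
type Transport = (method: "GET" | "POST", path: string, body?: unknown) => unknown;

// Shared client: the only place that knows endpoints and payload shapes.
// Both the UI and the CLI would import this class.
class ApiClient {
  private readonly send: Transport;

  constructor(send: Transport) {
    this.send = send;
  }

  listEmployees(): unknown {
    return this.send("GET", "/employees");
  }

  createReview(employeeId: string, reviewerId: string): unknown {
    return this.send("POST", "/reviews", { employeeId, reviewerId });
  }
}

// CLI: a thin shell that maps terminal arguments onto client calls.
// In a real entry point argv would come from process.argv.slice(2).
function runCli(argv: string[], client: ApiClient): unknown {
  const [command, first, second] = argv;
  switch (command) {
    case "employees:list":
      return client.listEmployees();
    case "reviews:create":
      if (!first || !second) {
        throw new Error("usage: reviews:create <employeeId> <reviewerId>");
      }
      return client.createReview(first, second);
    default:
      throw new Error(`unknown command: ${String(command)}`);
  }
}
```

Note that runCli contains no knowledge of URLs or payloads. If an endpoint changes, only ApiClient changes, and the UI and CLI pick up the new contract together.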

Advantages of CLI testing

Criterion	Browser test	CLI test
Speed	Slow	Fast
Reliability	Fragile	Stable
Maintenance	High cost	Minimal
Logic coverage	UI only	Entire client layer

The CLI supports two output modes: human-readable (for the developer) and JSON (for the agent). This allows the agent to receive a deterministic, machine-readable response about the result of each operation.
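
A minimal sketch of the two output modes is shown below. The OperationResult shape and the --json flag name are assumptions made for illustration; the source only states that a human-readable mode and a JSON mode exist.

```typescript
// Hypothetical result of one CLI operation.
interface OperationResult {
  ok: boolean;
  command: string;
  data?: unknown;
  error?: string;
}

type OutputMode = "human" | "json";

function formatResult(result: OperationResult, mode: OutputMode): string {
  if (mode === "json") {
    // One JSON object per operation: easy for an agent to parse and compare.
    return JSON.stringify(result);
  }
  if (result.ok) {
    return `OK    ${result.command}`;
  }
  return `FAIL  ${result.command}: ${result.error ?? "unknown error"}`;
}

// Assumed flag name: the mode is chosen by a --json argument.
function modeFromArgs(argv: string[]): OutputMode {
  return argv.includes("--json") ? "json" : "human";
}
```

Keeping both modes behind one function means the agent and the developer always see the same result, only rendered differently.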

Stage 6 — Staging and risk-free deployment

Letting the agent loose on the production database with broad tasks is roulette. The agent can always make a mistake, for example by wiping a production table along with the test data.

Three environments at minimum


Local (development) → Beta (staging) → Production

Local — for rapid iterations without pushes
Beta — for agent tests, a full copy of production
Production — only through GitOps promotion from beta

Technically: separate subdomains (beta.yourdomain.com), separate physically isolated databases. The agent works only in the beta environment.
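
The isolation rule can also be enforced in code, as a guard that runs before any agent session starts. The sketch below is an assumption about how such a guard might look: the EnvConfig shape and all URLs except the beta subdomain pattern are placeholders, and the guard additionally permits the local environment, which goes slightly beyond the rule stated above.

```typescript
type EnvName = "local" | "beta" | "production";

interface EnvConfig {
  name: EnvName;
  baseUrl: string;
  databaseUrl: string;
}

// Placeholder values; only the beta subdomain pattern comes from the text.
const environments: EnvConfig[] = [
  { name: "local", baseUrl: "http://localhost:3000", databaseUrl: "postgres://localhost/app_dev" },
  { name: "beta", baseUrl: "https://beta.yourdomain.com", databaseUrl: "postgres://beta-db.internal/app" },
  { name: "production", baseUrl: "https://yourdomain.com", databaseUrl: "postgres://prod-db.internal/app" },
];

// Throws unless the target environment is safe for an agent session:
// it must not be production and must not share a database with production.
function assertAgentMayRun(target: EnvConfig, all: EnvConfig[]): void {
  if (target.name === "production") {
    throw new Error("agents must not run against production");
  }
  const production = all.find((env) => env.name === "production");
  if (production && production.databaseUrl === target.databaseUrl) {
    throw new Error(`${target.name} shares a database with production`);
  }
}
```

The second check matters as much as the first: an environment named beta that points at the production database is production in everything but name.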

Deployment verification

After each deployment — an automatic check: the agent opens the deployed version through a browser or CLI, runs a smoke test with a test user, and confirms it works.

There have been cases where beta was fully tested and the deployment reported success, yet production had a bug caused by a database migration. That is why deployment verification is mandatory.
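
A post-deployment smoke test can be as small as a list of URLs with expected responses. The following sketch makes several assumptions: the HTTP probe is injected and synchronous so the example runs offline (a real probe would be asynchronous), and the paths used in the usage note are hypothetical. Including at least one endpoint that reads from the database is what would catch a failed migration.

```typescript
// Hypothetical synchronous HTTP probe; a real one would be async.
interface HttpResponse {
  status: number;
  body: string;
}

type HttpGet = (url: string) => HttpResponse;

interface SmokeCheck {
  path: string;
  expectStatus: number;
  expectBodyIncludes?: string;
}

// Runs every check and collects failures instead of stopping at the first,
// so one report shows everything that is broken after a deployment.
function smokeTest(baseUrl: string, get: HttpGet, checks: SmokeCheck[]): string[] {
  const failures: string[] = [];
  for (const check of checks) {
    const response = get(`${baseUrl}${check.path}`);
    if (response.status !== check.expectStatus) {
      failures.push(
        `${check.path}: expected ${check.expectStatus}, got ${response.status}`,
      );
    } else if (
      check.expectBodyIncludes !== undefined &&
      !response.body.includes(check.expectBodyIncludes)
    ) {
      failures.push(`${check.path}: body does not contain "${check.expectBodyIncludes}"`);
    }
  }
  return failures;
}
```

For example, checking a health endpoint together with a database-backed one (say, /api/health and /api/reviews, both hypothetical) would flag a deployment whose application starts but whose schema is out of date.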

Code review and simplification

After implementation and verification comes the code review stage. For production code (not demos), this is mandatory.

Main focus: simplification

Models tend to make code more complex — this is a systemic trait, not a bug. Claude 4 and GPT-5.x add unnecessary abstractions, wrap simple things in complexity, and create long call chains.

Why simplification is critical: complex code is harder for the model to understand in future sessions. Models struggle to track a system's implicit state once it goes deeper than two levels of nesting. Every extra abstraction layer is a potential source of errors the next time the code is changed.

A good rule: “if this code can be written more simply without losing functionality — then it should be written more simply”.

Additional review aspects:

Security: no unsecured endpoints, no SQL injection
Typing: full TypeScript typing, without falling back to the "any" type
Language best practices: Biome/ESLint runs with no errors or warnings

Real-world case: HR 360 in 2 days

Task: HR 360-degree evaluation system — employees evaluate each other, the system generates questionnaires and manages the company structure.

Stack: TypeScript, Node.js, Next.js, PostgreSQL (Supabase), Vercel

Tools: OpenAI Codex (GPT-5.2 / GPT-5.3, high reasoning), ChatGPT for the brief

Result:

Start: Tuesday afternoon
Done: Wednesday evening
Code volume: 25,000 lines
Feature coverage: almost all epics including UI

Process by stages:

1. The brief was iteratively refined in ChatGPT (~10 iterations of editing the initial message)

2. The agent (Codex) received the brief + the Memory Bank Bible structure → generated epics and features

3. Each feature was implemented in a separate session (20–50 minutes per feature)

4. Tests via the CLI interface, screenshots as proof of scenario completion

5. UI references from Figma/Ticha for the visual part

Main takeaway: without an orchestrator, it’s a hassle. Each feature has to be launched manually. The next step is an autonomous orchestrator that runs all features overnight without developer involvement.

FAQ

What is full-cycle agentic development?

Full-cycle agentic development is an approach in which an AI agent (Claude, Codex, GPT) participates in all stages: from clarifying requirements and writing specifications to code implementation, testing, and deployment. The developer sets the direction and controls the results without performing routine operations manually.

How does Memory Bank differ from a regular README?

Memory Bank is a living, structured knowledge base with cross-links, document templates, and organizational principles (C4 model, progressive disclosure). It is updated after every feature. README is a static document for humans. The agent reads Memory Bank at the start of each session to restore project context.

Why use a CLI interface if there is Playwright?

Playwright under agent control is slow and unreliable: the agent gets lost in interfaces, scrolls slowly, and cannot reliably click elements. The CLI interface tests the same client logic as the UI, but deterministically and quickly. A shared client module ensures that the CLI and UI use the same contracts.

How can you check that the agent did not cut corners in implementation?

Verification against the original specification is a separate mandatory stage. According to experiments with Claude 4 Opus and Sonnet, up to 15% of the original plan may not be fully implemented. After implementation, explicitly ask the agent: “compare the result with the feature plan and specify what was not implemented.”

Is staging needed for small projects?

As soon as the project has at least one external user, staging becomes mandatory. An agent with broad tasks (testing, deployment, database work) can make a mistake and delete data. Isolation of the beta environment is the only reliable protection.

Which model is best for agentic development?

For planning and briefs: GPT-5.2 Pro (if available) or GPT-5.3. For code implementation: GPT-5.3 with high reasoning in Codex. Claude is better for creative tasks and discussions, but requires more “guardrails” to follow the plan precisely.

Summary and checklist

Full-cycle agentic development is not magic and not a replacement for a developer. It is a system that allows one person to produce an amount of code that previously required a team.

Key conditions for the system to work:

High-quality brief with iterative refinement
Memory Bank as the project's long-term memory
Vertical slices with BDD tests
CLI interface for reliable testing
Verification at every stage, not just at the end
Isolated beta environment

Agentic flow checklist

[ ] The brief has been refined through 5+ iterations, all gaps closed
[ ] Memory Bank created and warmed up at the start of the session
[ ] Each feature is a separate vertical slice with contracts
[ ] BDD scenarios cover the happy path and all edge cases
[ ] CLI interface implemented on top of the shared client module
[ ] Tests run, screenshots/logs as evidence
[ ] Verification against the original plan — explicit check
[ ] The Beta environment is isolated from production
[ ] After deployment — automatic smoke test

The material was prepared based on practical experience presented during a live stream on the AI Driven Development channel together with Denis (Telegram: Dexd Notes).
