AI Dev Day: how big tech measures AI efficiency in development
On March 15, 2026, Yandex held its second AI Dev Day — a meetup about real-world experience implementing AI tools in development workflows. Speakers included representatives from Yandex, Avito, Ozon, T-Bank, Sber, and Yandex Go. Here are the key takeaways.
1. AI productivity at Yandex — Andrey Popov
- 57% of engineers use AI tools (60–75% in back-end, front-end, and mobile); DAU is 36%
- Generated code: 23% in agent mode, 30% including suggestions
- Total savings: ~42,000 hours/month, or about 2% of total working time (employees’ self-assessed savings are 30%, but that figure is overstated)
- Goal for 2026: grow to 10% savings
- The focus has shifted from assistants to agent mode: the agent solves the task, and a human joins only when needed — analogous to the “disengagement rate” in autonomous cars
- 90%+ of the infrastructure is covered by MCP servers (35+ stable ones); top use cases: tracker work, search, data work
- Information search: the agent reduces deep research time from 20 minutes to 2 minutes
- Labor market takeaway: professions do not disappear, they merge — an engineer without a narrow specialization already handles tasks from adjacent roles
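The savings figures above can be cross-checked with simple arithmetic. The sketch below assumes that the ~2% share and the ~42,000 hours/month refer to the same monthly time base; the function names are illustrative and do not come from the talk.

```python
def implied_total_hours(saved_hours: float, saved_share: float) -> float:
    """Total monthly hours implied by a saved amount and its share of the total."""
    return saved_hours / saved_share


def hours_at_target(total_hours: float, target_share: float) -> float:
    """Monthly hours saved if the target share of total time were reached."""
    return total_hours * target_share


total = implied_total_hours(42_000, 0.02)  # about 2.1 million hours/month
target = hours_at_target(total, 0.10)      # about 210,000 hours/month
print(f"implied total: {total:,.0f} h/month, 10% goal: {target:,.0f} h/month")
```

Under that assumption, the 2026 goal of 10% means roughly a fivefold increase in saved hours.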
2. GenAI adoption at Avito — Alexander Lukyanchenko (CTO Architecture & Tech Platform)
- Main insight: the speedup of the entire development cycle (dev cycle time) is only 4–5% in the best teams; coding itself takes only 32% of an engineer’s time
- Fine-tuning open models did not pay off — external SOTA models with context deliver better results
- Main measurement framework: adoption → AI-assisted PRs → cycle time
- Approach: select a small group of teams with 100% adoption, run “agent retrospectives,” and iterate against the benchmark
- SWE benchmark (Avito-specific): ~29% of tasks are solved autonomously
- Agents perform well on automated tests, atomic routine tasks, decomposition, and code review (20–40% of agent comments lead to code changes vs 65–70% of comments from a human reviewer)
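One way to read the 32% and 4–5% figures together is an Amdahl's-law style estimate. This is an illustration rather than Avito's methodology: it assumes AI speeds up only the coding share, that all other work stays unchanged, and that the share of an engineer's time maps directly onto cycle time.

```python
def cycle_time_reduction(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total time saved when only the coding share gets faster."""
    return coding_share * (1 - 1 / coding_speedup)


def required_coding_speedup(coding_share: float, reduction: float) -> float:
    """Coding speedup needed to reach a given overall reduction."""
    return 1 / (1 - reduction / coding_share)


# With coding at 32% of the time, doubling coding speed saves 16% overall.
print(cycle_time_reduction(0.32, 2.0))
# A 4.5% overall reduction corresponds to roughly a 1.16x coding speedup.
print(required_coding_speedup(0.32, 0.045))
```

In this model the overall saving can never exceed the coding share itself, which is why the remaining 68% of the work bounds the effect of faster code generation.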
3. Code assistants at Ozon — Alexander Lukyanov (ML platform)
- About 1,100 developers use the agent assistant per day (25–30% daily usage)
- Switching from Continue + DeepSeek to MiniMax + OpenCode/Cline caused a sharp jump in adoption
- Code review: ~1500 projects connected, up to 1000 reviews/day
- Models are updated in days, not months — through abstract “scenario routes” without reconfiguration
- External models (Claude, GPT) deliver better results on complex tasks, but are not broadly deployed due to code leakage risks
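The talk did not describe how “scenario routes” are implemented. The sketch below shows one plausible shape of the idea, assuming that clients request a named scenario and a platform-owned table maps it to a concrete model. All scenario names, model identifiers, and fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    model: str       # hypothetical model identifier
    max_tokens: int  # hypothetical per-scenario limit


# Clients know only scenario names; the table is owned by the platform team.
ROUTES: Dict[str, Route] = {
    "code-review": Route(model="model-a", max_tokens=4096),
    "agent": Route(model="model-b", max_tokens=16384),
}


def resolve(scenario: str, routes: Dict[str, Route] = ROUTES) -> Route:
    """Return the route for a scenario, failing loudly on unknown names."""
    try:
        return routes[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None


def swap_model(routes: Dict[str, Route], scenario: str, new_model: str) -> Dict[str, Route]:
    """Return a new table with one scenario pointed at a different model."""
    old = resolve(scenario, routes)
    return {**routes, scenario: Route(new_model, old.max_tokens)}


updated = swap_model(ROUTES, "agent", "model-c")
print(resolve("agent", updated).model)  # clients keep asking for "agent"
```

With this indirection, replacing a model is a one-row change in the table, and client configuration stays untouched.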
4. Measuring AI in SDLC — Anna Gromova (T-Bank)
- Framework: DORA + SPACE + DX → a unified “metrics tree” for evaluating code delivery and developer comfort
- AI assistant in the IDE: adoption 50% among IT employees, 70–75% among those who commit to GitLab
- Median merge time fell by 12%; for “ambassadors” (100% adoption) it fell by 30% over the year
- Unit test generation increased 4x; test-related requests account for 12% of all requests
- Key takeaway: AI does not replace process redesign — if there is a bottleneck in CI/CD or code review, AI simply moves it further downstream
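The bottleneck argument can be shown with a toy model in which a sequential delivery pipeline moves only as fast as its slowest stage. The stage names and capacities below are invented for illustration and are not T-Bank data.

```python
from typing import Dict


def throughput(stages: Dict[str, float]) -> float:
    """Changes per day a sequential pipeline can deliver (slowest stage wins)."""
    return min(stages.values())


def bottleneck(stages: Dict[str, float]) -> str:
    """Name of the stage that limits delivery."""
    return min(stages, key=stages.get)


# Invented capacities, in changes per day.
before = {"coding": 10, "code_review": 12, "ci_cd": 15}
after = {**before, "coding": 20}  # AI doubles coding capacity only

print(bottleneck(before), throughput(before))
print(bottleneck(after), throughput(after))
```

In this toy setup, doubling coding capacity raises delivery from 10 to 12 changes per day, and the limiting stage becomes code review.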
5. Yandex Code Assistant — Sergey Buldyaev
- A fork of an open-source agent with key enhancements: seamless authentication, one-click access to up-to-date models, MCP on click, a marketplace of presets (an analogue of “linters for agents”)
- The main challenge was adoption: skepticism was overcome through workshops for 1000+ engineers on real tasks
- YQL agent: the main problem is that models do not know YQL → solved through a validation dataset (not LLM-as-judge) and tool-calling examples in the system prompt
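A validation dataset, as opposed to LLM-as-judge, means that generated queries are checked deterministically. The sketch below shows one way to do that by executing the generated query and comparing its result with a reference query. It uses SQLite purely as a stand-in for a YQL engine; the schema, the dataset, and the function names are assumptions, not details from the talk.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE orders (id INTEGER, city TEXT, amount INTEGER);
INSERT INTO orders VALUES (1, 'Moscow', 100), (2, 'Kazan', 50), (3, 'Moscow', 70);
"""

# Each case pairs a task with a reference query whose result is the ground truth.
VALIDATION_SET = [
    {"task": "total amount per city",
     "reference": "SELECT city, SUM(amount) FROM orders GROUP BY city"},
]


def run(query: str) -> Optional[list]:
    """Execute a query on a fresh in-memory database; None means it failed."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return sorted(conn.execute(query).fetchall())
    except sqlite3.Error:
        return None
    finally:
        conn.close()


def passes(case: dict, generated_query: str) -> bool:
    """A generated query passes if it runs and matches the reference result."""
    expected = run(case["reference"])
    actual = run(generated_query)
    return actual is not None and actual == expected


good = "SELECT city, SUM(amount) AS total FROM orders GROUP BY city ORDER BY city"
bad = "SELECT city, COUNT(*) FROM orders GROUP BY city"
print(passes(VALIDATION_SET[0], good), passes(VALIDATION_SET[0], bad))
```

Comparing results rather than query text lets differently written but equivalent queries pass, while syntax errors and wrong aggregations fail without any model in the loop.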
6. AID — AI for Designers at Sber — Maxim Shvedenko
- A multi-agent system: three agents (Support, Reviewer, Generator) built on a single knowledge base for the design system — a closed-loop quality process
- Before: reviewing one screen took 30 min–2 hours, fixing comments took 8 hours, a new screen took 16+ hours
- Generator: business requirements (BT) → formalization → JSON specification → rendering components in Figma/React from the design system
- The reviewer slices the mockup into layers; each check type is a separate agent with compressed context
7. SRE + AI at Yandex Go — Alexander Fisher
- SRE GPT — a multi-agent system for incident analysis: covers almost 100% of 400 incidents/day (previously ~99% were not analyzed at all)
- Savings: 30 min × 400 incidents = ~200 hours/day on postmortems alone
- Root cause identification accuracy: ~40–44%, in line with the global benchmark (Microsoft, Meta, Google)
- Prompts in Russian do not work in SRE: there is no stable terminology → switched to English
- Prerequisites: own cloud, observability platform, service catalog, dependency graph, event audit
General conclusions
All companies agree on several points:
- Adoption is the hardest stage. Technology works, but without training, workshops, and clear security policies, people simply do not start using the tools.
- Agent mode matters more than autocomplete. The real impact comes not from IDE suggestions, but from an agent that independently closes tasks.
- Measurement must be done correctly. Adoption and “amount of generated code” are not business metrics. What matters is cycle time, merge time, and change failure rate.
- MCP has become the standard. All teams are building context infrastructure through MCP servers.
- SOTA models outperform fine-tuning. Investing in additional training for open models is not cost-effective — external models with context deliver better results.