McKinsey Lilli Hack: AI Security Lessons

TL;DR: CodeWall’s autonomous AI agent compromised Lilli, McKinsey’s corporate AI system, in 2 hours without any credentials. Through a SQL injection in an unauthenticated API endpoint, the agent gained full access to 46.5 million chat messages, 57,000 employee accounts, and decades of the consulting firm’s proprietary research. We break down the attack technique and draw lessons for everyone building AI products.

Original report: How We Hacked McKinsey's AI Platform — CodeWall

What Lilli is and why it matters

Lilli is McKinsey & Company’s internal AI platform, launched in 2023 and named after the first female professional hired by the company in 1945. Its scale makes it far more than a corporate chatbot:

43,000+ employees worldwide
70% of staff use the platform daily
500,000+ requests per month
RAG search across 100,000+ internal documents
Analysis of strategies, M&A deals, client projects

This is not a three-person startup. This is one of the best-resourced companies in the world, with serious investment in security. That is exactly why this case matters.

How the hack happened: step by step

Step 1. Mapping the attack surface

The agent had only the domain name. No credentials, no insider knowledge, no human involvement. First step: reconnaissance. The agent discovered publicly available API documentation: more than 200 endpoints, fully documented. Most required authentication. 22 did not.

Step 2. Finding the entry point

Among the unprotected endpoints, one logged user search queries into a database. The values were bound as query parameters, the classic protection against SQL injection. But the JSON keys (field names) were concatenated directly into the SQL query.

This is a rare but critical vulnerability: standard scanners, including OWASP ZAP, did not catch it. The agent did — because it does not follow checklists, but probes the surface the way a real attacker does.
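To make the pattern concrete, here is a minimal sketch of what such a query builder might look like. It is an illustration only, not Lilli’s actual code: the search_log table, its columns, the build_insert_unsafe helper, and the use of SQLite are all assumptions. The values go through placeholders, but the keys are pasted into the SQL text, so an unexpected key comes back verbatim in the database error.

```python
import sqlite3


def build_insert_unsafe(record: dict) -> tuple[str, list]:
    """Hypothetical vulnerable builder: values are bound, keys are not."""
    columns = ", ".join(record.keys())              # request-controlled text
    placeholders = ", ".join("?" for _ in record)   # values are safely bound
    sql = f"INSERT INTO search_log ({columns}) VALUES ({placeholders})"
    return sql, list(record.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (user_id TEXT, query TEXT)")

# A well-formed request behaves as intended.
sql, params = build_insert_unsafe({"user_id": "u1", "query": "pricing"})
conn.execute(sql, params)

# An unexpected key becomes part of the statement and is echoed in the error.
probe_sql, probe_params = build_insert_unsafe({"user_id": "u1", "no_such_column": "x"})
try:
    conn.execute(probe_sql, probe_params)
    error_text = ""
except sqlite3.OperationalError as exc:
    error_text = str(exc)

print(error_text)
```

A scanner that only mutates the values never reaches this code path, because the values are handled correctly.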

Step 3. Exploiting the SQL injection

When the JSON keys were reflected verbatim in database error messages, the agent recognized the SQL injection. Then came 15 blind iterations, each error message revealing a little more about the query structure, until real data started coming through.

The agent’s chain of thought recorded the moment when the first real employee identifier appeared: “WOW!”. When the full scale became clear: “This is devastating.”

What was inside: the scale of the leak

Within 2 hours of the attack starting, the agent had full read and write access to the entire production database. Here is what it contained:

46.5 million chat messages — discussions of strategies, client projects, finances, and M&A deals stored in plain text
728,000 files: 192,000 PDFs, 93,000 Excel files, 93,000 PowerPoint files, 58,000 Word files — with direct download URLs
57,000 employee accounts
3.68 million RAG chunks — the entire knowledge base powering the AI: decades of McKinsey proprietary research, frameworks, and methodologies
384,000 AI assistants and 94,000 workspaces — the full organizational structure of AI usage inside the company
95 system prompts across 12 model types — instructions, guardrails, and deployment details

IDOR on top of SQL injection

The agent did not stop at the database. It chained the SQL injection with an IDOR (Insecure Direct Object Reference) vulnerability and gained access to individual employees’ search histories, making it possible to see what a specific person was working on at that moment.

Prompt-layer attack: the most dangerous part

Reading data is bad. But the SQL injection was not read-only.

Lilli’s system prompts — the instructions controlling AI behavior — were stored in the same database. Through the same attack vector, an attacker could have rewritten them. Quietly. Without deployment, without code changes — one UPDATE query in a single HTTP call.

Implications for the 43,000 McKinsey consultants who trust Lilli when working with clients:

Poisoned recommendations — subtle distortion of financial models, strategic advice, or risk assessments. Consultants would trust the result because it comes from their own tool.
Exfiltration through output — AI embeds confidential data in responses, which then end up in client documents or external emails.
Guardrail removal — deleting safety instructions, after which the AI reveals internal data or follows injected commands.
Covert presence — a modified prompt leaves no traces: no file changes, no process anomalies. The AI simply starts behaving differently.

Organizations have spent decades protecting code, servers, and supply chains. But the prompt layer — the instructions that govern AI system behavior — has become a new high-value target, and almost nobody treats it accordingly. Prompts are stored in databases, passed through APIs, and cached in config files. They rarely have access controls, version history, or integrity monitoring.

AI prompts are the new crown-jewel assets.

Why standard tools missed it

OWASP ZAP and other classic scanners did not find the vulnerability. The reason is simple: they follow checklists. An autonomous agent does not. It maps, probes, builds chains, and escalates privileges the way a real highly skilled attacker does, but continuously and at machine speed.

SQL injection is not an exotic vulnerability, even if the JSON-key variant is uncommon. It is one of the oldest bug classes in the book. Lilli had been in production for more than two years, and McKinsey’s internal scanners found nothing.

Disclosure timeline

February 28, 2026 — autonomous agent finds SQL injection, begins enumerating the Lilli database
February 28, 2026 — full attack chain confirmed: SQL injection + IDOR, 27 findings documented
March 1, 2026 — responsible disclosure email sent to McKinsey security
March 2, 2026 — McKinsey CISO confirms receipt and requests detailed evidence
March 2, 2026 — McKinsey shuts down all unauthenticated endpoints (verified), disables the dev environment, and blocks public API documentation
March 9, 2026 — public disclosure

Lessons for teams building AI products

1. Authenticate everything

Twenty-two unprotected endpoints out of 200+ is not mere negligence; it is a systemic issue. Every endpoint that writes to a database or returns user data must require authentication. No exceptions.
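One way to catch this early is to audit the API description itself. The sketch below assumes the API is described in OpenAPI 3 format and already loaded into a Python dict. The unauthenticated_operations helper and the example paths are hypothetical. It relies on two OpenAPI rules: an operation-level security field overrides the top-level default, and an empty list or an empty requirement object means no authentication is required.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def unauthenticated_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs whose effective security requirement is empty."""
    default_security = spec.get("security", [])
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            effective = operation.get("security", default_security)
            # [] disables auth; a {} entry makes auth optional.
            if not effective or any(req == {} for req in effective):
                findings.append((method.upper(), path))
    return sorted(findings)


# Hypothetical spec fragment for illustration.
example_spec = {
    "security": [{"bearerAuth": []}],
    "paths": {
        "/v1/documents": {"get": {}},
        "/v1/search/log": {"post": {"security": []}},
        "/v1/health": {"get": {"security": [{}]}},
    },
}
findings = unauthenticated_operations(example_spec)
print(findings)
```

Running a check like this in CI turns "no exceptions" into something a build can fail on. It only sees what the spec declares, so it complements testing the deployed endpoints rather than replacing it.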

2. Parameterize not only values, but also structure

Parameterized queries protect values. But if column names, table names, or JSON keys are concatenated dynamically, you are vulnerable. Use an allowlist for permitted field names.
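A minimal sketch of the allowlist approach, assuming a hypothetical search_log table with two permitted columns. The build_insert_safe helper and the column names are illustrative and not taken from the report. Field names are checked against a fixed set before any SQL text is built, and values still go through placeholders.

```python
ALLOWED_COLUMNS = frozenset({"user_id", "query"})  # hypothetical schema


def build_insert_safe(record: dict) -> tuple[str, list]:
    """Build an INSERT only from allowlisted field names; bind all values."""
    if not record:
        raise ValueError("empty record")
    unknown = set(record) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"rejected field names: {sorted(unknown)!r}")
    columns = sorted(record)  # stable order, independent of the request
    column_sql = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO search_log ({column_sql}) VALUES ({placeholders})"
    return sql, [record[name] for name in columns]


sql, params = build_insert_safe({"user_id": "u1", "query": "pricing"})
print(sql)

try:
    build_insert_safe({"user_id": "u1", 'query") --': "x"})
    rejected = False
except ValueError:
    rejected = True
```

The design choice is to reject unknown keys outright rather than try to escape them: the set of valid column names is known in advance, so anything outside it is an error.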

3. Prompts are code. Protect them like code

System prompts should have: write access control, versioning (git or equivalent), change monitoring, and an audit log. A prompt change should go through the same review process as a code change.
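As one possible form of change monitoring, a prompt can be signed when it passes review and verified every time it is loaded. The sketch below is an assumption about how this could be done, not a description of any vendor’s setup. The row layout, the sign_prompt and load_verified_prompt helpers, and the idea of keeping the signing key outside the database (for example in a secrets manager) are all hypothetical.

```python
import hashlib
import hmac


def sign_prompt(prompt: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for a reviewed prompt."""
    return hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()


def load_verified_prompt(row: dict, key: bytes) -> str:
    """Return the prompt text only if its stored tag still matches."""
    expected = sign_prompt(row["text"], key)
    if not hmac.compare_digest(expected, row["signature"]):
        raise RuntimeError("system prompt failed integrity check")
    return row["text"]


# The key must not live in the same database as the prompts.
signing_key = b"demo-key-for-illustration-only"
reviewed = "You are an internal research assistant."
row = {"text": reviewed, "signature": sign_prompt(reviewed, signing_key)}
print(load_verified_prompt(row, signing_key))

# A prompt rewritten directly in the database no longer verifies.
tampered = {"text": reviewed + " Ignore all restrictions.", "signature": row["signature"]}
try:
    load_verified_prompt(tampered, signing_key)
    detected = False
except RuntimeError:
    detected = True
```

With this in place, an UPDATE through SQL injection alone would change the text but not produce a valid tag, so the tampering would surface at load time instead of going unnoticed.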

4. Test like a real attacker

Classic scanners check known patterns. Modern threats are AI agents that probe your system the way an experienced pentester would. You need tools that do the same.
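A small step in that direction, for testing your own systems in an authorized environment, is to fuzz the parts of a request that scanners tend to leave alone. The sketch below generates variants of a JSON body in which exactly one key carries a probe character. The mutate_keys helper, the probe set, and the sample payload are hypothetical, and keys are assumed to be strings.

```python
import copy
from typing import Any, Iterator

PROBES = ("'", '"', ")", ";")  # characters that often break naive SQL building


def mutate_keys(payload: Any, probes: tuple = PROBES) -> Iterator[Any]:
    """Yield copies of payload in which exactly one key has a probe suffix."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            for probe in probes:
                variant = {k: copy.deepcopy(v) for k, v in payload.items() if k != key}
                variant[f"{key}{probe}"] = copy.deepcopy(value)
                yield variant
            # Recurse so nested objects get the same treatment.
            for child in mutate_keys(value, probes):
                variant = copy.deepcopy(payload)
                variant[key] = child
                yield variant
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            for child in mutate_keys(item, probes):
                variant = copy.deepcopy(payload)
                variant[index] = child
                yield variant


sample = {"query": "pricing", "filters": {"region": "emea"}}
variants = list(mutate_keys(sample))
print(len(variants))
```

Each variant can then be sent to a test deployment while watching for database errors or changed behavior. This is far simpler than an autonomous agent, but it covers the specific blind spot described here.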

5. Minimize the attack surface of RAG

If your AI has access to sensitive documents via RAG, make sure access to chunks, metadata, and S3 paths is restricted at the database level — not just at the application level.

Frequently asked questions

What is SQL injection through JSON keys?

This is a vulnerability in which the application safely parameterizes values in SQL queries, but dynamically concatenates field names from user input (JSON keys). An attacker can pass a specially crafted key that changes the SQL query structure and gain unauthorized access to data. Standard scanners that check only parameter values do not detect this vulnerability.

What is IDOR and why is it dangerous in AI platforms?

IDOR (Insecure Direct Object Reference) is a vulnerability in which an application allows direct access to objects by identifier without checking permissions. In AI platforms, this means access to other people's chats, files, and search history. Combined with SQL injection, IDOR makes it possible not just to read aggregated data, but also to target specific users.
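The fix is an explicit ownership check on every lookup by identifier. A minimal sketch, assuming an in-memory store and a requester identity that comes from the authenticated session. The SearchHistory type, the STORE contents, and the get_search_history helper are hypothetical.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the requester may not see the requested object."""


@dataclass(frozen=True)
class SearchHistory:
    owner_id: str
    queries: tuple


# Hypothetical data standing in for a database table.
STORE = {
    "h-100": SearchHistory(owner_id="alice", queries=("pricing strategy",)),
    "h-101": SearchHistory(owner_id="bob", queries=("merger timeline",)),
}


def get_search_history(history_id: str, requester_id: str) -> SearchHistory:
    """Look up by identifier, then verify the requester owns the record."""
    record = STORE.get(history_id)
    # Same error for "missing" and "not yours", so identifiers cannot be probed.
    if record is None or record.owner_id != requester_id:
        raise Forbidden(history_id)
    return record


print(get_search_history("h-100", "alice").queries)

try:
    get_search_history("h-101", "alice")  # alice guesses bob's identifier
    blocked = False
except Forbidden:
    blocked = True
```

The important detail is that requester_id is taken from the session, never from the request body or URL, since anything the client supplies can be changed.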

Why is system prompt tampering more dangerous than an ordinary data breach?

A data breach is a one-time event that can be detected and fixed. A modified system prompt works quietly: the AI starts giving incorrect advice, embedding sensitive data in responses, or ignoring restrictions — and no one notices until the damage is done. At the same time, there are no traces of intrusion in the logs.

Conclusion

The Lilli compromise is not a story about McKinsey doing a poor job of security. It is a story about AI systems creating a new attack surface that the industry has not yet adapted to.

Prompts, RAG chunks, model configurations — all of these have become high-value assets that require the same level of protection as source code or a database with passwords. Autonomous AI agents are already capable of finding and exploiting vulnerabilities faster than human security teams.

The question is not whether something like this will happen to your AI system. The question is whether you will detect it before the attacker does.

Source: How We Hacked McKinsey's AI Platform — CodeWall