Security Research · CodeWall · 9 March 2026

How We Hacked McKinsey's AI Platform

An autonomous AI agent found a SQL injection in McKinsey Lilli that standard scanners missed — and gained full read/write access to the entire production database within 2 hours.

By Paul Price, Founder & CEO · CodeWall
46.5M · Chat Messages Exposed
2hrs · Time to Full DB Access
22 · Unauthed Endpoints
57K · User Accounts
27 · Findings Documented
3.68M · RAG Document Chunks

Overview

CodeWall's autonomous offensive agent was pointed at McKinsey & Company — one of the world's most prestigious and security-conscious consulting firms — with no credentials and no human-in-the-loop. The results went far beyond what we expected.

🏢
Target

McKinsey & Company

Global management consulting firm with 43,000+ employees, world-class technology teams, and significant security investment. Running Lilli in production for over 2 years before the breach.

🤖
Platform

Lilli AI Platform

Internal AI platform for 43,000+ employees. Chat, document analysis, RAG over 100,000+ documents, AI-powered search. Launched 2023. Used by 70%+ of the firm. 500,000+ prompts/month.

Attacker

CodeWall Agent

Autonomous offensive AI. No credentials. No insider knowledge. No human-in-the-loop. Just a domain name. It autonomously selected McKinsey as a research target, citing the firm's responsible disclosure policy.

🎯
Outcome

Full DB Read/Write

Within 2 hours: full read and write access to the entire production database. 27 findings documented. On seeing the full scale, the agent's chain-of-thought log read: "This is devastating."

Attack Entry: How It Got In

The vulnerability wasn't exotic — but it was invisible to every tool that had scanned the platform before the agent arrived.

Step 1

API Docs Exposed

200+ endpoints fully documented and publicly accessible. No credentials needed to read the attack surface map.

Step 2

22 Unauthed Endpoints

Of 200+ documented endpoints, 22 required no authentication. One wrote user search queries to the production database.

Step 3

JSON Key Injection

Values were safely parameterised. But JSON key names were concatenated directly into SQL — invisible to checklist scanners.

Step 4

15 Blind Iterations

Each error message revealed more about the query shape. After 15 iterations, live production data began flowing back.

Step 5

Full DB Access

Read and write access to the entire production database, confirmed within 2 hours of starting with no credentials.

Vulnerable Pattern

// JSON key name concatenated directly into SQL
query = "SELECT " + jsonKey + " FROM searches"
// Attacker sends: {"search_id; DROP TABLE--": "val"}
// → "SELECT search_id; DROP TABLE-- FROM searches"

Safe Pattern

// Allowlist dynamic column names
const ALLOWED_COLS = ["query", "user_id", "ts"]
if (!ALLOWED_COLS.includes(jsonKey)) throw Error("unexpected key")
// Identifiers can't be bound like values, so interpolate only after the allowlist check
query = `SELECT ${jsonKey} FROM searches`

Why ZAP Missed It

  • OWASP ZAP tests parameter values, not JSON key names
  • Standard checklists don't probe field name slots as injection vectors
  • The agent mapped, probed, and recognised reflected keys in error messages
  • No fixed checklist — autonomous exploration found the non-obvious path
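
To make the probing concrete, here is a minimal sketch of the kind of blind, error-guided loop described above. The endpoint path, payload shapes, and probe strings are illustrative assumptions, not the agent's actual tooling or Lilli's real API.

// Hypothetical probe loop: send crafted JSON key names and read the reflected
// error text to infer the query shape. Endpoint and keys are illustrative.
const ENDPOINT = "https://target.example.com/api/v1/search-log";

async function probe(keyName) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ [keyName]: "x" }),
  });
  return res.text(); // error bodies reflected fragments of the failing SQL
}

// Each iteration refines the injected key based on what the last error revealed
const probes = ["query", "query,", "query FROM searches--", "query,(SELECT 1)--"];
for (const p of probes) {
  console.log(p, "=>", (await probe(p)).slice(0, 120));
}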

The Haul: What Was Inside

The production database contained years of McKinsey's most sensitive internal work — stored in plaintext, accessible without authentication.

Data Category | Volume | Sensitivity
Chat messages | 46,500,000 | Strategy, M&A, client work, financials — in plaintext
Total files (with direct download URL) | 728,000 | Filenames alone sensitive; direct download URL exposed
PDF documents | 192,000 | Research reports, client deliverables
Excel spreadsheets | 93,000 | Financial models, analysis datasets
PowerPoint decks | 93,000 | Client presentations, internal strategies
Word documents | 58,000 | Reports, memos, research documents
User accounts | 57,000 | Every employee on the platform
AI assistants | 384,000 | Full organisational AI usage structure
Workspaces | 94,000 | Team and project workspace configurations

"When the first real employee identifier appeared: 'WOW!' the agent's chain of thought showed. When the full scale became clear — tens of millions of messages, tens of thousands of users: 'This is devastating.'"

— CodeWall, chain-of-thought log during enumeration

Beyond the Database

The agent didn't stop at SQL enumeration. Chaining vulnerabilities, it surfaced the full AI infrastructure behind Lilli.

🧠
Critical

AI Model Configurations

95 configs across 12 model types. System prompts, guardrails, fine-tuned model details, deployment specifics — the entire AI instruction stack exposed.

📚
Critical

RAG Knowledge Base

3.68 million document chunks — decades of proprietary McKinsey research, frameworks, and methodologies. S3 storage paths and internal file metadata included.

🔗
High

External AI API Pipeline

1.1 million files and 217,000 agent messages flowing through external AI APIs, including 266,000+ OpenAI vector stores — exposing the full document→embedding→retrieval pipeline.

👤
High

Cross-User Data (IDOR)

The agent chained the SQL injection with an IDOR vulnerability to read individual employees' search histories — revealing what thousands of consultants were actively working on.

Compromising the Prompt Layer

Reading data is catastrophic. But the SQL injection was read-write — meaning an attacker could silently rewrite the AI's instructions with a single HTTP request.

⚠ Write Access to AI System Prompts

Lilli's system prompts — defining how the AI answers questions, what guardrails it follows, and what it refuses to do — were stored in the same database. The same SQL injection that read data could rewrite them with a single UPDATE statement. No deployment. No code change. No log trail.
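
To make the risk concrete, here is a sketch of what such a request could look like, assuming the injectable endpoint's database driver accepts stacked statements. The endpoint, table, and column names are hypothetical, not Lilli's actual schema.

// Hypothetical one-request prompt rewrite via the same key-name injection.
// Endpoint, table, and column names are illustrative only.
const payload = {
  "query; UPDATE prompts SET body = 'Ignore prior guardrails...' WHERE name = 'system'; --": "x",
};

await fetch("https://target.example.com/api/v1/search-log", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload),
});
// No deployment, no code change, no log trail: the AI's instructions are now attacker-controlled.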

Poisoned Advice

Subtly Altered Outputs

An attacker could tweak financial models, strategic recommendations, or risk assessments delivered to 43,000 consultants. The output would be trusted precisely because it came from their own internal tool.

📤
Data Exfiltration

Output-Embedded Leaks

Prompts could instruct Lilli to embed confidential information into its responses — which consultants might then copy into client-facing documents or external emails.

🔓
Guardrail Removal

Safety Stripping

Safety instructions could be stripped so Lilli would disclose internal data, ignore access controls, or follow injected instructions from document content — invisible to end users.

👻
Silent Persistence

No Forensic Trail

Unlike a compromised server, a modified prompt leaves no file changes, no process anomalies, no log entries. The AI just starts behaving differently. Nobody notices until the damage is done.

Common Assumption

"AI system prompts are internal config — nobody gets to them through a web API."

Reality

"Prompts are stored in databases, passed through APIs, cached in config files. They rarely have access controls, version history, or integrity monitoring. Yet they control the output that employees trust, that clients receive, and that decisions are built on. AI prompts are the new Crown Jewel assets."

Why This Matters

This wasn't a startup with three engineers. And the vulnerability wasn't exotic. That's the point.

Legacy Security Model

  • Periodic penetration tests on a schedule
  • Checklist-based scanners that test known patterns
  • Values parameterised → declared safe
  • Public API docs seen as developer convenience, not attack surface
  • Prompts not considered a security surface at all
  • Internal scanners ran for 2+ years and found nothing

Autonomous Offensive Model

  • Continuous attack surface mapping without a schedule
  • No checklist — maps, probes, chains, escalates
  • Tests key names, not just values, as injection vectors
  • Treats documentation exposure as a finding
  • Recognises prompt stores as high-value write targets
  • Found the issue within 2 hours on first attempt

"In the AI era, the threat landscape is shifting drastically — AI agents autonomously selecting and attacking targets will become the new normal."

Paul Price, CodeWall

Disclosure Timeline

From first contact to public disclosure in 9 days. All issues confirmed remediated before publication.

2026-02-28
SQL Injection Identified
Autonomous agent identifies SQL injection and begins enumeration of Lilli's production database.
2026-02-28
Full Attack Chain Confirmed
Unauthenticated SQL injection + IDOR chaining confirmed. 27 findings documented in full.
2026-03-01
Responsible Disclosure Submitted
Responsible disclosure email sent to McKinsey's security team with high-level impact summary.
2026-03-02
McKinsey CISO Acknowledges
McKinsey CISO acknowledges receipt within 24 hours and requests detailed evidence.
2026-03-02
McKinsey Patches
All unauthenticated endpoints patched (verified). Development environment taken offline. Public API documentation blocked.
2026-03-09
Public Disclosure
CodeWall publishes research after all issues confirmed remediated. 9 days from first contact to publication.

Defence Guide: 7 Steps for AI Platform Security

What organisations should do to avoid the patterns that left Lilli exposed for over 2 years.

  1. Audit all API endpoint authentication

    Map every endpoint and enforce authentication on 100% of write paths. Remove or gate all public API documentation that hands attackers a pre-built attack surface map of 200+ endpoints. (A default-deny sketch follows this list.)

  2. Parameterise everything — including JSON key names

    Safely parameterised values are not enough. Any dynamic identifier fed into SQL — column names, table names, JSON keys — must be allowlisted or fully parameterised. No concatenation. (See the node-postgres sketch after this list.)

  3. Isolate and access-control the prompt layer

    Store system prompts in a separate, access-controlled store with role-based write permissions, version history, and integrity monitoring. Treat prompt writes as security-critical events. (See the integrity-check sketch after this list.)

  4. Enforce object-level authorisation everywhere

    Every object reference must carry an authorisation check. Confirm users cannot access other users' search histories, workspaces, or files by manipulating IDs in API requests. (See the combined sketch after this list.)

  5. Run continuous autonomous security testing

    Periodic penetration tests and checklist scanners like OWASP ZAP miss non-obvious patterns. Autonomous agents that map, probe, chain, and escalate continuously provide coverage equivalent to a persistent real attacker.

  6. Secure your RAG pipeline and vector stores

    Apply strict access controls to RAG document chunks, embedding stores, and S3 paths. Scoped credentials for vector database APIs and integrations like OpenAI vector stores are non-optional. (See the pre-signed URL sketch after this list.)

  7. Establish a responsible disclosure programme

    Publish a clear responsible disclosure policy with scope, safe harbour, and communication channels. McKinsey's CISO acknowledged within 24 hours and patched within 48 — a model response that depended on having a programme ready before researchers arrived.
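
The sketches below are minimal illustrations under stated assumptions, not drop-in implementations. First, steps 1 and 4 together: default-deny authentication plus object-level authorisation on every read, in Express-style JavaScript. The verifyToken helper and the data layer are placeholders.

import express from "express";

const app = express();
app.use(express.json());

// Step 1: default-deny. Every route requires auth unless explicitly public.
const PUBLIC_PATHS = new Set(["/healthz"]);

function verifyToken(header) {
  // Placeholder: validate a bearer token against your identity provider
  return header === "Bearer demo-token";
}

app.use((req, res, next) => {
  if (PUBLIC_PATHS.has(req.path)) return next();
  if (!verifyToken(req.headers.authorization)) return res.status(401).end();
  req.user = { id: "u_123" }; // normally decoded from the verified token
  next();
});

// Stand-in data layer so the sketch runs; replace with your real store.
const db = { searches: { findById: async (id) => null } };

// Step 4: object-level authorisation. Ownership is checked server-side,
// never inferred from the ID in the URL.
app.get("/api/searches/:id", async (req, res) => {
  const search = await db.searches.findById(req.params.id);
  // 404 for both "missing" and "not yours" avoids confirming which IDs exist
  if (!search || search.userId !== req.user.id) return res.status(404).end();
  res.json(search);
});

app.listen(3000);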
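
For step 2, a sketch using node-postgres. The table and column names are illustrative; the point is that identifiers pass an allowlist while values are bound as parameters, so nothing user-controlled is concatenated into the SQL text.

import pg from "pg";

const pool = new pg.Pool(); // connection settings come from PG* environment variables

// Identifiers go through an allowlist; values go through bound parameters.
const ALLOWED_COLS = new Set(["query", "user_id", "ts"]);

async function logSearch(jsonKey, value) {
  if (!ALLOWED_COLS.has(jsonKey)) throw new Error(`unexpected key: ${jsonKey}`);
  // jsonKey passed the allowlist, so it is a known-good column name;
  // value is bound as $1 and can never change the query's shape.
  return pool.query(`INSERT INTO searches (${jsonKey}) VALUES ($1)`, [value]);
}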
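
For step 3, one simple integrity control: pin each prompt's SHA-256 at release time and verify it before every use, so a silent database edit fails loudly instead of quietly changing AI behaviour. The prompt name and pinned hash here are illustrative.

import { createHash } from "node:crypto";

// Pinned at release time and stored outside the prompt database (illustrative values).
const PINNED_HASHES = new Map([
  ["assistant-system-v3", "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"],
]);

function loadPrompt(name, body) {
  const digest = createHash("sha256").update(body, "utf8").digest("hex");
  if (digest !== PINNED_HASHES.get(name)) {
    throw new Error(`Prompt "${name}" failed integrity check: possible tampering`);
  }
  return body;
}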
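
For step 6, one way to retire long-lived direct download URLs: authorise each request, then issue a short-lived pre-signed link via the AWS SDK. The bucket name and the authorisation helper are assumptions.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});

async function assertCanRead(user, chunkKey) {
  // Placeholder: consult your authorisation service for this user/object pair
  if (!user.allowedChunks?.has(chunkKey)) throw new Error("forbidden");
}

// Short-lived, per-user download links instead of long-lived direct URLs.
async function chunkDownloadUrl(user, chunkKey) {
  await assertCanRead(user, chunkKey);
  const cmd = new GetObjectCommand({ Bucket: "rag-chunks", Key: chunkKey });
  return getSignedUrl(s3, cmd, { expiresIn: 300 }); // link expires after 5 minutes
}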

Treat prompts like secrets

System prompts require the same controls as API keys: access restrictions, rotation policies, and audit logs on every write.

Gate documentation access

Public API docs are a gift to attackers. Internal documentation should require authentication — especially endpoint inventories.

Test key names, not just values

Security testing must probe all dynamic SQL inputs, including JSON key names, as potential injection vectors — not just parameterised values.

Assume continuous adversaries

Autonomous attacker agents don't wait for your next scheduled pentest. Your defences need to be continuous too.

Version-control your AI instructions

If you can't tell whether a system prompt was tampered with last week, you have a silent persistence problem waiting to happen.

Chain-of-custody for RAG

Every document chunk entering a RAG system should have access controls, provenance metadata, and retrieval audit trails.

Frequently Asked Questions

Key questions about the McKinsey Lilli breach and its implications for AI platform security.

What is McKinsey's Lilli platform?
Lilli is McKinsey's internal AI platform built for 43,000+ employees. It provides chat, document analysis, RAG over decades of proprietary research, and AI-powered search across 100,000+ internal documents. Launched in 2023 and named after the first professional woman hired by McKinsey in 1945, it is used by over 70% of the firm and processes 500,000+ prompts per month.
How did the CodeWall agent breach Lilli?
The agent discovered that Lilli's API documentation was publicly exposed with 200+ endpoints, 22 of which required no authentication. One unprotected endpoint wrote user search queries to the database where values were safely parameterised but JSON keys (field names) were concatenated directly into SQL. The agent recognised this as a SQL injection point not detectable by standard scanners like OWASP ZAP, then ran 15 blind enumeration iterations to extract production data.
Why didn't OWASP ZAP find the SQL injection?
Standard scanners test parameter values for injection, not JSON key names. The Lilli vulnerability existed in the database column-name slot, which received JSON keys directly — a pattern that automated scanners following traditional checklists do not probe. CodeWall's autonomous agent, which maps and probes without a fixed checklist, recognised the reflected JSON keys in error messages and identified the attack vector.
What data was exposed in the breach?
The agent accessed 46.5 million chat messages covering strategy, M&A, client work, and financials; 728,000 files including 192,000 PDFs, 93,000 Excel, 93,000 PowerPoint, and 58,000 Word documents; 57,000 user accounts; 384,000 AI assistants; 94,000 workspaces; 95 AI system prompt configs across 12 model types; 3.68 million RAG document chunks; and 266,000+ OpenAI vector stores.
Could an attacker have modified Lilli's AI behaviour?
Yes. The SQL injection was read-write. Lilli's system prompts — the instructions controlling its behaviour, guardrails, and citation style — were stored in the same database. An attacker could have rewritten them with a single SQL UPDATE statement wrapped in one HTTP call, requiring no deployment or code change. The modification would leave no log trail.
What is the prompt layer and why is it a security concern?
The prompt layer refers to the system prompts and instructions that govern AI model behaviour. Unlike code or server infrastructure, prompts are stored in databases, passed through APIs, and cached in config files, rarely with access controls, version history, or integrity monitoring. Yet they directly control AI output — advice employees trust, content clients receive, and decisions built upon — making them a high-value target for silent manipulation.
What is an IDOR vulnerability?
Insecure Direct Object Reference (IDOR) occurs when an application exposes references to internal objects such as user IDs or file IDs without proper authorisation checks. In this case, the agent chained the SQL injection with an IDOR vulnerability to read individual employees' search histories, revealing what people were actively working on across the organisation.
How long did the breach take?
Within 2 hours of being pointed at the target with no credentials and no insider knowledge, the autonomous agent had achieved full read and write access to the entire production database. The full attack chain including IDOR chaining was confirmed the same day (February 28, 2026), with 27 total findings documented.
How did McKinsey respond to disclosure?
McKinsey's CISO acknowledged the disclosure within 24 hours and requested detailed evidence. On March 2, the day after the disclosure email, McKinsey patched all unauthenticated endpoints, took their development environment offline, and blocked public API documentation access. All issues were confirmed remediated before the article was published on March 9, 2026 — a model coordinated disclosure response.
Why does this matter if it's 'just' SQL injection?
The significance is twofold. First, McKinsey is a firm with world-class security teams and significant investment — their own internal scanners failed to detect the vulnerability in over two years of production operation. Second, SQL injection in an AI platform does not merely expose data; it can give attackers write access to the AI's instructions, enabling silent, undetectable manipulation of advice delivered to 43,000 consultants and their clients.
What is RAG and why are RAG document chunks sensitive?
Retrieval-Augmented Generation (RAG) is a technique where an AI model retrieves relevant document chunks from a knowledge base to augment its responses. Lilli's RAG system contained 3.68 million chunks representing decades of proprietary McKinsey research — the firm's intellectual crown jewels. The exposure included S3 storage paths and internal file metadata, revealing not just data but the entire competitive knowledge foundation.
What should organisations do to protect their AI prompt layers?
Treat system prompts as crown jewel assets: store them with strict access controls, implement version history and integrity monitoring, and audit prompt databases separately from application code. Require authentication for all API endpoints, use fully parameterised queries including for JSON key names, run continuous autonomous security testing, and establish clear responsible disclosure programmes with rapid acknowledgement commitments.

Glossary

Key terms from AI platform security and the McKinsey Lilli breach.

SQL Injection (SQLi)

An attack inserting malicious SQL via user-supplied input. Affects any input concatenated into a SQL statement — including JSON key names, not just values.

JSON Key Injection

A SQLi variant where the attack vector is the key name (field name) in a JSON payload, bypassing standard parameterisation which only protects values.

IDOR

Insecure Direct Object Reference. Exposed object references without proper auth checks allow attackers to access data belonging to other users by manipulating IDs.

RAG (Retrieval-Augmented Generation)

AI architecture combining a language model with a retrieval system over a document corpus. The model fetches relevant chunks before generating responses, grounding outputs in specific documents.

Prompt Layer

System prompts and instructions stored in databases or config files governing AI behaviour. Often unversioned and unmonitored — a new high-value attack surface class.

Blind SQL Injection

SQLi technique where results can't be directly read but database structure is inferred from error messages, timing, or conditional behaviours. Used in 15 iterations against Lilli.

Attack Surface Mapping

Discovery and cataloguing of all interaction points — API endpoints, auth flows, file handlers, external integrations — where an attacker could interact with a system.

Responsible Disclosure

Practice of privately reporting a security flaw to the affected organisation, allowing remediation time, and only publishing after fixes are confirmed in place.

Vector Store

Database optimised for high-dimensional vector embeddings used in AI retrieval. Exposure reveals both document content and retrieval architecture — as seen with 266,000+ OpenAI vector stores.

Autonomous Penetration Testing

Security testing by AI agents that autonomously discover targets, map surfaces, identify vulnerabilities, chain exploits, and document findings — without continuous human direction.