Meta’s Incognito Chat is interesting not because it promises privacy, but because it makes privacy expensive. It limits what Meta can know, what Meta can monetize, and what Meta can hand over later. That is what makes the announcement worth taking seriously.
Meta has launched Incognito Chat with Meta AI, a way to talk to an artificial intelligence assistant on WhatsApp and the Meta AI app under privacy guarantees that, as Mark Zuckerberg puts it, are “similar to how end-to-end encryption means no one can read your conversations, even Meta or WhatsApp.” The system runs server-side AI inference—meaning the AI processes requests on remote servers rather than on your device—inside a hardware-isolated environment known as a trusted execution environment (TEE). In principle, that means even Meta cannot inspect users’ chats. Meta pairs that architecture with stateless processing, anonymous routing, and at least partially verifiable transparency measures.
Two design tradeoffs deserve attention because, in my view, Meta made the right call on both. First, the architecture makes personalized advertising harder. Second, it constrains trust-and-safety systems that depend on human review of flagged conversations. Both choices will draw criticism—from inside the company and from outside advocates of broader monitoring powers. Both are worth defending before the inevitable pressure campaign begins.
I have argued previously that hardware-backed confidential computing offers the most credible path to combining frontier-grade AI capabilities with meaningful data security. I have also argued that voluntary technical self-restraint by AI providers strengthens the case for legal protections for AI conversations. Incognito Chat is the first major consumer AI product that, at least on its face, appears to satisfy both halves of that argument.
My initial reaction, then, is simple: more of this, please—and from every provider.
Privacy by Hardware, Not by Pinky Promise
The privacy claim matters only if you understand, at least roughly, how the system works. End-to-end encryption protects a message while it travels between two endpoints. But the computer running the AI model is itself an endpoint. To generate a response, it has to see the message in plaintext. Without additional engineering, “AI in WhatsApp” would simply mean your messages arrive on Meta’s servers in readable form.
The trick is to make the AI endpoint itself something Meta cannot inspect.
That is the role of the TEE. In the architecture Meta describes in its Private Processing white paper, the AI workload runs inside an AMD confidential virtual machine, paired with NVIDIA confidential-computing graphics processing units (GPUs). The hardware encrypts the virtual machine’s memory and prevents inspection by the host operating system, the hypervisor (the software layer that manages virtual machines), system administrators, or Meta employees.
The key guarantees include:
- The confidential virtual machine uses attested code. Before a user’s device sends data, it cryptographically verifies that the server is running the specific software image Meta publicly committed to—and nothing else (a simplified sketch of this check follows the list).
- Processing is stateless. The system decrypts a request, processes it, and discards it. No persistent log or long-term key-value cache of conversations remains.
- Routing is oblivious. Requests pass through a third-party relay operated by Fastly, which hides the user’s IP address from Meta. Anonymous credentials further prevent Meta from linking requests to identified users.
- Critical software artifacts are committed to an append-only transparency log hosted by Cloudflare. In plain English: Meta cannot silently swap in different code without leaving a public trace.
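To make the first and last of those guarantees concrete, here is a minimal sketch of the kind of check a client could perform before uploading a prompt: confirm that the attestation report is rooted in the hardware vendor’s key and that the reported software measurement matches an image recorded in the public transparency log. The data structures, image names, and log format are illustrative assumptions, not Meta’s actual protocol.

```python
"""Minimal sketch of a client-side attestation gate (illustrative only).

The data structures, image names, and log format below are assumptions for
illustration; they are not Meta's actual attestation protocol or formats.
"""

import hashlib
from dataclasses import dataclass


@dataclass
class AttestationReport:
    # Hash ("measurement") of the software image running inside the TEE.
    # In a real deployment this arrives inside a report signed by the CPU
    # vendor's attestation key; signature checking is stubbed out here.
    measurement: str
    vendor_signature_valid: bool


def fetch_published_measurements() -> set[str]:
    """Stand-in for reading entries from a public, append-only transparency log."""
    return {
        hashlib.sha256(b"hypothetical-tee-image-v1").hexdigest(),
        hashlib.sha256(b"hypothetical-tee-image-v2").hexdigest(),
    }


def ok_to_send(report: AttestationReport) -> bool:
    """Upload a prompt only if the server proves it runs publicly committed code."""
    if not report.vendor_signature_valid:
        return False  # report is not rooted in the hardware vendor's key
    # The measurement must match an image committed to in the transparency log;
    # anything else means the server is running unexpected code.
    return report.measurement in fetch_published_measurements()


# Usage: the device refuses to send plaintext if verification fails.
report = AttestationReport(
    measurement=hashlib.sha256(b"hypothetical-tee-image-v2").hexdigest(),
    vendor_signature_valid=True,
)
assert ok_to_send(report)
```

The point of pairing attestation with the transparency log is that the set of acceptable measurements is public and append-only: a silent code swap would either fail this check or leave a visible trace.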
None of this is perfect. Meta acknowledges, for example, that NVLink connections between GPUs are not yet encrypted on the deployed hardware and that GPU memory itself remains unencrypted. AMD also does not guarantee protection against certain side-channel attacks—attacks that infer information indirectly through timing, power use, or similar signals—within its threat model. Several of Meta’s verifiable-transparency commitments, particularly third-party researcher access to auditable artifacts, also remain only partially fulfilled.
I flagged that transparency gap in my earlier piece, and I am flagging it again here.
Still, the architecture is fundamentally right. More importantly, the remaining risks are the right kinds of risks: vulnerabilities that would require sophisticated, large-scale attacks, rather than a routine subpoena or an employee with access to the right internal database.
For users, the practical upshot is straightforward: no one—not even Meta—should be able to read these conversations. At least for now, Incognito Chat is text-only, conversations are not saved by default, and messages disappear unless users choose otherwise.
Confidentiality That Survives a Subpoena
Users ask ChatGPT, Claude, and Meta AI about their lab results, medications, mental health, relationships, finances, and potential legal exposure. Treating that as merely “the user’s responsibility” misses the point.
First, this is exactly the sort of information least suited to leakage. Much of it is inherently sensitive. Much of it would also qualify as “special category” personal data under Article 9 of the European Union’s General Data Protection Regulation (GDPR), which triggers heightened legal protections. And a sufficiently large breach—or a sufficiently aggressive data-preservation order, like the one issued against OpenAI in The New York Times litigation—could expose millions of intimate, decontextualized fragments of conversations between people and AI systems.
Second, the costs of weak privacy are real. A person who does not feel safe asking an AI assistant about a troubling symptom, a possible medication interaction, or a personal crisis may simply stay silent. That forecloses a category of benefits—better access to information, better opportunities for expression—that European fundamental-rights doctrine treats as carrying substantial weight.
This connects directly to the argument I previously made for something like “AI privilege.” OpenAI CEO Sam Altman has discussed extending professional-secrecy-style protections to conversations with AI, and I am broadly sympathetic to that direction. But the strongest version of that argument requires a credible technical foundation.
Providers cannot merely promise discretion. They need to show they could not betray user confidences even if they wanted to—or even if governments compelled them to try.
Meta’s Incognito Chat is the first meaningful demonstration of that proposition in a mass-market messaging context. It establishes a technical baseline that legal protections can plausibly map onto. Just as importantly, it creates a benchmark that regulators and courts can reasonably expect privacy-by-design AI deployments to approach, rather than dismiss as exotic or impractical.
If we want courts to treat AI conversations as something closer to private reflection than ordinary cloud storage, the industry has to make those conversations look, technically, more like private reflection.
This is what that looks like.
The Tradeoffs Are the Point
Two tradeoffs here deserve special attention because they impose real costs on Meta. I expect both to generate pushback—from inside the company, from regulators, and from the broader trust-and-safety ecosystem.
Even so, both are reasons to give Meta credit.
First, a genuinely confidential AI architecture constrains behavioral advertising. If Meta cannot read users’ conversations, it also cannot freely mine those conversations for ad targeting or model training. That limitation cuts directly against the incentives of the modern ad-tech stack.
Second, meaningful confidentiality limits moderation systems that depend on human review of flagged conversations. Once content leaves the TEE for inspection, the “no one—not even Meta—can read your conversations” promise starts to weaken in ways that are hard to cabin and harder to explain.
Those are not incidental side effects. They are the architecture working as intended.
The Ad-Tech People Are Not Going to Love This
Inference inside a TEE, combined with stateless processing and no exfiltration channel, means Meta cannot read what users ask Meta AI. It also means Meta cannot use those conversations to train ad-targeting systems or feed their contents into real-time advertising auctions.
For a company whose business still depends heavily on behavioral advertising, that is not a minor concession. It is exactly the sort of concession that, in my experience, tends to generate substantial internal friction long before a product ever ships.
At the same time, this is not—and should not be—a permanent ban on monetization.
One could imagine architectures that allow ads to flow into the TEE while tightly constraining what flows back out. For example, ad creatives could enter the confidential environment and match against conversation content there, while only deterministic, predefined, aggregated, and otherwise constrained signals—say, bucketed impression counts—leave the system. The attested code would need to enforce those limits, publish them to the transparency log, and ideally combine them with differential-privacy-style noise. That kind of noise makes individual users harder to identify in aggregate data and helps prevent the output channel from becoming a covert way to smuggle conversation data back out.
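To make that more concrete, here is a toy sketch of such a constrained output channel: impressions are matched and counted inside the confidential environment, and the only thing allowed out is a set of Laplace-noised, per-campaign totals. The epsilon value, campaign names, and aggregation scheme are illustrative assumptions, not anything Meta has announced.

```python
"""Toy sketch of a constrained, noised output channel (illustrative only).

The epsilon value, campaign names, and aggregation scheme are assumptions;
Meta has not described a mechanism like this.
"""

import random
from collections import Counter


def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))


def release_impression_counts(matched_ads: list[str], epsilon: float = 1.0) -> dict[str, int]:
    """The only signal that leaves the TEE: noisy per-campaign impression totals.

    Assuming each user contributes at most one matched impression in total,
    the counts have sensitivity 1, so Laplace noise with scale 1 / epsilon
    yields an epsilon-differentially-private release.
    """
    exact = Counter(matched_ads)
    return {
        campaign: max(0, round(count + laplace_noise(1.0 / epsilon)))
        for campaign, count in exact.items()
    }


# Inside the TEE: ads were matched against conversations; only noisy totals exit.
matched = ["campaign_a"] * 1040 + ["campaign_b"] * 310
print(release_impression_counts(matched))  # e.g. {'campaign_a': 1041, 'campaign_b': 309}
```

Everything upstream of that release, including which conversations matched which ads, would stay inside the attested environment.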
There are real design challenges here. Click-through measurement, for example, gets complicated once a user action leaves the TEE boundary. There is also plenty of room for the ad-tech industry to overreach, as it reliably does when presented with a new data source.
Still, Meta has adopted the right starting position: no targeting based on conversation content unless and until the company builds a privacy-preserving mechanism that is technically constrained, cryptographically attested, and transparently auditable.
That is effectively the inverse of the familiar real-time-bidding (RTB) model, in which companies broadcast user data first and draft rules about its use later.
The Safety Team Hits a Cryptographic Boundary
Meta’s framing in the Reuters briefing is notable for what it does not say. A company representative told reporters that “the AI will also have built-in safety guardrails, refusing to answer problematic questions or steering conversations in different directions.” Read carefully, that describes model-level guardrails—refusals trained or prompted into the model operating inside the TEE—and little else.
There is no mention of human review. No mention of flagging “problematic” conversations for downstream moderation. No mention of a separate classifier system reporting user prompts outside the confidential environment.
That omission matters.
Meta’s white paper emphasizes a principle of “non-observability.” Under that design, classification can occur inside the TEE, but even the size or timing of traffic between an internal classifier and the orchestration system should not reveal anything about the classifier’s output to the outside world.
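A toy sketch of one half of that idea, padding: whatever the internal classifier concludes, the envelope handed to the orchestrator is byte-for-byte the same size, so its length reveals nothing; a real deployment would also need to fix timing. The field names, envelope size, and stand-in classifier below are assumptions for illustration, not Meta’s implementation.

```python
"""Toy sketch of non-observable classifier output via constant-size envelopes.

Field names, envelope size, and the stand-in classifier are illustrative
assumptions, not Meta's implementation.
"""

import json
import secrets

FIXED_ENVELOPE_BYTES = 4096  # every internal message is padded to this size


def pad_to_fixed_size(payload: dict) -> bytes:
    """Serialize and pad so that message length reveals nothing about content."""
    raw = json.dumps(payload).encode()
    if len(raw) > FIXED_ENVELOPE_BYTES:
        raise ValueError("payload exceeds fixed envelope size")
    return raw + b"\x00" * (FIXED_ENVELOPE_BYTES - len(raw))


def classify_and_forward(prompt: str) -> bytes:
    """Run a (placeholder) safety check, then emit a constant-size envelope.

    Flagged or not, the message crossing internal component boundaries has the
    same length; a real deployment would also send it on a fixed schedule so
    that timing does not leak the classifier's verdict either.
    """
    flagged = "dangerous" in prompt.lower()  # stand-in for a real classifier
    envelope = {
        "request_id": secrets.token_hex(8),
        "flagged": flagged,  # stays inside the TEE
        "prompt": prompt,    # stays inside the TEE
    }
    return pad_to_fixed_size(envelope)


# Two very different prompts produce indistinguishable traffic sizes.
assert len(classify_and_forward("hello")) == len(classify_and_forward("x" * 2000))
```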
To see why this is significant, compare Incognito Chat with the standard architecture used by major web-based AI chatbots. In a typical ChatGPT, Claude, or Gemini deployment, user prompts are processed in cleartext on the provider’s servers, alongside a trust-and-safety stack that usually includes the following (sketched generically after the list):
- classifier models that score prompts and outputs for categories such as self-harm, child sexual-abuse material, weapons, or fraud;
- rule-based denials and topic blocklists;
- refusal behaviors trained through reinforcement learning from human feedback (RLHF);
- and, critically, escalation paths that route some conversations to human reviewers and, in narrow circumstances, to law enforcement or other authorities.
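For contrast, the sketch below captures the rough shape of that cleartext pipeline. It is a generic illustration rather than any specific provider’s implementation; the category names, threshold, and blocklist are placeholders, and the RLHF-trained refusals live inside the model itself rather than in this surrounding code.

```python
"""Generic illustration of a cleartext trust-and-safety pipeline.

Not any specific provider's implementation; categories, thresholds, and the
blocklist are placeholders.
"""

from dataclasses import dataclass, field

ESCALATION_THRESHOLD = 0.9
BLOCKLIST = {"how do i build a weapon"}  # crude rule-based layer (illustrative)


@dataclass
class ModerationResult:
    allowed: bool
    scores: dict[str, float] = field(default_factory=dict)
    escalate_to_human: bool = False  # the step with no equivalent inside a TEE


def score_prompt(prompt: str) -> dict[str, float]:
    """Stand-in for classifier models scoring self-harm, weapons, fraud, etc."""
    return {
        "self_harm": 0.02,
        "weapons": 0.95 if "weapon" in prompt.lower() else 0.01,
    }


def moderate(prompt: str) -> ModerationResult:
    # 1. Rule-based denials and topic blocklists run over the cleartext prompt.
    if prompt.lower().strip() in BLOCKLIST:
        return ModerationResult(allowed=False)
    # 2. Classifier models score the prompt across risk categories.
    scores = score_prompt(prompt)
    # 3. Escalation path: high-risk conversations are routed to a human-review
    #    queue, which necessarily exists in cleartext outside any TEE.
    escalate = any(s >= ESCALATION_THRESHOLD for s in scores.values())
    return ModerationResult(allowed=True, scores=scores, escalate_to_human=escalate)


print(moderate("what should I do about this weapon I found?").escalate_to_human)  # True
```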
OpenAI recently made that structure explicit when discussing planned client-side encryption features. The company described “fully automated systems to detect safety issues,” with human review reserved for “serious misuse and critical risks—such as threats to someone’s life, plans to harm others, or cybersecurity threats.”
That escalation layer is precisely what Meta does not appear to replicate inside Incognito Chat. And for good reason. The moment flagged content leaves the TEE for outside review, the no-access promise starts to unravel.
I think Meta made the right call here.
Model-level guardrails are imperfect. Motivated bad actors can sometimes circumvent them, just as they can circumvent classifier systems. But the virtue of the model-level approach is its conceptual honesty: a refusal generated inside the TEE creates nothing that exits the TEE.
Once a flagging-and-review mechanism gets bolted onto this architecture, users immediately face two questions they cannot answer for themselves: What kinds of content trigger a flag? And what happens to flagged content after it crosses the boundary of the confidential environment?
Whatever enters a human-review queue necessarily exists in cleartext outside the TEE. At that point, it falls back into precisely the threat model the TEE was designed to neutralize: subpoenas, broad preservation orders, compelled disclosure, external breaches, and insider misuse.
The promise quietly shifts from “no one—not even Meta—can read your conversations” to something much weaker: “no one can read most of your conversations, although we cannot tell you in advance which ones we might read.”
If that position eventually becomes politically or regulatorily untenable—say, because regulators conclude that model-level safety alone is insufficient for a consumer product operating at WhatsApp’s scale—there is a possible fallback. It is also, in my view, a substantially worse design.
Meta could run a second large language model (LLM) safety classifier inside the TEE that monitors conversations and exfiltrates only narrowly defined categories of flagged content for human review. Whether such a system could remain meaningfully attestable is itself debatable. Even assuming it could, two major problems would remain.
First, motivated actors would still try to game the classifier, just as they already game model-level guardrails. Red-team exercises demonstrate this constantly.
Second, using an LLM as the gatekeeper makes the user-uncertainty problem nearly impossible to solve. There is no stable rulebook Meta could realistically publish because the classifier itself is probabilistic. Its outputs can shift based on wording, surrounding context, or silent model updates.
An ordinary user thinking out loud about a worrying symptom or a medication interaction would have no reliable way to know, ex ante, whether the system might classify the conversation as “self-harm adjacent” or “drug-related” and route it for review. Nor would the user know whether those messages could wind up in a queue subject to discovery demands, data breaches, or insider access.
At that point, the privacy promise effectively collapses into: “Trust us to behave reasonably about what we exfiltrate.”
Refreshingly, Incognito Chat appears to be trying very hard not to make that promise.
The Privacy Baseline Just Moved
First, I hope Meta closes the remaining verifiable-transparency gaps quickly. That means publishing the binaries corresponding to entries in the Cloudflare transparency log, broadening third-party researcher access to auditable artifacts, and shortening coordinated-disclosure timelines where possible. The white paper says most of the right things. Now the product needs to finish implementing them.
Second, I hope the other major AI providers—OpenAI, Anthropic, Google, and the cloud platforms hosting them—converge on some version of this architecture. OpenAI’s recent client-side-encryption announcement is a promising signal. But as I argued at the time, the implementation details will determine whether the result amounts to genuinely comparable confidentiality or something softer. The key questions are where safety detection occurs—on-device, inside a TEE, or somewhere else entirely—and how transparently the system exposes those choices.
Anthropic, meanwhile, has a particularly strong incentive to move in this direction. A company that has built much of its public identity around AI safety is well-positioned to show that “safety” and “the provider can read every conversation” are not synonymous.
Third, I hope EU policymakers treat Incognito Chat as a benchmark rather than a regulatory inconvenience. There is a real risk that TEE-based systems will strike some regulators as frustrating precisely because they limit visibility and control. I have written previously (here and here) about the tendency to treat strong privacy architectures as obstacles rather than achievements.
That gets the analysis backward. This is what a fundamental-rights-respecting AI architecture actually looks like. Policymakers should encourage other providers to match it—and in some contexts, perhaps require them to.
Finally, and most speculatively, I hope we begin developing a serious legal vocabulary for AI privilege that takes technical architecture seriously.
Legal privilege has always depended on a combination of professional obligation and practical limitation. A lawyer cannot disclose information the lawyer never possessed. Confidential computing creates an analogous structure for AI systems: a provider commitment to privacy, a technical inability to access user conversations, and an external audit trail through verifiable transparency mechanisms.
If society wants AI systems deployed at scale for genuinely sensitive use cases—health, mental health, legal advice, financial guidance—then the law arguably should reward providers that adopt architectures like this, rather than punish them for the resulting blind spots.
The alternative is an AI ecosystem built around the assumption that every intimate conversation must remain readable by someone, somewhere, just in case. That is not a safety model. It is a surveillance model.
