Turning Down the Thinking: A Law & Economics Trilogue on AI Throttling

by Staff Reporter
Three section leads at the International Center for Law & Economics (ICLE) read the same viral GitHub post and reached three different conclusions. Call it a trilogue—three views, one problem, and a technology that refuses to sit still.

The GitHub issue filed last week against Anthropic’s Claude Code product carried a blunt title: “Claude Code is unusable for complex engineering tasks with the Feb updates.” The author—Stella Laurenzo, an AMD senior AI director—laid out a detailed account of technical decline.

According to Laurenzo, months of session-log data show that from January to March, median “thinking” output fell roughly 70%. The model began bailing out or asking permission to continue about 10 times per day—up from zero before early March. Self-contradictions in its reasoning tripled. API requests spiked, suggesting users had to retry repeatedly to get usable results.

Most striking, performance appeared to degrade during peak GPU-load hours and recover late at night. That pattern offers circumstantial but suggestive evidence that quality was being throttled as a function of server demand, rather than changed as part of any deliberate design improvement.
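For readers who want to see what such a comparison involves, here is a minimal sketch of the peak versus off-peak check. It is not Laurenzo's actual analysis: the log records, the `PEAK_HOURS` window, and the function names are all hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import median

# Hypothetical session-log records: (hour of day, thinking tokens emitted).
# These values are invented for illustration; they are not from the GitHub issue.
SAMPLE_LOG = [
    (1, 4000), (2, 4100), (3, 3900), (23, 3800),     # late night
    (10, 1200), (11, 900), (14, 1100), (15, 1300),   # working hours
]

# Assumption: "peak" means 09:00 through 17:59 local time.
PEAK_HOURS = range(9, 18)


def median_by_period(records, peak_hours=PEAK_HOURS):
    """Return (peak median, off-peak median) of thinking tokens."""
    peak = [tokens for hour, tokens in records if hour in peak_hours]
    off_peak = [tokens for hour, tokens in records if hour not in peak_hours]
    return median(peak), median(off_peak)


def relative_drop(baseline, observed):
    """Fractional decline of observed relative to baseline."""
    return (baseline - observed) / baseline


peak_med, off_med = median_by_period(SAMPLE_LOG)
drop = relative_drop(off_med, peak_med)
print(f"peak median: {peak_med}, off-peak median: {off_med}, drop: {drop:.0%}")
```

A gap like this would be consistent with load-based throttling, but it would not prove it. Harder tasks may simply cluster at different times of day than easy ones.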

The issue went viral. Within about 20 minutes of reading it, three of us found ourselves in a lively disagreement about how to understand it through a law & economics lens.

Same Model, Less Thinking

To see why this matters, start with the commercial arrangement. Users subscribe to Claude’s premium tiers—marketed as “Pro” and “Max”—and pay substantial monthly fees, up to $200, for access to the most capable model. The value proposition is simple: you are paying for the system’s best reasoning. Product pages highlight superior performance on complex tasks, extended “thinking” capabilities, and the ability to handle professional-grade engineering and analytical work.

The AMD engineer’s log analysis suggests that, sometime after January, Anthropic quietly reduced the computational resources allocated per query. The model did not become “dumber” in the sense that its weights changed. Instead, it appears to have had less time—and fewer resources—to think through each problem.

A rough analogy: you hire a brilliant consultant to service a client, then limit them to 30 seconds per question instead of an hour—without telling the client the terms have changed.

That possibility raises a cluster of legal and economic questions. Reasonable people, as it turns out, disagree about them quite sharply.

When Optimization Looks Like Deception

The first view among us—call it the consumer-protection hawk position, advanced by Eric Fruits—is that this could present a straightforward deception case under Section 5 of the Federal Trade Commission Act (FTC Act) and its state-law analogs.

The argument runs like this: Anthropic marketed a product with defined capabilities. Users subscribed based on those representations. The company then degraded the product without disclosure. Whether the change was operationally justified or economically rational does not matter. What matters is the gap between what Anthropic promised and what it delivered.

On this view, the log data looks like a smoking gun. If median thinking depth fell 70% and retry rates spiked, then the March product differed materially from what users bought in January, and, from a legal standpoint, no one told them. The Federal Trade Commission’s (FTC) deception standard asks whether a representation or omission is likely to mislead a reasonable consumer under the circumstances. A reasonable consumer paying premium prices for an AI reasoning engine would expect roughly consistent performance or, at minimum, to be notified if that performance changed. Ideally, users would also understand, ex ante, how reasoning capability might vary across tasks.

Put differently: Would a user subscribe if they knew reasoning would be throttled when demand peaked?

This position also draws support from European consumer-protection law, which may offer an even more hospitable framework. The European Union’s (EU) Unfair Commercial Practices Directive and Digital Content Directive impose affirmative obligations on providers of digital services to maintain service quality as it existed at the time of contract, absent explicit agreement otherwise. Under that standard, a measurable decline in quality can itself constitute a breach, even without an affirmative misrepresentation.

There is also an economic-waste argument. If degraded outputs force users to re-query to get adequate results, the effective cost of the service rises, even if the subscription price does not. The AMD engineer’s data showed API requests increasing by a factor of 80 from February to March. That is not just friction. For developers on metered billing, it is direct financial harm: more tokens, worse results.
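The economic-waste point can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that each request independently yields a usable result with some fixed probability, so the expected number of attempts is one over that probability. The per-request price and success rates below are hypothetical, not figures from the GitHub issue.

```python
def effective_cost_per_usable_result(price_per_request, success_rate):
    """Expected spend to obtain one usable result.

    Assumes independent attempts with a constant success probability,
    so expected attempts = 1 / success_rate (a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_request / success_rate


# Hypothetical numbers: same metered price, lower chance of a usable answer.
PRICE = 0.10  # assumed dollars per request
before = effective_cost_per_usable_result(PRICE, success_rate=0.9)
after = effective_cost_per_usable_result(PRICE, success_rate=0.3)
print(f"before: ${before:.3f}  after: ${after:.3f}  ratio: {after / before:.1f}x")
```

Under these assumed numbers, the sticker price never moves, yet the cost of each usable answer triples. That is the sense in which the effective price can rise while the listed price stays flat.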

Fruits acknowledges, however, that even a strong unfair or deceptive acts or practices (UDAP) theory raises serious error-cost concerns. As Jonathan Barnett has argued, aggressive Section 5 enforcement that discounts false positives can chill legitimate business conduct. In a fast-moving AI market, the risk of locking in rigid resource-allocation practices is not hypothetical. 

And if the remedy is disclosure—requiring firms to tell users how compute gets allocated—the track record is bleak. As Omri Ben-Shahar has shown, mandated disclosure regimes routinely fail to inform consumers, improve decisions, or change firm behavior. A rule requiring AI companies to publish “thinking-token” budgets would likely join that long list of well-intentioned failures.

For Fruits, then, the existence of a plausible UDAP theory may say more about the breadth of the FTC’s authority than about AI firms’ conduct. If the resource-allocation behavior is reasonable (and it likely is), if disclosure does not change consumer behavior, and if enforcement does not change firm behavior, what exactly does the claim accomplish? The ease with which one can frame ordinary business optimization as deception may itself suggest that Section 5’s UDAP prong has grown too capacious for its own good.

When ‘Worse’ Depends on the Question

The second position, from Kristian Stout, concedes that the consumer-protection theory has some bite, but argues it is much harder to prove than Eric Fruits suggests. The problem is “quality.” In the context of a large language model, quality is not fixed or easily measured. It is highly context-dependent.

Take a simple example. Ask Claude who the first president of the United States was. That query requires essentially no extended reasoning. A model running at full capacity and one running at reduced capacity will produce the same answer. For most routine queries, the difference is likely invisible. Degradation shows up only on complex, multi-step reasoning tasks—the very tasks a smaller subset of users (albeit the highest-paying ones) tends to perform.

That distinction matters for the legal analysis. A deception claim requires evidence that actual consumers were misled about something material. It is not enough to show that internal resource allocation changed. You need to show that real users, in real usage patterns, experienced a meaningful decline in output quality. The relevant question is not whether the model could reason at a lower level, but whether—across the distribution of actual queries—it did produce materially worse results.

An analogy helps. If Albert Einstein offers tutoring services and then spends less time preparing for each session, that is actionable only if the tutoring quality declines. If he had been over-preparing for basic calculus sessions, cutting prep time to a still-adequate level is not deception; it is efficiency. The legal question is whether students got Bill Nye when they were promised Einstein, not whether Einstein spent fewer hours in the library.

There is a useful parallel to an earlier debate. When critics argued that internet service providers (ISPs) used data caps to exploit consumers, Geoffrey Manne and Ian Adams responded that usage-based billing is a standard, efficient practice. It aligns costs with usage and prevents light users from subsidizing heavy ones. The same logic carries over to AI compute allocation. 

A flat-rate subscription that delivers maximum “thinking” tokens to every query, regardless of complexity, resembles an all-you-can-eat buffet. It sounds generous, but it forces everyone to pay a price set by the heaviest users and reduces the firm’s incentive to invest in capacity for marginal, high-complexity queries. As Manne and Adams noted, even Obama-era Federal Communications Commission (FCC) leadership recognized that banning tiered pricing “would force lighter end users of the network to subsidize heavier end users.” Swap in “simple queries” for “light users” and “complex engineering tasks” for “heavy users,” and you have the AI compute-allocation debate in miniature.

None of this forecloses a claim. The AMD engineer’s data suggests that, for her use case—complex engineering work—the degradation was both severe and measurable. But any viable legal theory must grapple with what Anthropic actually represented, to whom, and whether the alleged degradation was material in the context of how those users actually used the product.

Let the Market Sort It Out

The third position, advanced by Dirk Auer, pushes back further. On this view, there is no real problem here, or at least not a legal one.

Firms providing AI services face genuine resource-allocation constraints. GPU compute is expensive and finite. Managing how those resources get distributed across queries is not optional; it is essential. A company that allocates maximum compute to every query, regardless of complexity, would either go bankrupt or charge prices that exclude most users. As Manne and Adams put it in the broadband context, usage-based allocation “is, and has always been, a basic business decision—as it is for every other company that uses it (which is to say: virtually all companies).”

The market-rationalist argument follows naturally. Consumers are not well-positioned to judge how much “thinking” a given query requires. Most users have no idea how much compute their question should consume, and they should not need to. What they care about is output quality. If a firm can deliver satisfactory results while using fewer resources on simpler queries, that is a Pareto improvement: the firm cuts costs it can reinvest, and consumers still get what they need.

On this view, competition—not regulation—provides the relevant discipline. If Claude’s quality degrades enough that users notice and care, they will switch to alternatives: GPT, Gemini, or whatever comes next. The AMD engineer’s viral GitHub issue is itself evidence that this feedback loop works. A sophisticated user identified a problem, publicized it, and put the company under pressure to respond. That is the market doing its job.

This cautionary stance draws support from the uneven track record of consumer-protection regulation in digital markets. The European Union offers a prominent example. Its Digital Markets Act (DMA) promised more competition and better services. In practice, it has often delivered the opposite: degraded user experience, higher costs, and reduced functionality, all in the name of consumer protection. Platforms have stripped features, imposed consent walls, and raised prices.

Importing a similar quality-maintenance mandate into the AI context risks the same result. Regulators would dictate resource-allocation decisions that firms are better positioned to make, while consumers bear the costs of the resulting inefficiencies.

From Fixed Promises to Flexible Performance

What stands out in this disagreement is how quickly three people working within the same law & economics tradition reached sharply different conclusions. That divergence reflects genuinely novel facts. We lack well-developed legal frameworks for deciding when a dynamically allocated computational service has been “degraded” as opposed to “optimized.” This is not a static product with fixed specifications. It is a real-time service, where quality emerges from resource-allocation decisions that can vary by time of day, server load, and query complexity.

The answer likely turns on facts we do not yet have. If Anthropic’s internal documents show that it knowingly reduced quality below represented levels to cut costs, the hawk position gains force. If the changes reflect genuine optimization—preserving output quality for most use cases while reducing waste—the market-rationalist view looks stronger. If the truth falls in between—quality held steady for most users but slipped for power users paying the most and relying on the product the hardest—then we land in the messy middle where consumer-protection law usually operates.

One point seems clear. As AI services embed more deeply in professional workflows, and as the gap between what a model could do and the resources it is given becomes a tunable parameter, these disputes will recur. The FTC, the European Commission, and their counterparts will need to grapple with what “quality” means when capability is a dial, not a fixed attribute—and when the provider’s hand never leaves the knob.