Not all supposed barriers to entry are created equal. The ones that matter for antitrust are not just costs, advantages, or inputs controlled by leading firms. They are durable impediments that keep rivals from entering, expanding, and disciplining market power. That distinction matters in generative artificial intelligence (AI), where policymakers increasingly worry that control over data will entrench a small group of large technology firms.
My recent Albany Law Journal of Science and Technology article, “Is Data Really a Barrier to Entry? Rethinking Competition Regulation in Generative AI,” co-authored with Satya Marrar, challenges that assumption head-on. We argue that fears of data scarcity and monopolization are overstated—and that premature regulation may do more to stifle AI innovation than to protect competition. (See here for a March 2025 Mercatus Center Working Paper version of this article.)
Yes, data matters. But data is not destiny. Standing alone, it is not a sound basis for sweeping ex ante regulation or speculative antitrust attacks on generative AI markets.
Data Moats Aren’t Castles
The “anticompetitive data moat” theory holds that a firm can secure durable market power by controlling a large, unique, hard-to-replicate dataset that rivals need to build or improve competing products. (See, for example, here.)
The story has intuitive appeal. Large digital platforms have amassed vast quantities of user data. Generative AI systems train on data. So the firms with the most data will dominate AI. On this view, antitrust enforcers and sectoral regulators should move early to stop large firms from locking up the market.
That syllogism is too crude. It treats “data” as a homogeneous commodity. It is not. Data is a heterogeneous input whose value turns on context, quality, legality, format, freshness, and use case. A large trove of low-quality or irrelevant data may be far less valuable than a smaller, well-curated, domain-specific dataset. Data that matters for consumer search may have little bearing on legal research, medical coding, cybersecurity, logistics, chip design, industrial maintenance, or accounting. An advantage in one AI application does not necessarily translate into dominance in another.
This distinction matters because antitrust law does not punish firms for possessing useful assets. It targets exclusionary conduct that harms the competitive process. An incumbent’s control of an input that rivals would like to have does not establish monopoly power. It does not show foreclosure. In generative AI, the data-input story weakens further because developers have multiple paths to competitive products: open datasets, licensed corpora, synthetic data, enterprise-specific data, retrieval-augmented generation, fine-tuning, distillation, open-weight models, and vertical specialization.
A static view of data also misses the dynamic nature of AI competition. The relevant question is not whether some firms have more data today. It is whether rivals can obtain, generate, license, substitute, or work around the data they need to compete tomorrow. On that score, the market evidence points away from fatalism.
First, Do No Harm—to Competition
Antitrust should promote competition, not precautionary industrial planning. In an October 2025 Truth on the Market post on AI and antitrust, I criticized the “precautionary instinct” as inconsistent with the “American economics-oriented, case-and-fact-specific antitrust-enforcement philosophy,” which demands evidence that a challenged practice is actually harming or likely to harm competition.
That framework fits generative AI. Enforcers should not ignore exclusionary conduct. They should scrutinize acquisitions that eliminate significant nascent rivals, exclusive contracts that substantially foreclose critical inputs, collusive arrangements, deceptive practices, and technical restrictions that block interoperability without legitimate justification.
But scale alone is not a violation. Data possession is not presumptively unlawful. And speculative future bottlenecks do not justify present regulatory mandates.
The costs of getting this wrong are high. Heavy regulation in an emerging technology market can entrench the very firms it targets. Large incumbents can absorb compliance costs, legal uncertainty, reporting obligations, audits, and licensing burdens far more easily than startups. A rulebook meant to check “Big Tech” dominance can become a moat around Big Tech.
That result is not a paradox. It is a recurring feature of regulation. Fixed compliance costs fall more heavily on smaller firms. Complex access mandates demand armies of lawyers, engineers, and compliance officers. Rules that restrict data use can make it harder for entrants to train, test, and adapt models. Mandatory sharing obligations can weaken incentives to collect, license, clean, and curate data in the first place. The likely outcome: less entry, less experimentation, and less competitive pressure.
The Moat Is Shrinking Faster Than You Think
The strongest case against data fatalism is the speed at which AI capabilities are improving—and costs are falling. Stanford University's 2025 AI Index reports that the cost of running a system performing at roughly the GPT-3.5 level fell more than 280-fold between November 2022 and October 2024. It also finds that the performance gap between leading closed-weight and open-weight models on some benchmarks shrank from 8.04% in January 2024 to 1.70% by February 2025.
Those numbers do not fit a story of durable, data-driven foreclosure. If incumbents’ data advantages were decisive, the gap between closed models and open challengers would widen—or at least hold steady. It has not. Open-weight models have improved rapidly. Smaller models have become more capable. Inference costs have collapsed. Developers can increasingly build useful AI applications without owning massive proprietary data troves.
None of this makes frontier-model development cheap. It is not. Training state-of-the-art models still demands significant investment in compute, talent, infrastructure, and evaluation. But antitrust law distinguishes between a high cost of entry and an unlawful entry barrier. Many industries require large, upfront investment. That alone does not turn successful firms into regulated utilities.
More important, frontier training is only one slice of the market. The AI stack spans chips, cloud services, data centers, foundation models, open-weight models, inference providers, model-routing tools, enterprise applications, consumer chatbots, developer platforms, fine-tuning services, synthetic-data providers, evaluation tools, and vertical AI products.
Competition can thrive at one layer even if another is concentrated. A firm that leads in general-purpose chat may lose in legal research, pharmaceutical discovery, financial compliance, or customer-service automation. A model that lags on general benchmarks may still win on privacy, latency, cost, customization, or local deployment.
That layered reality makes broad regulation especially risky. The law should not assume that a single market structure governs all AI applications.
Valuable Doesn’t Mean Essential
A core mistake in the data-moat theory is to confuse value with indispensability. Data can be valuable without being essential. It can confer an advantage without foreclosing rivals. It can be hard to obtain in one form, yet available through substitutes in another.
Consider a legal AI product. Success may turn less on a vast general corpus than on access to statutes, cases, regulations, treatises, firm documents, citation tools, expert feedback, and workflow integration. Consider a medical AI product. The differentiators may be clinical validation, privacy-preserving deployment, specialized labeling, and integration with hospital systems. Consider a manufacturing AI product. The key inputs may come from a customer’s own sensors, machines, manuals, and maintenance records.
In each case, the relevant asset is not “data” in the abstract. It is the ability to combine specific data with effective engineering and customer-specific knowledge.
Generative AI also lets firms economize on data. Retrieval-augmented generation can connect models to external databases at inference time, rather than embedding all knowledge in pretraining. Fine-tuning can tailor a model to a narrower domain. Distillation can transfer capabilities from larger models to smaller ones. Synthetic data can supplement scarce real-world examples. Human feedback and evaluation can improve outputs without requiring ownership of all underlying source material.
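To make the retrieval-augmented generation point concrete, consider a minimal sketch in Python. It is illustrative only: the document store, the tax-law snippets, and the bag-of-words scoring are hypothetical stand-ins (a production system would use a real embedding model and vector database). The structure is what matters: domain knowledge lives in an external store the developer controls and is injected into the prompt at inference time, rather than being embedded in a model's pretraining data.

```python
# Minimal retrieval-augmented generation (RAG) sketch, plain Python.
# Illustrative only: the corpus and scoring are toy stand-ins for a
# real embedding model and vector database.

from collections import Counter
import math

# A hypothetical domain corpus a small entrant might own or license.
DOCUMENTS = [
    "Section 482 allocations require an arm's-length standard for related-party pricing.",
    "Form 1120 is the U.S. corporate income tax return, generally due April 15.",
    "Bonus depreciation under Section 168(k) phases down after 2022.",
]

def term_counts(text: str) -> Counter:
    """Lowercase bag-of-words counts; a stand-in for a real embedding."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = term_counts(query)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(q, term_counts(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble the prompt sent to any general-purpose model at inference time."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("When is the corporate income tax return due?"))
```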
These techniques are not magic. They do not eliminate the need for data. But they weaken the claim that only firms with the largest preexisting datasets can compete. They make data advantages contestable.
If Data Is ‘Essential,’ Everything Is
Some proposals implicitly treat AI data as an essential facility. If leading firms control data that rivals need, the argument goes, those firms should have to share it, license it, or make it available on regulated terms. (See, for example, here, here, and here.)
That temptation should be resisted. Essential-facilities-style duties are hard to administer even in mature infrastructure markets. They fit poorly in a fast-moving innovation market. (See the U.S. Supreme Court’s 2004 decision in Verizon v. Trinko and Phillip Areeda’s 1990 Antitrust Law Journal piece “Essential Facilities: An Epithet in Need of Limiting Principles.”) Courts and regulators are not well positioned to decide which datasets are essential, what access terms are reasonable, how to protect privacy and intellectual-property rights, how to maintain data quality, or how to preserve dynamic incentives.
Compelled access also risks undercutting the very investments that make data valuable (see here). Raw data rarely has value until firms collect it lawfully, clean it, structure it, label it, filter it, update it, and integrate it into a model-development process. If firms must later share the results on regulated terms, their incentive to make those investments may diminish.
Traditional antitrust offers a better path. If a firm uses exclusive contracts to foreclose rivals from truly indispensable inputs, enforcers can investigate. If an acquisition eliminates a meaningful future competitor without offsetting efficiency justifications, enforcers can challenge it. If firms collude to divide AI markets or suppress competition, antitrust law can respond. But remedies should follow evidence. They should not start from the premise that data ownership is itself a competitive wrong.
Don’t Mistake Collaboration for Collusion
My April 2026 post on competitor collaboration in the AI era makes a related point: competition policy should “complement—not undermine—efforts to promote innovation and economic growth,” and antitrust should not stand in the way of lawful, output-expanding AI infrastructure development.
That logic extends beyond infrastructure. AI development often depends on collaboration among firms with complementary capabilities: cloud providers, chipmakers, model developers, data licensors, universities, startups, enterprise customers, and safety researchers. Some collaborations raise legitimate antitrust concerns. Many do not. They can be procompetitive—allowing firms to share risk, pool complementary assets, accelerate deployment, and challenge better-capitalized incumbents.
A regulatory posture that treats AI collaboration with suspicion risks undermining entry. Smaller firms often need partnerships to secure compute, distribution, or specialized data. Universities and nonprofits may rely on private-sector support. Enterprise customers may need model providers to coordinate with software vendors and cloud platforms. Reflexive hostility to collaboration will not promote competition. It will fragment the ecosystem and raise costs.
The same holds for data licensing. Exclusive or semi-exclusive agreements may, in some cases, be anticompetitive. But they can also be efficient. They can incentivize data owners to make lawful datasets available. They can protect privacy, security, and quality. They can fund curation and labeling. And they can give entrants access to valuable inputs without requiring them to build data-collection operations from scratch.
The right question is not whether a data arrangement involves a large firm. It is whether it substantially forecloses competition and lacks offsetting procompetitive benefits.
The 1990s Called—They Warned Against This
There is a useful historical analogy. In a February 2025 post, I argued that “promoting competition, not regulation,” is key to U.S. AI leadership and pointed to the Clinton administration’s relatively deregulatory approach to the early internet as a model of permissionless innovation.
The analogy is not perfect. AI raises different risks than the early internet, including concerns about safety, security, copyright, misinformation, privacy, and labor-market disruption. Some risks may warrant targeted legal responses. But the competition-policy lesson holds: premature regulation can lock in mistaken assumptions about technology and market structure.
In the 1990s, regulators could not have predicted the evolution of search, social media, e-commerce, smartphones, cloud computing, streaming, or app-based services. Had policymakers imposed rigid rules based on early internet business models, they might have stifled later innovation. Generative AI sits at a similarly early stage. Regulators should be humble about their ability to predict which models, firms, architectures, and business strategies will prevail.
The goal is to preserve the conditions for rivalry: entry, experimentation, investment, contracting, interoperability where justified, and enforcement against actual exclusion. It is not to redesign AI markets before they fully form.
Not All AI Roads Lead to Bigger Models
Another weakness in data-centered AI regulation is that it bakes in assumptions about how the technology will evolve. If regulators treat access to training data as the core competition problem, they risk privileging large-scale pretraining as the dominant model. But AI competition may not hinge on ever-larger general-purpose systems. It may turn on smaller models, specialized models, agentic workflows, private deployment, model routing, tool use, retrieval systems, and industry-specific applications.
A startup building a highly reliable tax-compliance assistant does not need to outgun the largest general chatbot. A medical-device company using a narrow AI model probably doesn’t need internet-scale training data. A law firm may prefer a private system grounded in its own documents. A manufacturer may prioritize a model that runs locally and protects trade secrets. In each case, the competitive question is not who has the most data. It is who best solves the customer’s problem.
Competition policy should not assume that the largest model is the relevant unit of analysis. Nor should it treat access to general training data as the decisive input. The market should be allowed to discover which approaches work.
Antitrust, Not AI Central Planning
A sound AI competition agenda would look different from the precautionary, data-centric approach critiqued above.
First, it would distinguish scale from exclusion. Large investments in compute, talent, and data may reflect vigorous competition. Size alone is not a sin.
Second, it would focus on conduct. Exclusive dealing, tying, predatory strategies, discriminatory access, collusion, and anticompetitive mergers should be assessed under established legal standards.
Third, it would account for efficiencies. AI partnerships, data licenses, cloud agreements, and model-distribution arrangements can expand output, reduce costs, improve safety, and accelerate deployment.
Fourth, it would resist turning antitrust into general AI regulation. Privacy, copyright, national security, and consumer-protection concerns belong to the legal regimes designed to address them—not to competition law stretched beyond its institutional competence.
Fifth, it would preserve room for experimentation. In fast-moving markets, false positives carry real costs. Blocking benign or procompetitive conduct can deprive consumers of innovations that never come to market.
In short, when assessing the role of data in AI, enforcers should prioritize evidence before intervention, actual competitive effects before remedies, and consumer welfare before regulatory ambition.
Data Isn’t Oil—and It Isn’t Destiny
The debate over generative AI needs less metaphor and more economics. Data is not oil. It is not a single resource, not necessarily scarce, not necessarily exclusive, and not necessarily decisive. Firms can copy, license, generate, clean, degrade, specialize, substitute, and combine it with other inputs in ways that reshape its competitive value.
The fact that data is useful in AI does not make it a durable barrier to entry. The fact that large firms hold data does not mean rivals cannot compete. And the importance of AI does not justify abandoning case-specific analysis for precautionary control.
The right policy is not passivity. It is disciplined enforcement. Antitrust agencies should target real exclusion, not hypothesized dominance. They should protect the competitive process, not particular competitors. They should recognize that innovation often comes from unexpected entrants, technical workarounds, open models, specialized applications, and new contractual arrangements.
Generative AI remains young, fluid, and highly contestable. Premature regulation built on an exaggerated data-bottleneck theory could slow the very forces lowering barriers and expanding access. The better course is to let competition work—and step in only when the evidence shows it is being unlawfully suppressed.
