22 Computational Social Scientists Need to Care About the Competitiveness of the AI Market
Patrick Wu, American University
Abstract: In this short position paper, I argue that computational social scientists need to care about the competitiveness of the AI market. Market concentration poses a methodological problem for the field in two ways. First, LLMs increasingly converge in their outputs, undermining the common assumption that multiple models supply independent measurements of a latent construct. Second, a less competitive market leaves researchers with fewer models to choose from, forcing a trade between replicability and performance and exposing the literature to correlated errors from a shared pool of frontier models. I close with three responses: using architecturally diverse models, validating against held-out human-annotated data, and engaging directly with antitrust and competition policy.
AI usage statement: Claude Opus 4.7 was used to refine writing quality and the flow of individual sentences, and to format BibTeX citations.
22.1 Introduction
In this short position paper, I argue that computational social scientists need to start caring about the competitiveness of the AI market. Specifically, I contend that market concentration in AI presents a methodological issue for computational social science.
The AI market has expanded rapidly (Maslej et al. 2025). For example, AI capital expenditures added more to GDP growth than consumer spending in 2025 (Lichtenberg 2025). But tech firms are not entering this new market as equals. Incumbent tech giants, such as Google, Apple, Meta, Amazon, Microsoft, and Nvidia, use their existing resources to continue dominating the AI market. Even relative newcomers such as OpenAI and Anthropic are backed by tech giants (Microsoft and Amazon, respectively). Emerging competitors are nonetheless putting pressure on these incumbents. For example, the release of DeepSeek-R1 shook the market: DeepSeek’s model performed on par with then-contemporaneous reasoning models, while its developers claimed to have used a fraction of the training resources (Guo et al. 2025). Although the training-resources claim is contested (see, e.g., Patel et al. 2025), the market reacted sharply to the news, and Nvidia lost nearly $600 billion in market capitalization in a single day.
Naturally, the incumbent tech giants have responded to this threat. They have lobbied lawmakers and regulators to further entrench their dominant positions in the market (Henshall 2024; Oprysko 2025), and they have suppressed emerging rivals through outright acquisitions or through quasi-mergers: licensing deals and “acquihires,” arrangements in which a rival firm’s top talent is hired away wholesale (Corrigan, Luong, and Schoeberl 2024; Kazimirov 2025). For example, Google signed an agreement with Character.ai that granted Google a non-exclusive license to Character.ai’s LLMs, and Character.ai’s co-founders were hired away to Google. Soon after the deal, Character.ai stopped developing new LLMs (Criddle 2024). In another example, Meta acquired 49% of Scale AI, a full-stack data platform, and hired away much of its top talent, including co-founder Alexandr Wang. Worried that Scale would expose their research and technical priorities to Meta, Scale’s largest customers (Google, OpenAI, Microsoft, and xAI) cut ties with the firm almost immediately (Tong, Cai, and Hu 2025). In both cases, the disruptors were neutralized without being outright acquired.
Some scholars have argued that, given the high fixed costs associated with training AI models, machine learning and AI constitute a natural monopoly (see, e.g., Narechania 2022). I have argued elsewhere that this is not necessarily the case: assumptions about the data and computational resources that frontier models require have repeatedly been overturned in the past few years (Wu 2026), as with the development of reinforcement learning with verifiable rewards (Lambert et al. 2025). That said, I do not seek to adjudicate this issue in this position paper. Rather, I argue that computational social scientists must care about the competitiveness of the AI market, especially as AI tools and LLMs become more central to social science methodology (Ziems et al. 2024).
I make this argument along two lines. First, there is growing evidence that models often converge in their responses, despite being trained with different approaches, data, and applications. In other words, there may be less diversity in model outputs. Researchers who use multiple LLMs as an approximation of independent measurements of a latent construct may find that the models are not independent at all. Second, a less competitive market means fewer models to choose from. The closed-source models that perform best are routinely updated and deprecated, forcing researchers to trade replicability for performance. And as the AI market consolidates on a small set of frontier models, errors from those models can be correlated across studies, producing a body of literature whose findings appear to converge but actually share a common source of bias.
22.2 Converging Responses and Representations
Computational social scientists increasingly use AI tools such as LLMs for classification, free-form coding, content analysis, and the measurement of latent positions and attributes (e.g., Ziems et al. 2024; Wu et al. 2023; Le Mens and Gallego 2025; Licht et al. 2025). In these contexts, they frequently use multiple LLMs. Content analysis typically requires multiple coders because the analysis aims to measure a latent construct: independent coders supply approximately independent noisy measurements of that construct. But if the coders’ labels are correlated, whether through shared biases or non-independent labeling, inference fails (Krippendorff 2019). What the coders jointly measure, in that case, is the shared source of their correlation rather than the latent construct itself.
The same logic now extends to LLMs. Researchers frequently use multiple models to label or measure the latent constructs of interest, and they typically assume that models trained by different firms, on different data, and with different methods behave as approximately independent noisy measurements of the construct. Recent research shows that this assumption increasingly fails, and a concentrating AI market may make the situation worse. The simulation sketched below illustrates what is at stake.
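To see the problem concretely, consider a minimal simulation (a sketch of the logic, not a result from any of the studies cited here). Three coders label items on a binary latent construct with the same individual error rate; in one condition their errors are independent, and in the other all three err on the same items.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_coders, err = 10_000, 3, 0.2
truth = rng.integers(0, 2, n_items)  # the binary latent construct

def coder_labels(shared_errors=None):
    """One coder's labels: flip the true label on an `err` share of items.
    If `shared_errors` is given, the coder errs on exactly those items."""
    errors = shared_errors if shared_errors is not None else rng.random(n_items) < err
    return np.where(errors, 1 - truth, truth)

# Independent coders: each errs on its own random 20% of items.
indep = np.stack([coder_labels() for _ in range(n_coders)])

# Correlated coders: all three err on the *same* 20% of items (a shared bias).
shared = rng.random(n_items) < err
corr = np.stack([coder_labels(shared_errors=shared) for _ in range(n_coders)])

def report(labels, name):
    majority = (labels.mean(axis=0) > 0.5).astype(int)
    agree = (labels[0] == labels[1]).mean()  # pairwise agreement, coders 1 and 2
    acc = (majority == truth).mean()
    print(f"{name}: pairwise agreement = {agree:.2f}, majority-vote accuracy = {acc:.2f}")

report(indep, "independent errors")  # agreement ~0.68, accuracy ~0.90
report(corr, "correlated errors")    # agreement = 1.00, accuracy ~0.80
```

The correlated coders look far more reliable, with perfect pairwise agreement, yet aggregating them buys nothing: the majority vote is no better than a single coder. In that condition, high inter-coder reliability is measuring the shared source of error rather than the construct.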
The algorithmic collusion literature suggests how this assumption of independence between LLMs may be violated. Calvano et al. (2020) show that independent Q-learning agents, operating in a shared market environment and unable to communicate, learn to charge supracompetitive prices that resemble tacit collusion. Assad et al. (2024) find that the spread of algorithmic pricing software in Germany’s retail gas market widened margins in competitive local markets once all stations adopted the software, a pattern consistent with tacit collusion. And Fish, Gonczarowski, and Shorrer (2026) find that, in oligopoly settings, LLM-based pricing agents quickly reach supracompetitive prices. In short, outcomes that appear collusive can emerge from independent learners optimizing in a shared outcome space. The analogy to LLMs is direct: although different firms train the models, those firms draw on overlapping corpora, optimize against the same benchmarks, and often adopt similar post-training approaches.
In non-pricing contexts, empirical work shows that LLMs often generate similar outputs. Rozado (2024) finds that most major LLMs generate left-of-center responses to political orientation tests. Although these tests are not designed for LLMs and many contest their substantive interpretation, the consistent leftward signal across 24 models is the relevant finding here. Brown et al. (2025) examine whether LLMs systematically align with particular demographic groups on subjective annotation tasks. They find that demographic bias is not LLM-specific but dataset-specific: the demographic group that LLMs agree with most varies across their four datasets. For example, LLMs sometimes agree more frequently with White annotators on certain datasets and with non-White annotators on others. Within a given dataset, however, multiple LLMs exhibit the same direction of bias. Wenger and Kenett (2026) find that LLMs tend to be homogeneously creative: LLM responses mirror other LLM responses far more than human responses mirror other humans’. Taken together, these works show how LLMs, despite independent training, can produce similar outputs even on subjective tasks such as annotation and creative generation. And beyond these empirical settings, researchers have observed the phenomenon at the model level. Huh et al. (2024) propose the Platonic Representation Hypothesis: as neural networks scale and are trained on increasingly similar data, their internal representations converge toward a shared model of the underlying concepts and knowledge that the data describe.
A concentrating AI market means a smaller number of firms that train on the same data, license each other’s models, hire away top talent from potentially disruptive firms, and benchmark against each other’s outputs. Each acquihire or licensing deal further narrows the space of independent training pipelines. The methodological consequence for computational social science is significant: multi-coder inference with LLMs can fail, and the more concentrated the market, the worse the problem becomes.
22.3 Fewer Models Means Fewer Options
Beyond pushing LLMs toward correlated outputs, a less competitive AI market also leaves researchers with fewer models to choose from. This creates two problems. First, computational social scientists will increasingly face a difficult choice between powerful but closed-source models with replicability problems and less powerful but open-source models with better replicability (Spirling 2023; Barrie, Palmer, and Spirling 2025; Alizadeh et al. 2025). Market concentration tightens this bind along several dimensions. If improvements to closed-source models outpace those to open-source models, the tradeoff becomes starker. When only a handful of firms produce frontier models, researchers have little leverage to demand archived snapshots or longer deprecation windows. Closed models are updated silently, retrained on new data, fine-tuned with new proprietary techniques, and quietly retired (Ollion et al. 2024). And with fewer competitors, vendors can raise API prices quickly, putting replication of earlier work out of reach for researchers who cannot absorb the increase (Carammia, Iacus, and Porro 2024). Computational social scientists who choose closed models are effectively trading the replicability of their findings for performance, and that tradeoff steepens as the market concentrates further.
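One low-cost mitigation, whichever side of the tradeoff a study lands on, is to log the provenance of every model call so that later readers know exactly what was queried even after the model is retired. A minimal sketch follows; the function name and file layout are illustrative, not a standard.

```python
import hashlib
import json
import time

def log_model_call(model_id, prompt, response_text, logfile="model_calls.jsonl"):
    """Append a provenance record for one model call. If the vendor later
    silently updates or deprecates the model, the study still documents
    which versioned model saw which prompt, and when."""
    record = {
        "model_id": model_id,  # the vendor's *versioned* identifier, not an alias
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response": response_text,
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Provenance logs do not make a deprecated model re-runnable, but they let replicators distinguish "the model changed" from "the analysis changed," a distinction that is otherwise lost once a closed model is retired.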
Second, computational social science may face an algorithmic monoculture problem, with researchers all reaching for the same small set of frontier models. Kleinberg and Raghavan (2021) define algorithmic monoculture as the “notion that choices and preferences will become homogeneous in the face of algorithmic curation”. They find that a group of decision-making agents using the same algorithm can reduce the collective quality of decisions even when the shared algorithm is more accurate for any individual decision-maker in isolation. They attribute this to correlated failures across agents using the shared algorithm.
The mapping to computational social science is direct: if the field increasingly relies on a few models, errors from those models can be correlated across studies, producing a body of literature whose findings appear to converge but in fact share a common source of bias. In other words, studies that look like independent investigations of a phenomenon may simply reflect a shared blind spot among a handful of frontier models. This is methodologically similar to the problem in Section 22.2, but the mechanism differs: there, nominally independent LLMs within a single study behave as correlated coders; here, even a study that tries to diversify its models still draws from the same shrinking pool of options as every other study. The simulation sketched below illustrates the resulting false convergence.
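A small simulation makes the mechanism concrete (again a sketch of the logic, not a replication of Kleinberg and Raghavan's model). Fifty hypothetical studies each estimate the prevalence of some category from model labels; in one condition every study uses the same model with a shared bias, and in the other each study's model carries its own independently drawn bias.

```python
import numpy as np

rng = np.random.default_rng(1)
true_prevalence, n_studies, n_docs = 0.30, 50, 2_000

def study_estimate(model_bias):
    """One study's prevalence estimate: the model over-labels the category
    by `model_bias`, on top of ordinary sampling noise."""
    labels = rng.random(n_docs) < true_prevalence + model_bias
    return labels.mean()

# Monoculture: every study relies on the same frontier model, so every
# study inherits the same bias (five percentage points here).
mono = [study_estimate(0.05) for _ in range(n_studies)]

# Diverse market: each study's model carries its own independent bias.
diverse = [study_estimate(rng.normal(0, 0.05)) for _ in range(n_studies)]

for name, ests in [("monoculture", mono), ("diverse models", diverse)]:
    print(f"{name:>14}: mean = {np.mean(ests):.3f}, "
          f"sd across studies = {np.std(ests):.3f} (truth = {true_prevalence})")
```

The monoculture literature looks more consistent, with a much smaller spread across studies, but every estimate misses in the same direction. The apparent convergence is the shared bias, not the phenomenon.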
22.4 What Can We Do?
At present, the AI market’s concentration depends on the layer of the technical stack examined. Compute, for example, is highly concentrated: in 2025, Nvidia controlled 95% of the GPU market (L. 2026). The model layer, by contrast, appears more competitive. The Chatbot Arena benchmark, for example, shows that closed- and open-source models are currently quite competitive with one another (Chiang et al. 2024). But this level of competition is not guaranteed: incumbent tech giants continue to rapidly and aggressively absorb startups through formal acquisitions and less formal quasi-acquisitions (Wu 2026).
Three approaches stand out. The first two are methodological, and the third calls on the discipline to take on a new policy role. First, researchers should use architecturally diverse models: the diversity of model architectures matters more than the diversity of vendors. The costs of and expertise required for encoder-only or encoder-decoder models are higher than those of calling a frontier model through an API, but these architectures have fundamentally different inductive biases, so they can break correlations more reliably than adding yet another transformer-based generative LLM to the pool. Using such models alongside generative LLMs may also give developers an incentive to keep building them, even as commercial attention focuses on generative LLMs. A sketch of such an architecturally mixed coder pool follows.
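The sketch below pairs a locally run encoder-only NLI classifier with a generative LLM queried through an API. The model choice and the `llm_label` placeholder are illustrative assumptions, not recommendations; the point is only that the two coders come from different architectural families.

```python
from transformers import pipeline

# Encoder-only coder: a RoBERTa NLI model used for zero-shot classification.
# Its inductive biases differ from those of a decoder-style generative LLM.
encoder_coder = pipeline("zero-shot-classification", model="roberta-large-mnli")

LABELS = ["about the economy", "not about the economy"]

def encoder_label(text):
    result = encoder_coder(text, candidate_labels=LABELS)
    return result["labels"][0]  # highest-scoring label

def llm_label(text):
    # Placeholder for a generative frontier model queried over an API;
    # implement with whichever provider and prompt the study actually uses.
    raise NotImplementedError

docs = [
    "Inflation cooled in March as grocery prices fell.",
    "The team clinched the series in overtime.",
]
for doc in docs:
    print(doc, "->", encoder_label(doc))
    # A study would also record llm_label(doc) and examine where the two
    # architecturally distinct coders disagree before aggregating labels.
```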
Second, researchers must continue to validate against held-out human-annotated data. As LLMs improve, the temptation to skip human annotation grows, but human-annotated data matters more than ever. It is expensive to collect, and much of what passes for human annotation may increasingly be produced by LLMs themselves (Westwood 2025). Even so, genuinely human-annotated data remains one of the few ways to identify the collective blind spots of frontier models. Researchers using LLMs for measurement should plan from the start to collect a held-out, human-annotated subsample under conditions that preclude annotators’ use of LLMs, and they should report the calibration of model labels against that subsample as part of the research design and results.
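In practice, this can be as simple as reserving a human-coded subsample and reporting how the model's labels calibrate against it. A minimal sketch with scikit-learn, using illustrative stand-in labels:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# human: held-out labels collected under conditions that preclude LLM use.
# model: the LLM's labels for the same documents. Both are stand-ins here.
human = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
model = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1, 1])

agreement = (human == model).mean()
kappa = cohen_kappa_score(human, model)
tn, fp, fn, tp = confusion_matrix(human, model).ravel()

print(f"raw agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")
print(f"false positive rate = {fp / (fp + tn):.2f}, "
      f"false negative rate = {fn / (fn + tp):.2f}")
# Report these alongside substantive results: a skewed error profile on the
# human-coded holdout is exactly the kind of shared blind spot that the
# convergence argument in Section 22.2 warns about.
```

Reporting the full error profile, rather than accuracy alone, matters because a shared directional bias is precisely what correlated frontier models would otherwise hide.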
The third approach involves directly engaging with antitrust and competition policy research. Computational social science has a direct stake in the competitiveness of the AI market: the diversity of available models—with different architectural designs, different training regimes, and different post-training techniques—bears directly on the methodological soundness of the field, and a concentrated market erodes that diversity. If computational social science methods depend on the AI market, and the AI market is shaped by competition policy, then competition policy is no longer just something that exogenously affects the discipline. Rather, it becomes something that the discipline has both the standing and an obligation to study. While antitrust scholarship has generally been the purview of law and economics, the methodological consequences of market concentration fall squarely within the expertise of computational social scientists. In other words, the discipline should treat AI antitrust the same way that political scientists treat redistricting algorithms, or sociologists treat platform governance.