18 The Human Element in Social Science Research

Alexis Palmer, Tulane University, apalmer13@tulane.edu

Abstract:

AI usage statement: I prompted Claude to search for typos, grammar mistakes, and poor sentence structure. Suggestions were implemented at my discretion.

18.1 Introduction

Alongside the excellent and ever-evolving work on how, whether, and when LLMs can be used in social science research, another disciplinary conversation has emerged: what will it look like to be a social scientist in a world where so many tasks can, at least on their face, be done with so little effort or personal ability? As publications touting the newest test of an LLM research task proliferate, so too do academic Twitter conversations claiming the ability of generative AI to write a research paper in a matter of minutes,¹ take over the planning and execution of a research pipeline,² or perform any number of other tasks we view as central to the profession. Anthropic’s surveys have shown that as many as 80% of social scientists have at least tried using some AI tool, with 20% integrating coding agents (men being twice as likely to be in the latter group). Further, while some of the uses of coding agents are tasks often handed off by scholars who have sufficient resources – coding, editing – others, such as generating ideas, were until recently unthinkable to put in others’ hands.³

As a consequence of these developments, conversations at most conferences or post-talk dinners have inevitably turned to the question of our role in this evolving technological world. Will the skills of an individual researcher become obsolete? I address this with regard to two long-standing, interrelated issues in academia: what are the “right” standards to evaluate research and researchers? And who selects into doing that research? In sum, I argue that though our academic ideal is to produce knowledge, the current incentives are based around producing papers. Language models can significantly reduce the costs of the latter, but the former still requires significant researcher contribution.

First, I will note that what follows is heavily informed by conversations with innumerable people, many of whom may recognize their words. My thanks to everyone who has challenged me on this or any other topic – nothing is as effective at forcing me to expand my perspective and reconsider my opinions.

18.2 Research Expectations

There has been a significant increase not just in the overall number of social scientists, but in their relative productivity. Between 2003 and 2023, the number of political science authors increased threefold while the number of papers published increased more than fivefold (Torreblanca et al. 2025). Alongside this boom in productivity have come ever-increasing expectations for scholars in terms of research output and publication quality when being hired, promoted, or allocated limited funding. There are therefore increasing incentives to turn out work that will be quick, cheap, and relatively unobjectionable to reviewers. This has led to a fundamental disconnect between the function of the academic institution and its ideals. Put simply, in the ideal, the academic job is to produce knowledge, but evaluations are based on the ability to produce papers.

As is true with many issues, developments in language models make existing problems evident at a previously unfathomable scale more than they create new ones.⁴ In many ways, despite its ideals, academia is already structured against innovation. Incentives are centered around the number of publications and the amount of work one is able to produce, and though top journals theoretically prioritize creativity and theoretical novelty, in practice this standard can feel distant from the reality of the publishing process. This was a growing problem before the rise of AI. Early career scholars are steered away from riskier or more time-intensive projects as potentially harmful to their career advancement. Even post-tenure, the pressures of funding, internal raises, administrative responsibilities, and supporting graduate students can create incentives against novelty. As funding available for research contracts further, the pressures to produce easily publishable research cheaply only compound.

18.2.1 The Role of Language Models

In response to both the growing capabilities of language models and this set of incentives, social scientists are increasingly offloading research tasks to the models. The most common are tasks that have long been handed off to software or, where feasible, to other people, such as writing code (97% of coding agent users and 77% of general AI users have done this) or editing existing prose (87% and 72%, respectively). Given minimal oversight, this type of output both has an expected form and is easily checkable. Others were more often the province of trusted advisors or colleagues – advice on methods (77% and 63%) and looking for literature (76% and 60%). Still others were once thought to be at the heart of a project, left only to those credited as authors on a paper, including drafting the manuscript (54% and 30%) and generating the ideas and theory underlying the paper (47% and 32%).⁵

Given the ability of models such as Claude to produce a reasonable-looking research paper (including code) in a fraction of the time, what is the role of the researcher? Presumably, the future of research is not in churning out infinite AI-produced papers. Which of these tasks still benefit from direct supervision, or should even remain entirely separate from generative models? And how can we reasonably shift incentives towards innovation over production? I will briefly discuss some of the most common tasks into which researchers have integrated LMs.

18.2.1.1 Generating Papers and Ideas

Models are, in fact, good at replicating the structure and syntax of a known text format; it is what they are designed to do. However, it is not news to say these models are not built for novelty. Language models trained on massive amounts of data will return, in simplistic terms, modal text. If the ultimate goal of research is to generate knowledge, involving an actor in the process who will revert to the mean seems counterproductive. Further, given the innumerable papers on model bias (Gallegos et al. 2024) and the national division about diversity in academia, model authorship (or review) of papers is in some sense akin to prioritizing a majority voice. Though some researchers claim that models can take on individual personas (Varnum et al. 2024), there is no convincing evidence they will actually produce a range of perspectives, especially on more complex questions.

Similarly, new ideas are often the product of making connections across topics or fields that did not previously exist but are nonetheless logically consistent. This process of synthesizing known information to reach a new conclusion is heavily informed by one’s existing knowledge and experience. While in some sense there are no new ideas, it is rare that two researchers would arrive at the same idea, drawing on the same sources, towards the same operationalization. As models are not ‘thinking’ in the way we understand it and produce output from probabilities based on known text, their design is in many ways orthogonal to this process.

Finally, what is a “good” idea is often a pure matter of taste, even within academia. How we develop taste, what it is, how to measure it are all ephemeral concepts. Indeed, when any idea is put before a collection of people, there will inevitably be a wide range of opinions, each of which pushes back against or highlights a different facet of the problem. So while a model may provide a useful ‘first pass’ at brainstorming, it should not be the last stop.

A common pushback is that models are still evolving. However, I am not convinced models will ever fully replicate a human brain, given how little we understand about cognition and creativity, nor do I believe our goal should be to produce an all-knowing oracle. Instead, we should prioritize ways in which models can be used to shift incentives and time allocation so that researchers are more able to pursue novel theories.

18.2.1.2 Literature, Theory, and Methods

Instead of initial development, why not hand off intermediary tasks, such as writing a literature review? First, model hallucination is a well-known problem, with regard to both representing the meaning of a paper and the existence of a paper altogether (Huang et al. 2025). This is not simply a problem for the literature review of a single paper. A colleague described a case in which a Claude-generated paper had misinterpreted a somewhat obscure theory, an error recognizable only to someone who had read deeply about the theory. If the paper had gone forward, those who read it and cited it would also have misinterpreted the concept. If researchers continued to use Claude, at some point the predominant understanding of this theory might be wrong.

Even at their best, with reasonable representations of existing work, models are more proficient at producing clean prose than thought-provoking arguments. When summarizing literature to explicate a theory and ground mechanisms, clear writing matters, but so does logical coherence. This does not simply mean that statements are testable or interrelated, but that they are all grounded in a coherent worldview which informs the claims made in the paper, ideally that of the author. While models may be able to generate the text, it is more difficult to prompt them to accurately represent any individual’s specific perspective.

Beyond these usage problems, part of the value is the struggle. The value of the process of finding, reading, and summarizing a body of work is more than its output, which situates a reader in the literature. Instead, it forces the authors to contend with what they actually think, what they want to say, and how it differs from what we already know. This process is often time-consuming and relatively unrewarding. But it is also how we understand the role of our work and hone ideas. Additionally, our ability to judge and evaluate comes as much from consumption as from creation of work. When I started grad school, faculty warned us that it would be easy to jump into critiquing others’ work, but that once we started producing our own research we would understand how difficult it is to produce unassailable projects and how to find the value in imperfect conclusions. Research is in some sense meant to be messy, and it is only through wrestling with our own process that we can confront that.

18.2.1.3 Data Collection and Creation

Much has already been written about language models playing a role in research design – as crowdworkers, silicon samples, or objects of study themselves.⁶ Setting aside some technical problems,⁷ much of it is excellent, but what we are ultimately learning is about the models, not about people. The limited work that has directly compared model-generated and human-generated interventions has shown that participants in fact do not view the two as the same (Palmer and Spirling 2023). Work on how people interact with models, what they’re learning from them, and the impact they have on the information environment is timely and necessary, but when the model itself is not part of the question, construct validity should be a central concern.

18.2.1.4 Paper Review

The review process, both in the sense of time commitment and seeming arbitrariness, is often one of the most frustrating parts of the research process. However, the idea of streamlining paper review with models runs into both practical and philosophical issues. On the practical level, it is not clear that models are actually better at this task, especially when it comes to newer or more innovative work, and when you consider the value of a diversity of opinions rather than what is essentially one (even if prompted to take on alternate personas). Again invoking the idea of taste, is there a single set of metrics or barometer for what is not just technically good but interesting and valuable work? If two scholars disagree about the contribution of a project, can we deem one of them wrong? Additionally, if we view peer review as not only a barrier one must pass for publication but also a mechanism through which work is improved, multiple perspectives or tastes are instrumental to moving a project forward.

Second, and more fundamentally, when I write a paper, I care whether it is persuasive, thought-provoking, or anger-inciting to other people, not to a model. If a group of scholars finds a paper extremely useful, but a model does not, which of those perspectives should predominate? Even with simpler tasks, such as answering straightforward questions (Bisbee et al. 2024), there is both significant variation in model output and little understanding of why a particular string is returned by the model. Why, then, should we believe a model’s evaluation of a complex item such as a research paper, especially given the difficulty of understanding how that evaluation is produced? Note that there is currently no evidence that the explanation a model gives when asked why it returned a given answer has any logical connection to how that answer was produced.

Again, models are likely a very useful first pass at a paper. Researchers often circulate papers directly for feedback, but as with many aspects of the research process this can both require significant time and not be equally accessible across researchers and institutions. As such, using a model as a first reviewer can be very useful to identify common or systemic issues before submission. This growing ability to clean up papers may itself contribute to a more efficient review process.

18.2.1.5 As an Assistant

Despite these objections, there are many tasks language models have made much easier and cheaper for the average researcher to complete. Things like an initial search for literature, proofreading, or coding tasks have long been handed off to undergraduate research assistants when researchers have the funds. Models are reasonably good at tasks such as retrieving information, writing code, and looking for typos or grammar mistakes. Further, the product of these queries is both directly observable and verifiable. When a model is prompted to write code, it is straightforward to check that the product executes the correct task. Language models can reasonably fill this role, as long as they are subject to the same scrutiny as the work of a college student. There is abundant evidence that while models are generally good at these tasks, they are not infallible. Therefore, the output of models should still be checked. This speaks to the need to still teach researchers how to write code, search for literature, and write well: users need to be able to check the output of a model and identify problems, misrepresentations, and gaps.

The ability of language models to complete these tasks may serve as an equalizer on this dimension across high- and low-resourced institutions. Access to a pool of research assistants, the ability to pay said research assistants, and overall administrative support can vary widely by institution and significantly impact the ability to produce research. Therefore, models as a more accessible labor source for these types of tasks should not be discounted.

18.3 Who does research?

In one of the many conversations I’ve had about all of the above, a colleague asked me a question: do I view being a researcher as a job or a calling? That is, do I think my role is to produce papers or to produce knowledge? This dichotomy encapsulates the tensions discussed: while language models can simplify many parts of the research process and enable producing more papers, faster, they can also flatten the questions we ask and the conclusions we draw. Fundamentally, I think there is more value in a project which requires months of reading, data collection, and fieldwork than in a Claude-generated paper, even if both are answering what is on its face the same question, and the latter is both faster and cheaper.

However, if this is indeed what we value, it highlights a second systematic problem: a significant subset of those who are trained to be researchers have limited options outside academia. If the most comfortable, high-status option when equipped with a PhD is an academic research position, then we will continue to structure institutions to produce papers, not knowledge. Again, this is not a new problem so much as one compounded by the growth of language models. Because, as every quantitative researcher knows, it is difficult to develop metrics to assess something multi-faceted and nuanced, we cannot rely solely on external evaluations of work to determine its value. Instead, we need a system in which early-career academics are choosing to spend their time producing knowledge rather than it simply being their best job option.

This may, in fact, be helped by the growth of language models. There are two trends that have underlain much of this position piece: that models are increasingly good at doing rote tasks and that when it comes to deep analysis, understanding, and creativity, humans are hard to replace. There are clear incentives for graduate students to learn data skills as the application of data science to every field has only grown. However, there is also a need for highly trained people who can interpret that data in a social and cultural context. In that sense, there is a growing need for quantitative social scientists outside of academia.

This requires some shift in our training of graduate students. There is currently a wide variety of programs which take on different structures and goals for students. Developing analytical skills – how to think about a problem, how to apply old knowledge to new situations, what questions to ask – helps develop both new academic researchers and more competitive job seekers. This should come alongside deep case and sub-field learning, which allows students to engage with a wide variety of topics and problems and leads to a more open field after graduate school.

18.4 Conclusion

In sum, the emergence of language models has deepened existing problems in social science: namely, that when it comes to both entering the profession and producing research, our incentives are misaligned with our ideals. However, language models also offer new opportunities to outsource much of the ‘rote’ work which previously only those at top institutions could afford to hand off. These possibilities can constitute a significant shift in how researchers are able to allocate their time while working on minimal budgets.

When it comes to the broader expectations of using language models in research, there are several shifts in disciplinary expectations which I think could help. First, we have long-established protocols for when contribution to a paper demands coauthorship – the point of this is not only fairness to scholars who contribute time and effort, but to denote the intellectual origins of a paper. Arguably, if a language model is used for tasks that would lead to coauthorship if they were done by a human, then the model should be considered a coauthor on that paper and acknowledged as such. Second, we should place weight on theoretical and methodological contributions over sheer volume of production, especially mixed methods work that brings together rigorous quantitative conclusions with deep contextual knowledge. Finally, students should still be taught the basic toolkit of research, as well as how, when, and to what standards to offload these tasks to models. Graduate education in particular should acknowledge and prepare students for roles outside academia.

¹ https://x.com/ahall_research/status/2007603340939800664
² https://x.com/cblatts/status/2027018464670491065
³ https://www.anthropic.com/research/coding-agents-social-sciences
⁴ The prevalence of misinformation being a timely example.
⁵ https://www.anthropic.com/research/coding-agents-social-sciences
⁶ The forthcoming Artificial Intelligence, Politics, and Political Science contains some nice summaries.
⁷ Such as the difficulty of replicating the products of language models (Barrie et al. 2025).