10% and Rising: Measuring ChatGPT’s Quiet Influence on Research

A new study posted on arXiv measures the impact of large language models (LLMs) such as ChatGPT on scientific writing and finds it to be large and unprecedented. The research, conducted by a team from the University of Tübingen and Northwestern University, analyzed more than 14 million biomedical abstracts from PubMed to track how academic writing style changed before and after ChatGPT’s release.

Key Findings

  • At least 10% of scientific abstracts published in early 2024 were likely processed using LLMs, with the true number potentially much higher.
  • The impact varied widely across fields and countries, reaching up to 30% in some areas like computational biology.
  • LLM usage was inferred from a sharp rise in the frequency of certain style words and phrases that AI models tend to favor.
  • The scale of this change surpassed even the dramatic shift in scientific vocabulary seen during the COVID-19 pandemic.

Methodology

The researchers developed a new “excess word usage” approach, inspired by studies of excess mortality during the pandemic. For each word, they compared its observed frequency after ChatGPT’s release with the frequency expected from pre-release trends. Words and phrases whose usage rose far above that expectation form a linguistic fingerprint of LLM involvement.
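To make the observed-versus-expected idea concrete, here is a minimal Python sketch of an excess-usage calculation. It is not the authors’ code: the counterfactual rule (extending a rising 2021 to 2022 trend linearly to 2024), the names WordStats, expected_frequency, excess and lower_bound_share, and all numbers are illustrative assumptions. The last function shows one way a frequency gap can yield a lower bound on LLM usage, which is why figures like “at least 10%” are floors rather than exact shares; the study’s own procedure may differ.

```python
"""Minimal sketch of an "excess word usage" calculation.

Assumptions (illustrative, not taken from the study): a word's frequency is
the fraction of that year's abstracts containing it, the expected 2024
frequency extends the 2021->2022 trend linearly, and the numbers are made up.
"""

from dataclasses import dataclass


@dataclass
class WordStats:
    word: str
    freq_2021: float  # fraction of 2021 abstracts containing the word
    freq_2022: float  # same for 2022, before ChatGPT could have had much effect
    freq_2024: float  # observed fraction in 2024


def expected_frequency(s: WordStats) -> float:
    """Counterfactual 2024 frequency if pre-ChatGPT trends had continued."""
    slope = s.freq_2022 - s.freq_2021
    # Extend only rising trends, so a declining word is not projected toward
    # zero (which would inflate its apparent excess).
    return s.freq_2022 + 2 * max(slope, 0.0)


def excess(s: WordStats) -> tuple[float, float]:
    """Return (gap, ratio): observed minus expected, and observed / expected."""
    q = expected_frequency(s)
    return s.freq_2024 - q, s.freq_2024 / q


def lower_bound_share(observed: float, expected: float) -> float:
    """Lower bound on the share f of abstracts processed by an LLM.

    Assume non-LLM abstracts keep the expected frequency and LLM-processed
    ones contain the marker with some probability x <= 1. Then
    observed = (1 - f) * expected + f * x, so for a positive gap,
    f = (observed - expected) / (x - expected) >= (observed - expected) / (1 - expected).
    """
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical words and frequencies, for illustration only.
    words = [
        WordStats("marker_word", 0.0004, 0.0005, 0.0140),
        WordStats("common_word", 0.3000, 0.2900, 0.2850),
    ]
    for s in words:
        gap, ratio = excess(s)
        print(f"{s.word}: gap={gap:+.4f}, ratio={ratio:.2f}")
```

The ratio makes rare words that jump many-fold (like the hypothetical marker_word) stand out, while the gap matters for common words, where even a small relative change covers many abstracts.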

Implications

While LLMs can improve readability and help non-native English speakers, the study raises concerns about potential negative impacts:

  • Homogenization of scientific writing styles
  • Propagation of biases present in LLM training data
  • Risk of factual errors or hallucinated content slipping into papers
  • Potential misuse by paper mills to generate fake research

Lead author Dmitry Kobak commented: “Our work shows that the effect of LLM usage on scientific writing is truly unprecedented and outshines even the drastic changes induced by the Covid-19 pandemic.”

The Future of Academic Publishing

This study provides hard data on a trend many have suspected: AI is rapidly transforming how scientific research is communicated. It highlights the urgent need for clear policies and guidelines around LLM use in academia. As these tools become more powerful and widespread, maintaining the integrity and diversity of scientific discourse will be a key challenge for the research community.

The authors suggest their methodology could be applied to track LLM usage in other domains like journalism, grant applications, and even creative writing. As AI continues to reshape how we create and consume written content, studies like this will be crucial for understanding its true impact on human communication and knowledge production.
