Hasty Briefs (beta)

Web-scraping AI bots cause disruption for scientific databases and journals

a year ago
  • #AI
  • #Publishing
  • #Bots
  • DiscoverLife, an online image repository, saw a surge in bot traffic that slowed the site down.
  • Bots that scrape content at scale are a growing problem for scholarly publishers and researchers.
  • Many suspect the bots are gathering data to train generative AI tools such as chatbots and image generators.
  • The high volume of bot requests strains systems, causing financial and operational disruptions.
  • Smaller organizations with limited resources are particularly vulnerable to these disruptions.
  • Internet bots have been around for decades, with some being useful, like those used by search engines.
  • The rise of generative AI has led to an increase in 'bad' bots that scrape content without permission.
  • Publishers such as BMJ and HighWire Press report significant increases in 'bad bot' traffic, causing service disruptions.
  • The Confederation of Open Access Repositories (COAR) reported that over 90% of surveyed members had experienced AI bots scraping content, with two-thirds facing service disruptions.
  • The release of DeepSeek, a Chinese-built LLM, showed that powerful AI models can be built with fewer resources, which has led to more bots scraping data for training.