Scientific datasets are riddled with copy-paste errors

10 hours ago
  • #Open Access
  • #Data Integrity
  • #Scientific Research
  • Software scanning open-access datasets detected 18 serious errors in 600 scientific papers.
  • Case 1: A Parkinson's study dataset contains duplicated sequences that affect key motor-function results.
  • Case 2: An ostrich-snake mix-up in toxin-resistance data, with copy-paste errors and suspicious near-duplicates.
  • Case 3: Fish-size data were scrambled by a file-merging error; the authors corrected it, with minimal impact on their conclusions.
  • The error rate among the scanned datasets is about 3% (18 of 600), but the actual rate is likely higher because some issues are undetectable.
  • Institutions prioritize metrics over data verification; Dryad supports efforts to correct errors.
  • Future plans include scanning 24,000 more datasets, which is expected to turn up hundreds of additional error cases.
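
To make the "duplicated sequences" idea concrete, here is a minimal Python sketch of one way a scanner might flag a run of values that appears twice in the same column. It is not the software described above, whose method is not given here. The function name `find_repeated_runs` and the `min_len` threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def find_repeated_runs(values, min_len=5):
    """Return (first_start, later_start) index pairs where a run of
    `min_len` consecutive values occurs at two non-overlapping positions.

    Hypothetical helper for illustration only; `min_len` is an assumed
    threshold, not a value taken from the article.
    """
    # Map each window of min_len consecutive values to its start indices.
    seen = defaultdict(list)
    for start in range(len(values) - min_len + 1):
        window = tuple(values[start:start + min_len])
        seen[window].append(start)

    # Pair the first occurrence with each later, non-overlapping one.
    pairs = []
    for starts in seen.values():
        first = starts[0]
        for later in starts[1:]:
            if later - first >= min_len:
                pairs.append((first, later))
    return sorted(pairs)


# Made-up measurements: the first five values reappear from index 6.
column = [1.2, 3.4, 5.6, 7.8, 9.0, 2.1, 1.2, 3.4, 5.6, 7.8, 9.0, 4.4]
print(find_repeated_runs(column))  # [(0, 6)]
```

A check like this only catches exact repeats. Near-duplicates (as in Case 2) or rows scrambled by a bad merge (as in Case 3) would need different tests, and legitimately constant columns would raise false alarms, which fits the point that the true error rate is probably higher than what a scan detects.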