Scientific datasets are riddled with copy-paste errors
10 hours ago
- #Open Access
- #Data Integrity
- #Scientific Research
- Software scanning open-access datasets detected 18 serious errors in 600 scientific papers.
- Case 1: Parkinson's study data contains duplicated sequences, affecting key motor function results.
- Case 2: An ostrich-snake mix-up in toxin resistance data involves copy-paste errors and suspicious near-duplicates.
- Case 3: Fish size data were scrambled by a file-merging error; the authors corrected it, with minimal impact on conclusions.
- The error rate among scanned datasets is about 3%, but the actual rate is likely higher because some issues cannot be detected.
- Institutions prioritize metrics over data verification; Dryad supports efforts to correct errors.
- Plans include scanning 24,000 more datasets, which is expected to turn up hundreds of additional error cases.
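
To make the "duplicated sequences" idea concrete, here is a minimal sketch of one way a scanner might flag a run of values that reappears later in the same column. It is an illustration under stated assumptions, not the method used by the software described above: the function name `find_repeated_runs`, the `min_len` threshold, and the sample values are all hypothetical.

```python
def find_repeated_runs(values, min_len=5):
    """Return (first_start, repeat_start) index pairs where a run of
    `min_len` consecutive values appears again later in the column.

    Hypothetical helper for illustration only.
    """
    seen = {}   # window contents -> index where that window first started
    hits = []
    for start in range(len(values) - min_len + 1):
        window = tuple(values[start:start + min_len])
        first = seen.setdefault(window, start)
        # Skip the first sighting and any match that overlaps it.
        if first != start and start - first >= min_len:
            hits.append((first, start))
    return hits


# Made-up measurements: the first five values recur starting at index 6.
column = [1.2, 3.4, 5.6, 7.8, 9.0, 2.1, 1.2, 3.4, 5.6, 7.8, 9.0, 4.4]
repeats = find_repeated_runs(column, min_len=5)
```

A match like this is only a prompt for manual review. Coarsely rounded or naturally repetitive measurements can produce identical runs by chance, so a real tool would likely need a longer window or a statistical check before calling something an error.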