Hasty Briefs (beta)

UK Biobank health data keeps ending up on GitHub

10 hours ago
  • #copyright takedowns
  • #data privacy
  • #health data exposure
  • UK Biobank is using copyright takedown notices to remove health data from GitHub. It is repurposing a mechanism normally used against pirated software because UK privacy law has no counterpart to the notice-and-takedown process of the US Digital Millennium Copyright Act (DMCA).
  • Targeted files include Jupyter and R notebooks, genetic data files (PLINK, BOLT-LMM, BGEN), tabular datasets (CSV, TSV, Excel) and analysis scripts. The notices usually name specific files rather than entire repositories.
  • UK Biobank began filing takedown notices in July 2025 and has sent 110 requests to GitHub. Filings paused in early 2026 and resumed after Guardian investigations revealed the exposed data and showed that the takedowns were not working.
  • The targeted developers are based in at least 14 countries, most often the United States (24) and China (21). Many give no location on their GitHub profiles.
  • Methodology: the analysis draws on GitHub's public DMCA repository, extracting each notice's filing date and targeted URLs, and uses the GitHub API to look up the location on each targeted user's profile. This location data is limited and imperfect.
  • The exposure highlights governance challenges for UK Biobank, with Guardian investigations revealing data matching risks, unauthorized access by insurance companies, and exclusive early data access to pharmaceutical firms.
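The methodology bullet above can be sketched as a small Python pipeline. This is a minimal illustration, not the analysis's actual code. It assumes notices in GitHub's public github/dmca repository are stored under paths like `YYYY/MM/YYYY-MM-DD-<slug>.md`. The helper names (`filing_date`, `targeted_urls`, `classify`, `tally_locations`), the regular expressions and the extension-to-category mapping are all hypothetical. The location lookup uses GitHub's REST endpoint `GET /users/{username}`. Its `location` field is free text entered by the user and is often empty, so mapping it to a country would still take manual or heuristic work. Unauthenticated requests are also rate-limited.

```python
from __future__ import annotations

import json
import re
import urllib.request
from collections import Counter
from datetime import date
from pathlib import PurePosixPath

# Assumption: notice files are named YYYY-MM-DD-<slug>.md inside github/dmca.
FILENAME_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})-")
# Rough matcher for GitHub URLs in a notice's Markdown text (hypothetical).
GITHUB_URL = re.compile(r"https://github\.com/[^\s)\]>]+")

# Hypothetical mapping from file extension to the file types named in the summary.
CATEGORIES = {
    ".ipynb": "notebook", ".rmd": "notebook",
    ".bed": "genetic", ".bim": "genetic", ".fam": "genetic", ".bgen": "genetic",
    ".csv": "tabular", ".tsv": "tabular", ".xls": "tabular", ".xlsx": "tabular",
    ".py": "script", ".r": "script",
}


def filing_date(path: str) -> date:
    """Read the filing date from a notice's file name."""
    m = FILENAME_DATE.search(PurePosixPath(path).name)
    if m is None:
        raise ValueError(f"no date in file name: {path}")
    return date(*(int(g) for g in m.groups()))


def targeted_urls(notice_text: str) -> list[str]:
    """Collect GitHub URLs, trimming punctuation that trails them in prose."""
    return [u.rstrip(".,;:") for u in GITHUB_URL.findall(notice_text)]


def owner(url: str) -> str:
    """https://github.com/<owner>/<repo>/... -> <owner>"""
    return url.rstrip("/").split("/")[3]


def classify(url: str) -> str:
    """Label a URL as a whole repository or by the file type it points to."""
    parts = url.rstrip("/").split("/")
    if len(parts) <= 5:  # https://github.com/<owner>/<repo>
        return "repository"
    suffix = PurePosixPath(parts[-1]).suffix.lower()
    return CATEGORIES.get(suffix, "other")


def user_location(username: str) -> str | None:
    """Fetch the free-text profile location via GitHub's REST API (network call)."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("location")


def tally_locations(usernames, lookup=user_location) -> Counter:
    """Count distinct users per location; missing locations become "unknown"."""
    return Counter(lookup(u) or "unknown" for u in set(usernames))


if __name__ == "__main__":
    # Hypothetical notice path and text, for illustration only (no network use).
    path = "2025/07/2025-07-15-example.md"
    notice = (
        "- https://github.com/alice/ukb-gwas/blob/main/pheno.csv\n"
        "- https://github.com/bob/ukb-notebooks.\n"
    )
    urls = targeted_urls(notice)
    print(filing_date(path), [classify(u) for u in urls], [owner(u) for u in urls])
```

Passing the lookup function as a parameter keeps the network call out of the counting logic, so the tally can be checked offline or pointed at a cached copy of the API responses.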