Hasty Briefs

Pgit: I Imported the Linux Kernel into PostgreSQL

3 days ago
  • #linux kernel
  • #data analysis
  • #version control
  • Imported the entire Linux kernel history (1,428,882 commits, 24.4 million file versions) into pgit, a Git-like CLI with a PostgreSQL backend and delta compression.
  • Compression came close to git's but did not beat it: pgit's actual data size is 2.7 GB, versus 1.95 GB with git gc --aggressive. The import took 2 hours on a high-spec dedicated server.
  • Found interesting insights via SQL queries: only 7 f-bombs in commit messages (from 2 people), 665 bug fixes point to the first git commit, and a filesystem took 13 years to merge.
  • Analyzed development patterns: 90% of commits touch ≤5 files, 38,506 authors (36% contributed only once), and top 3 committers merge 22.5% of all commits.
  • Revealed organizational contributions: Intel leads by commit volume, Red Hat by productivity, and hobbyist (Gmail) contributions have declined from 12% (2010) to 8% (2025).
  • Identified strong file coupling (e.g., i915 driver files changed together 1,117 times) and profanity in source code (50+ instances of 'fuck' across history).
  • Demonstrated query performance: most analyses (e.g., churn, hotspots, authors) completed in under 10 seconds without preprocessing, leveraging PostgreSQL and pg-xpatch.
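
The file-coupling figure above (pairs of files that change in the same commit) can be illustrated with a small sketch. This is not pgit's query or schema, which the summary does not show; it is a plain Python version of the counting idea, run on a made-up three-commit history. The function name `co_change_counts` and the sample paths are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each unordered pair of files appears in the same commit."""
    pairs = Counter()
    for files in commits:
        # set() drops duplicate paths within a commit; sorted() gives each
        # pair a canonical order so (a, b) and (b, a) are counted together.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy history: each inner list holds the paths touched by one commit.
commits = [
    ["drivers/gpu/a.c", "drivers/gpu/a.h"],
    ["drivers/gpu/a.c", "drivers/gpu/a.h", "Makefile"],
    ["Makefile"],
]

counts = co_change_counts(commits)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pair counting grows quadratically with the number of files in a commit, so on a real history it may be worth skipping very large commits (merges, tree-wide renames) before counting; the cutoff would be a judgment call.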