Hasty Briefs (beta)

The engine behind the 100 TB GitHub search engine

12 hours ago
  • #System Architecture
  • #Code Search
  • #GitHub
  • GitHub built a new search engine from scratch in Rust, called Blackbird, to address code search needs.
  • The engine uses n-gram indices tuned for code, so punctuation is searchable and regular-expression queries are supported.
  • The index is sharded by Git blob object IDs and uses delta encoding for efficient storage and querying.
  • The system scales to an index of over 45 million repositories while keeping query latency low.
  • Code search performance: individual shards achieve p99 response times of about 100 ms, far faster than brute-force scanning with tools like grep.
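
To make the indexing ideas above concrete, here is a minimal sketch in Python (not Blackbird's Rust code) of an n-gram index sharded by Git blob object ID. It makes several assumptions that are not in the source: fixed-size trigrams, four shards, in-memory sets as posting lists, and a final substring check on candidates. The names `ShardedIndex`, `shard_for`, and `NUM_SHARDS` are hypothetical. Delta encoding is left out, since the summary does not say what exactly is delta-encoded.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical value; the source does not state a shard count


def blob_id(content: bytes) -> str:
    """Git blob object ID: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def shard_for(oid: str, num_shards: int) -> int:
    """Pick a shard from the object ID (assumed scheme: ID modulo shard count)."""
    return int(oid, 16) % num_shards


def trigrams(text: str) -> set:
    """All 3-character substrings; punctuation is kept, not tokenized away."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class ShardedIndex:
    def __init__(self, num_shards: int = NUM_SHARDS):
        # One inverted index per shard: trigram -> set of blob IDs.
        self.shards = [defaultdict(set) for _ in range(num_shards)]
        # Stand-in for a content store: blob ID -> text.
        self.blobs = {}

    def add(self, text: str) -> str:
        oid = blob_id(text.encode("utf-8"))
        if oid in self.blobs:
            return oid  # identical content has the same ID, so it is indexed once
        self.blobs[oid] = text
        shard = self.shards[shard_for(oid, len(self.shards))]
        for gram in trigrams(text):
            shard[gram].add(oid)
        return oid

    def search(self, query: str) -> list:
        grams = trigrams(query)
        if not grams:
            raise ValueError("query must be at least 3 characters")
        hits = []
        for shard in self.shards:  # a real system would fan out in parallel
            postings = [shard.get(g, set()) for g in grams]
            candidates = set.intersection(*postings)
            # Sharing every trigram is necessary but not sufficient, so verify.
            hits.extend(oid for oid in candidates if query in self.blobs[oid])
        return sorted(hits)
```

Usage: after `idx = ShardedIndex()` and a few `idx.add(source_text)` calls, `idx.search("main()")` returns the IDs of the blobs containing that exact substring, parentheses included. Each shard only intersects a few posting lists and checks a handful of candidates, which is the general reason an index answers faster than scanning every file the way grep does.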