The engine behind the 100 TB GitHub search engine
12 hours ago
- #System Architecture
- #Code Search
- #GitHub
- GitHub built a new search engine, Blackbird, from scratch in Rust to address code search needs.
- The engine uses ngram indices optimized for code, which lets it match punctuation and support regex searches.
- The index is sharded by Git blob object IDs and uses delta encoding for efficient storage and querying.
- The system scales to more than 45 million repositories while keeping query latency low.
- Code search performance: individual shards achieve p99 response times of about 100 ms, significantly faster than brute-force methods like grep.
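
To make the ngram idea above concrete, here is a minimal sketch of a trigram inverted index in Python. It is an illustration only, not Blackbird's implementation (which is written in Rust and far more elaborate): the `TrigramIndex` class, its method names, and the sample documents are all hypothetical. The sketch shows why punctuation is searchable (every 3-character window is indexed, symbols included) and why lookups beat a full scan (only documents that contain every trigram of the query are checked).

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every 3-character substring of text, punctuation included."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index (hypothetical): trigram -> set of document IDs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, content: str) -> None:
        """Store the document and record it under each of its trigrams."""
        self.docs[doc_id] = content
        for gram in trigrams(content):
            self.postings[gram].add(doc_id)

    def search(self, query: str) -> list[int]:
        """Return sorted IDs of documents containing query as a substring."""
        grams = trigrams(query)
        if not grams:
            # Queries shorter than 3 characters have no trigrams,
            # so this sketch falls back to checking every document.
            candidates = set(self.docs)
        else:
            # A match must contain every trigram of the query,
            # so intersect the posting lists to get candidates.
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in grams)
            )
        # Trigram overlap is necessary but not sufficient: verify each candidate.
        return sorted(d for d in candidates if query in self.docs[d])


index = TrigramIndex()
index.add(1, 'fn main() { println!("hi"); }')
index.add(2, "def main():\n    print('hi')")
index.add(3, "int main(void) { return 0; }")

print(index.search("main("))     # [1, 2, 3]
print(index.search("println!"))  # [1]
print(index.search("():"))       # [2]
```

The final verification step matters because a document can contain all of a query's trigrams without containing the query itself. A regex search can follow the same pattern under the same assumptions: extract the literal substrings a match must contain, use their trigrams to narrow the candidates, then run the full regex only on those documents.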