From Millions to Billions
- #scalability
- #clickhouse
- #database-migration
- Geocodio migrated its request logging from MariaDB with the TokuDB engine to a pipeline of Kafka, Vector, and ClickHouse after hitting scalability limits.
- The initial setup relied on TokuDB for fast inserts and storage-efficient compression, but the engine was deprecated and performance degraded as log volume grew.
- ClickHouse was chosen for its column-oriented storage, ideal for analytics and aggregations on large datasets.
- Direct row-level inserts into ClickHouse triggered TOO_MANY_PARTS errors; Buffer tables were adopted as a stopgap but introduced new issues of their own (see the first sketch after this list).
- Kafka was introduced as a durable event stream in front of the database, providing high-throughput, fault-tolerant ingestion (producer sketch below).
- Vector was added to consume from Kafka and insert data into ClickHouse in large batches, solving the small-insert problem (batching sketch below).
- MariaDB and ClickHouse ran in parallel so results could be validated against each other before the full cutover.
- ClickHouse Cloud was chosen over self-hosting for easier updates, managed infrastructure, and scalability.
- Feature flags enabled gradual rollout and validation of the new pipeline without downtime (dual-write sketch below).
- Batch inserts of 30k-50k records were identified as critical for ClickHouse performance.
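The TOO_MANY_PARTS failure mode and the interim Buffer-table fix are easy to illustrate. Below is a minimal sketch using the `clickhouse-driver` Python package against a hypothetical `request_logs` table; the schema and all names are illustrative, not Geocodio's actual setup.

```python
from clickhouse_driver import Client

client = Client(host="localhost")

# Hypothetical request-log table; MergeTree stores data by column,
# which is what makes large analytical aggregations cheap.
client.execute("""
    CREATE TABLE IF NOT EXISTS request_logs (
        ts       DateTime,
        api_key  String,
        endpoint String,
        status   UInt16
    )
    ENGINE = MergeTree
    ORDER BY (api_key, ts)
""")

# Anti-pattern: one INSERT per request. Every insert creates a new
# on-disk part, and at high request rates parts accumulate faster than
# background merges can combine them, until ClickHouse rejects writes
# with a TOO_MANY_PARTS error.
def log_request(ts, api_key, endpoint, status):
    client.execute(
        "INSERT INTO request_logs (ts, api_key, endpoint, status) VALUES",
        [(ts, api_key, endpoint, status)],
    )

# The stopgap: a Buffer table that collects small writes in memory and
# flushes them to request_logs in chunks (inserts then target the
# buffer table instead). A known caveat of the Buffer engine, and
# plausibly among the "new issues" the post alludes to: buffered rows
# live in RAM, so they are lost on a hard crash.
client.execute("""
    CREATE TABLE IF NOT EXISTS request_logs_buffer AS request_logs
    ENGINE = Buffer(default, request_logs, 16,
                    10, 100, 10000, 1000000, 10000000, 100000000)
""")
```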
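The producer side of the Kafka pipeline might look like the sketch below, assuming the `kafka-python` package and a hypothetical `request-logs` topic; the post does not include producer code, so this is illustrative only.

```python
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for replica acknowledgement so a broker failure loses nothing
)

def log_request(api_key: str, endpoint: str, status: int) -> None:
    # Publishing is a fast, append-only write; Kafka absorbs traffic
    # bursts that would otherwise become thousands of tiny database
    # inserts, and retains events if the downstream sink falls behind.
    producer.send("request-logs", {
        "ts": datetime.now(timezone.utc).isoformat(),
        "api_key": api_key,
        "endpoint": endpoint,
        "status": status,
    })
```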
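In production Vector sits between Kafka and ClickHouse, but its job, accumulating events and flushing them as one large insert, can be shown in code. The sketch below is a hand-written stand-in for Vector, not Vector itself; the topic, table, and batch size carry over from the sketches above.

```python
import json
from datetime import datetime

from kafka import KafkaConsumer
from clickhouse_driver import Client

BATCH_SIZE = 30_000  # the post identifies 30k-50k rows per insert as the sweet spot

consumer = KafkaConsumer(
    "request-logs",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
client = Client(host="localhost")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # One big insert creates one part, so parts arrive at a rate
        # the background merges can keep up with.
        client.execute(
            "INSERT INTO request_logs (ts, api_key, endpoint, status) VALUES",
            [
                (datetime.fromisoformat(r["ts"]), r["api_key"],
                 r["endpoint"], r["status"])
                for r in batch
            ],
        )
        batch.clear()
```

A real consumer would also flush on a timer so a slow trickle of events does not sit in memory indefinitely; Vector's sink batching handles both size and time thresholds out of the box.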
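The parallel run was coordinated with feature flags. The post does not name a flag library, so the sketch below uses an environment variable as a minimal stand-in; `write_to_mariadb` and `publish_to_kafka` are stubs for the two write paths.

```python
import os

def write_to_mariadb(event: dict) -> None:
    ...  # existing TokuDB insert path (stubbed)

def publish_to_kafka(event: dict) -> None:
    ...  # producer.send("request-logs", event) from the earlier sketch

def clickhouse_path_enabled() -> bool:
    # Minimal stand-in for a feature flag; a real flag service allows
    # gradual rollouts and instant rollback without a redeploy.
    return os.environ.get("FF_CLICKHOUSE_LOGGING") == "1"

def record_request(event: dict) -> None:
    write_to_mariadb(event)        # old path stays the source of truth
    if clickhouse_path_enabled():
        publish_to_kafka(event)    # new path runs in parallel for validation
```

Because both stores receive every event while the flag is on, query results can be compared side by side before MariaDB is retired.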