Hasty Briefs (beta)

  • #scalability
  • #clickhouse
  • #database-migration
  • Geocodio moved its request logging from MariaDB with TokuDB to a pipeline built on ClickHouse, Kafka, and Vector after hitting scalability limits.
  • The initial setup used MariaDB with the TokuDB engine for high-performance inserts and storage efficiency, but TokuDB was deprecated and performance degraded over time.
  • ClickHouse was chosen for its column-oriented storage, ideal for analytics and aggregations on large datasets.
  • Direct row-level inserts into ClickHouse caused 'TOO_MANY_PARTS' errors, leading to the adoption of buffer tables, which introduced new issues.
  • Kafka was introduced as an event streaming platform to provide high-throughput, fault-tolerant ingestion of log events (see the producer sketch after this list).
  • Vector was added to batch events and insert them into ClickHouse efficiently, solving the small-insert problem.
  • The solution involved running both MariaDB and ClickHouse in parallel for validation before full migration.
  • ClickHouse Cloud was chosen over self-hosting for easier updates, managed infrastructure, and scalability.
  • Feature flags enabled gradual rollout and validation of the new pipeline without downtime (see the dual-write sketch after this list).
  • Batch inserts of 30k-50k records were identified as critical for ClickHouse performance (illustrated in the batching sketch below).
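
To make the ingestion side concrete, here is a minimal Python sketch of a producer that sends request-log events to Kafka instead of writing rows to the database on the request path. The broker address, topic name, and event fields are assumptions for illustration; the article does not specify them, and Geocodio's actual application code may look quite different.

```python
import json
import time

from confluent_kafka import Producer  # assumes the confluent-kafka package is installed

# Hypothetical broker address and topic name; not taken from the article.
producer = Producer({"bootstrap.servers": "kafka:9092"})
TOPIC = "api_request_logs"


def log_request(api_key: str, endpoint: str, status: int, duration_ms: float) -> None:
    """Fire-and-forget a request-log event to Kafka rather than inserting
    a row synchronously into the analytics database."""
    event = {
        "api_key": api_key,
        "endpoint": endpoint,
        "status": status,
        "duration_ms": duration_ms,
        "ts": time.time(),
    }
    producer.produce(TOPIC, value=json.dumps(event).encode("utf-8"))
    producer.poll(0)  # serve delivery callbacks without blocking the request


def shutdown() -> None:
    # Flush pending messages on shutdown so queued events are not lost.
    producer.flush()
```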
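In the article's setup, Vector is what accumulates events and performs the large inserts. The sketch below shows the equivalent logic in Python with the clickhouse-connect client, to clarify why batch size matters: one insert of tens of thousands of rows produces a single part, whereas per-request inserts create many tiny parts and eventually trigger TOO_MANY_PARTS. The host, table name, and column list are assumptions.

```python
from clickhouse_connect import get_client  # assumes the clickhouse-connect package

# Hypothetical connection details and schema; the article does not specify them.
client = get_client(host="clickhouse.example.com", username="default", password="")

BATCH_SIZE = 30_000  # within the 30k-50k range the article calls critical

buffer: list[tuple] = []


def flush_batch(rows: list[tuple]) -> None:
    """Insert one large batch so ClickHouse writes a single part,
    avoiding the TOO_MANY_PARTS failure mode of row-level inserts."""
    client.insert(
        "request_logs",
        rows,
        column_names=["ts", "api_key", "endpoint", "status", "duration_ms"],
    )


def handle_event(row: tuple) -> None:
    """Accumulate events and only flush once the batch is large enough."""
    buffer.append(row)
    if len(buffer) >= BATCH_SIZE:
        flush_batch(buffer)
        buffer.clear()
```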
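Finally, a sketch of how a feature flag can gate a dual-write while both systems run in parallel: MariaDB stays the source of truth, and a percentage of traffic is also sent down the new path for comparison. The flag check, store interfaces, and rollout percentage are hypothetical; the article only states that feature flags were used for gradual rollout, not how they were implemented.

```python
import random


def flag_enabled(name: str, rollout_pct: float) -> bool:
    """Toy percentage-based flag check; a real deployment would ask its
    feature-flag service instead of rolling a random number."""
    return random.random() * 100 < rollout_pct


def write_request_log(event: dict, mariadb, event_producer) -> None:
    """Dual-write during the validation period.

    `mariadb` and `event_producer` are hypothetical interfaces standing in
    for the existing database client and the Kafka producer above.
    """
    # Existing path: MariaDB remains the source of truth until cutover.
    mariadb.insert("request_logs", event)

    # New path, behind a flag: send the same event toward Kafka -> Vector ->
    # ClickHouse so results can be compared without downtime.
    if flag_enabled("clickhouse_request_logging", rollout_pct=10.0):
        event_producer.produce(event)
```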