Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary

5 hours ago
  • #Finnish dictionary optimization
  • #Rust programming
  • #Finite State Transducer (FST)
  • Author improved an English-Finnish dictionary app by switching from a trie to an FST, achieving significant memory reduction.
  • The initial trie approach worked for around 400,000 items using ~60 MB of RAM, but could not scale to the 40-60 million entries that result from Finnish agglutination.
  • SQLite with FTS (full-text search) served as a temporary solution, but it required a 3 GB download, which was too large to be practical.
  • Inspired by BurntSushi's work on FSTs, the author adopted a finite state transducer in Rust, compressing the data into ~10 MB, roughly a 300x reduction.
  • The FST's efficiency comes from suffix sharing (on top of the prefix sharing a trie already provides), which is crucial for handling Finnish inflections. The structure is well suited to data that stays static at runtime.
  • The 'pocket dictionary' design goal emphasizes portability, with the new version achieving ~20 MB and improved performance.
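
To make the suffix-sharing point concrete, here is a minimal sketch using only the Rust standard library. It is not the author's code and does not use BurntSushi's `fst` crate. It builds a plain trie, then merges identical subtrees bottom-up, which is the core idea behind a minimized automaton. All names (`TrieNode`, `build_trie`, `count_trie_nodes`, `count_shared_states`) are hypothetical and chosen for illustration. A real FST also attaches output values to transitions and builds the structure incrementally from sorted keys, both of which this sketch omits.

```rust
use std::collections::{BTreeMap, HashMap};

/// Plain trie node: children keyed by byte, plus an end-of-word flag.
#[derive(Default)]
struct TrieNode {
    children: BTreeMap<u8, TrieNode>,
    is_final: bool,
}

/// Insert every word byte by byte; only prefixes are shared.
fn build_trie(words: &[&str]) -> TrieNode {
    let mut root = TrieNode::default();
    for w in words {
        let mut node = &mut root;
        for &b in w.as_bytes() {
            node = node.children.entry(b).or_default();
        }
        node.is_final = true;
    }
    root
}

/// Number of nodes in the trie, including the root.
fn count_trie_nodes(node: &TrieNode) -> usize {
    1 + node.children.values().map(count_trie_nodes).sum::<usize>()
}

/// A state is identified by its finality and its (byte, target state id) edges.
/// Two trie nodes with the same signature accept the same set of suffixes.
type Signature = (bool, Vec<(u8, usize)>);

/// Post-order walk: children are canonicalized first, so identical subtrees
/// end up with the same signature and map to the same state id.
fn minimize(node: &TrieNode, registry: &mut HashMap<Signature, usize>) -> usize {
    let edges: Vec<(u8, usize)> = node
        .children
        .iter()
        .map(|(&b, child)| (b, minimize(child, registry)))
        .collect();
    let next_id = registry.len();
    *registry.entry((node.is_final, edges)).or_insert(next_id)
}

/// Number of distinct states once identical subtrees (shared suffixes) are merged.
fn count_shared_states(root: &TrieNode) -> usize {
    let mut registry = HashMap::new();
    minimize(root, &mut registry);
    registry.len()
}
```

For the six forms `talo`, `talossa`, `talosta`, `auto`, `autossa`, `autosta`, calling `count_trie_nodes(&build_trie(&words))` gives 19 nodes, while `count_shared_states` on the same trie gives 10 states, because the endings `-ssa` and `-sta` (and everything after the stem's final `o`) are stored once instead of once per stem. With tens of millions of inflected forms drawing on a small set of endings, this kind of merging is where the large size reduction described above would come from.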