Building a Korean ambiguity solver fast enough to skip the GPU: 7,300 words/sec

3 days ago

A Korean ambiguity solver was developed for Kimchi Reader, a tool for Korean language learners, to resolve lemma ambiguity efficiently without a GPU.
The solution is a 14M-parameter KoELECTRA model quantized to int8, running server-side on a CPU at about 7,300 disambiguations per second.
Getting there took four attempts: fine-tuning Gemma 3 1B (slow and inaccurate), an embedding-based approach, training a custom model, and finally KoELECTRA, which succeeded.
Two constraints shaped the design: speed, because entire books must be processed ahead of time, and a model that only chooses among candidates produced by a rule-based lemmatizer, so it cannot hallucinate a lemma.
The final approach involved a closed-set selection from pre-generated candidates, optimized through quantization, custom Rust inference, and SIMD for CPU performance.
Throughput improved significantly: production handles ~18,500 words/second on 16 cores. The gain in accuracy also improved derived statistics such as word frequency rankings.
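
The closed-set selection described above can be illustrated with a minimal Python sketch. It is not the author's implementation (which is described as custom Rust inference): the `Candidate` type, `pick_lemma`, and the keyword-based `toy_scorer` are hypothetical names, and the scorer is only a stand-in for the quantized KoELECTRA model. The point it shows is that the output is always one of the lemmatizer's candidates, so nothing can be invented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """One analysis proposed by a rule-based lemmatizer (hypothetical type)."""
    lemma: str
    pos: str


def pick_lemma(
    candidates: Sequence[Candidate],
    score: Callable[[Candidate], float],
) -> Candidate:
    """Return the highest-scoring candidate; never produce anything new."""
    if not candidates:
        raise ValueError("the lemmatizer produced no candidates")
    if len(candidates) == 1:
        # Unambiguous word: no model call is needed at all.
        return candidates[0]
    return max(candidates, key=score)


def toy_scorer(sentence: str) -> Callable[[Candidate], float]:
    """Stand-in for the real model: a crude keyword rule, for illustration only."""
    def score(c: Candidate) -> float:
        if c.lemma == "날다":  # "to fly"
            return 1.0 if "새" in sentence else 0.0  # 새 = "bird"
        if c.lemma == "나":  # "I"
            return 0.5
        return 0.1
    return score


# The surface form "나는" can be 나 + 는 ("I" + topic marker)
# or an attributive form of 날다 ("flying") or 나다 ("occurring").
CANDIDATES = [
    Candidate("나", "pronoun"),
    Candidate("날다", "verb"),
    Candidate("나다", "verb"),
]

flying = pick_lemma(CANDIDATES, toy_scorer("하늘을 나는 새"))
speaker = pick_lemma(CANDIDATES, toy_scorer("나는 학생이다"))
```

Because the model only ranks a fixed list, a wrong answer is still a valid dictionary lemma, and unambiguous words skip inference entirely, which helps throughput.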
