Gemma 4 for Telephony: From Two AI Models to One – Until I Switched to Chinese

4 hours ago
  • #telephony
  • #benchmark
  • #multimodal-LLM
  • Replaced a two-model phone-agent cascade with a single multimodal Gemma 4 model, and evaluated both across English, French, and Mandarin.
  • English: The single model reached 100% reply accuracy at 0.66 s latency, versus 93% and 0.81 s for the cascade.
  • French: The single model performed well (93% reply accuracy, 0.71 s latency) but had a language slip in one answer.
  • Mandarin: The single model failed almost completely (~0% reply accuracy) because of poor audio transcription, while the cascade reached 92%.
  • The metric is reply correctness rather than transcription word error rate (WER), because reply correctness is what the caller actually experiences.
  • Audio encoder quality varies by language: in this model it handles English and French but not Mandarin.
  • Integration simplifies the telephony stack by collapsing speech-to-text and reasoning into one model call.
  • Recommendation: Use the single model for English and French, and keep the cascade for languages like Mandarin.
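
The recommendation and the metric above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the names `SINGLE_MODEL_LANGS`, `pick_pipeline`, `Turn`, and `reply_accuracy` are hypothetical, and the use of two-letter language codes is an assumption. The routing set simply encodes the languages the summary reports as working with the single model.

```python
from dataclasses import dataclass
from typing import List

# Assumption: languages are identified by two-letter codes.
# Only English and French are reported above as working with the single model.
SINGLE_MODEL_LANGS = {"en", "fr"}


def pick_pipeline(lang: str) -> str:
    """Route a call: 'single' for en/fr, 'cascade' for everything else."""
    return "single" if lang.strip().lower() in SINGLE_MODEL_LANGS else "cascade"


@dataclass
class Turn:
    """One evaluated caller turn (hypothetical record layout)."""
    lang: str
    reply_correct: bool  # judged on the agent's reply, not on the transcript
    latency_s: float


def reply_accuracy(turns: List[Turn]) -> float:
    """Fraction of turns whose reply was judged correct."""
    if not turns:
        raise ValueError("no turns to score")
    return sum(t.reply_correct for t in turns) / len(turns)


def mean_latency(turns: List[Turn]) -> float:
    """Average reply latency in seconds."""
    if not turns:
        raise ValueError("no turns to score")
    return sum(t.latency_s for t in turns) / len(turns)
```

Scoring the reply rather than the transcript means a turn counts as correct even when the model mishears a word, as long as the caller still gets the right answer. Defaulting unknown languages to the cascade is a conservative choice for this sketch, since the summary only vouches for English and French.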