Realtime regression in non-English production voice agents
2 days ago
- #non-English performance
- #model regression
- #AI voice platform
- A production AI voice platform built on the OpenAI Realtime API regressed after moving from the validated snapshot 'gpt-realtime-mini-2025-10-06' to its replacement 'gpt-realtime-mini', most visibly in non-English scenarios such as Romanian.
- The new model produces lower-quality language and is less faithful to business data: it hallucinates departments and services that do not exist. The older snapshot, by contrast, had been rigorously tested for reliability.
- The regression affects an enterprise rollout across dozens of locations, with consequences for live AI phone conversations, appointment summaries, CRM records, operational reporting, and client trust.
- The post includes transcription examples comparing the two models as evidence, and raises the concern that the problem may extend to non-English languages more broadly.
- The post asks whether others have observed similar regressions, whether OpenAI tracks language-specific issues, and whether there is a path to extended access or migration support when a replacement model is not behaviorally equivalent.
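
One way to turn the faithfulness concern above into something measurable is to check every department or service a model mentions against the business's own catalogue, and compare the rate of unknown mentions between the two models. The following is a minimal sketch of that idea, not the poster's actual test setup. It assumes the mentioned names have already been extracted upstream (for example, from a structured appointment summary), and the function names and the sample catalogue are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Casefold, strip diacritics, and collapse whitespace.

    This lets 'Radiologie și imagistică' match 'radiologie si imagistica',
    which matters for Romanian transcripts where diacritics are inconsistent.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def unknown_entities(mentioned: list[str], catalogue: list[str]) -> list[str]:
    """Return the mentioned names that are not in the business catalogue."""
    known = {normalize(name) for name in catalogue}
    return [name for name in mentioned if normalize(name) not in known]


def hallucination_rate(conversations: list[list[str]], catalogue: list[str]) -> float:
    """Fraction of conversations that mention at least one unknown name."""
    if not conversations:
        return 0.0
    flagged = sum(1 for mentioned in conversations if unknown_entities(mentioned, catalogue))
    return flagged / len(conversations)


if __name__ == "__main__":
    # Hypothetical catalogue and mentions, invented purely for illustration.
    catalogue = ["Cardiologie", "Pediatrie", "Radiologie și imagistică"]
    mentioned = ["cardiologie", "Radiologie si imagistica", "Neurochirurgie"]
    print(unknown_entities(mentioned, catalogue))  # ['Neurochirurgie']
```

Running the same fixed set of test calls through both models and comparing `hallucination_rate` for each would give a concrete number to attach to a regression report. Exact matching after normalization is deliberately strict: it will flag legitimate paraphrases as unknown, so flagged items still need human review.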