The State of Modern AI Text to Speech Systems for Screen Reader Users
- #Accessibility
- #Text-to-Speech
- #BlindUsers
- Text-to-speech (TTS) technology for blind users has seen little change in 30 years, even as voices aimed at sighted users have advanced dramatically.
- Blind users prefer fast, clear, and predictable robotic voices over natural-sounding ones, often listening at 800-900 words per minute.
- Eloquence, the preferred TTS voice for many blind users, was last updated in 2003 and faces compatibility and security issues.
- Modern TTS development largely ignores blind users' needs, leaving speakers of many non-English languages without efficient screen-reader voices.
- eSpeak NG, an open-source TTS system, supports many languages but suffers from an outdated design and limited maintenance.
- Testing AI-based TTS systems (Supertonic and Kitten TTS) revealed dependency bloat, inaccurate reading, slow synthesis, and a lack of runtime customization.
- AI TTS models prioritize natural sound over accuracy, skipping words and misreading numbers, making them unsuitable for screen readers.
- Older TTS systems allow real-time parameter adjustments (pitch, speed, etc.), which AI models lack, reducing functionality for blind users.
- The future of TTS for blind users is uncertain, with Eloquence becoming unsustainable and modern AI TTS not meeting their needs.
- A potential solution would be an open-source reimplementation of Eloquence, but it requires significant funding and expertise.
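The real-time parameter control mentioned above can be illustrated with eSpeak NG's command-line interface, where `-s` sets speed in words per minute and `-p` sets pitch on a 0-99 scale. The sketch below is a minimal example, not part of the original article; the `speak` helper and its defaults are hypothetical, and it builds the command even when the `espeak-ng` binary is absent so the invocation can still be inspected.

```python
import shutil
import subprocess

def speak(text, wpm=450, pitch=40, voice="en-us"):
    """Build (and, if possible, run) an espeak-ng invocation.

    -s is speed in words per minute (espeak-ng documents roughly 80-450;
    screen-reader users push far past mainstream defaults),
    -p is pitch (0-99), -v selects the voice/language.
    """
    cmd = ["espeak-ng", "-v", voice, "-s", str(wpm), "-p", str(pitch), text]
    if shutil.which("espeak-ng") is not None:
        # --stdout writes WAV data to stdout, so no audio device is needed.
        subprocess.run(cmd + ["--stdout"], check=True,
                       capture_output=True)
    return cmd  # return the argument list so callers can inspect it

args = speak("Reading at screen-reader speed.", wpm=450, pitch=40)
```

Because these are plain command-line parameters, a screen reader can change them instantly between utterances; AI models baked around a single trained voice offer no equivalent knobs.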