The State of Modern AI Text to Speech Systems for Screen Reader Users
- #Accessibility
- #Text-to-Speech
- #BlindUsers
- Text-to-speech (TTS) technology for blind users has seen little change in 30 years, even as voices aimed at sighted users have advanced dramatically.
- Blind users prefer fast, clear, and predictable robotic voices over natural-sounding ones, often listening at 800-900 words per minute.
- Eloquence, the preferred TTS voice for many blind users, was last updated in 2003 and faces compatibility and security issues.
- Modern TTS development largely ignores blind users' needs, leaving speakers of many non-English languages without efficient screen-reader voices.
- eSpeak NG, an open-source TTS system, supports many languages but suffers from an outdated design and limited maintenance.
- Testing AI-based TTS systems (Supertonic and Kitten TTS) revealed dependency bloat, inaccurate reading, slow synthesis, and a lack of runtime customization.
- AI TTS models prioritize natural sound over accuracy, skipping words and misreading numbers, making them unsuitable for screen readers.
- Older TTS systems allow real-time parameter adjustments (pitch, speed, etc.), which AI models lack, reducing functionality for blind users.
- The future of TTS for blind users is uncertain, with Eloquence becoming unsustainable and modern AI TTS not meeting their needs.
- A potential solution would be an open-source reimplementation of Eloquence, but it requires significant funding and expertise.
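The real-time parameter control mentioned above can be illustrated with eSpeak NG's command-line interface, where `-s` sets speed in words per minute and `-p` sets pitch on a 0-99 scale. The sketch below is a minimal example, not part of the original article; the `speak` helper and its defaults are hypothetical, and it builds the command even when the `espeak-ng` binary is absent so the invocation can still be inspected.

```python
import shutil
import subprocess

def speak(text, wpm=450, pitch=40, voice="en-us"):
    """Build (and, if possible, run) an espeak-ng invocation.

    -s is speed in words per minute (espeak-ng documents roughly 80-450;
    screen-reader users push far past mainstream defaults),
    -p is pitch (0-99), -v selects the voice/language.
    """
    cmd = ["espeak-ng", "-v", voice, "-s", str(wpm), "-p", str(pitch), text]
    if shutil.which("espeak-ng") is not None:
        # --stdout writes WAV data to stdout, so no audio device is needed.
        subprocess.run(cmd + ["--stdout"], check=True,
                       capture_output=True)
    return cmd  # return the argument list so callers can inspect it

args = speak("Reading at screen-reader speed.", wpm=450, pitch=40)
```

Because these are plain command-line parameters, a screen reader can change them instantly between utterances; AI models baked around a single trained voice offer no equivalent knobs.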