SaySynth: A Brief History of Speaking Machines

2 days ago

SaySynth is a synthesizer built on macOS's text-to-speech framework, using the 'say' command with a hidden low-level DSL for controlling phoneme-level prosody.
Four types of speaking machines: Mechanical (like von Kempelen's 1773 device), Formant/Rule-Based, Sample-Based (like MUSA), and Generative (modern AI).
Historical examples include Faber's Euphonia (1845), Edison Talking Dolls (1890s), VODER (1939), MUSA (1978), and S.A.M. (1982), showing recurring patterns like feminization and operator invisibility.
Singing is often used as a proof-of-concept for TTS, implying it's a pinnacle of human expression; cultural biases are encoded in speaking machines, like female-coded AI assistants.
The 'say' command's DSL allows per-phoneme pitch control, enabling creative misuse for synthesizer-like effects, now deprecated in favor of less flexible SSML.
SaySynth uses a YAML-based sequencer to spawn parallel 'say' processes, with drift creating organic sounds, and supports alternative tunings via Ableton's Scala format.
The future of AI risks dehumanization by compressing expressive voice range; art can challenge this by embracing tools' limitations and failures for creative strangeness.

Hasty Briefsbeta