SaySynth: A Brief History of Speaking Machines
2 days ago
- #Creative Coding
- #Speech Synthesis
- #History of Technology
- SaySynth is a synthesizer built on macOS's text-to-speech framework, using the 'say' command with a hidden low-level DSL for controlling phoneme-level prosody.
- Four types of speaking machines: Mechanical (like von Kempelen's 1773 device), Formant/Rule-Based, Sample-Based (like MUSA), and Generative (modern AI).
- Historical examples include Faber's Euphonia (1845), Edison Talking Dolls (1890s), VODER (1939), MUSA (1978), and S.A.M. (1982), showing recurring patterns like feminization and operator invisibility.
- Singing is often used as a proof-of-concept for TTS, implying it's a pinnacle of human expression; cultural biases are encoded in speaking machines, like female-coded AI assistants.
- The 'say' command's DSL allows per-phoneme pitch control, enabling creative misuse for synthesizer-like effects, now deprecated in favor of less flexible SSML.
- SaySynth uses a YAML-based sequencer to spawn parallel 'say' processes, with drift creating organic sounds, and supports alternative tunings via Ableton's Scala format.
- The future of AI risks dehumanization by compressing expressive voice range; art can challenge this by embracing tools' limitations and failures for creative strangeness.