Hasty Briefsbeta

Bilingual

SaySynth: A Brief History of Speaking Machines

2 days ago
  • #Creative Coding
  • #Speech Synthesis
  • #History of Technology
  • SaySynth is a synthesizer built on macOS's text-to-speech framework, using the 'say' command with a hidden low-level DSL for controlling phoneme-level prosody.
  • Four types of speaking machines: Mechanical (like von Kempelen's 1773 device), Formant/Rule-Based, Sample-Based (like MUSA), and Generative (modern AI).
  • Historical examples include Faber's Euphonia (1845), Edison Talking Dolls (1890s), VODER (1939), MUSA (1978), and S.A.M. (1982), showing recurring patterns like feminization and operator invisibility.
  • Singing is often used as a proof-of-concept for TTS, implying it's a pinnacle of human expression; cultural biases are encoded in speaking machines, like female-coded AI assistants.
  • The 'say' command's DSL allows per-phoneme pitch control, enabling creative misuse for synthesizer-like effects, now deprecated in favor of less flexible SSML.
  • SaySynth uses a YAML-based sequencer to spawn parallel 'say' processes, with drift creating organic sounds, and supports alternative tunings via Ableton's Scala format.
  • The future of AI risks dehumanization by compressing expressive voice range; art can challenge this by embracing tools' limitations and failures for creative strangeness.