Can text be made to sound more than just its words? (2022)

20 days ago

Captions typically render words the same way regardless of how they are spoken, whether bawled, whispered, or yelped.
The paper proposes embedding visual representations of paralinguistic qualities (loudness, pitch, duration) into captions using typography (font-weight, baseline shift, letter-spacing).
In an evaluation, participants matched speech-modulated typography to the original audio with 65% accuracy, with no significant difference between animated and static text.
Participants' mental models of speech-modulated typography varied widely, indicating diverse interpretations of the visual cues.
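
To make the typographic mapping concrete, here is a minimal sketch of how per-word prosody might drive caption styling. It assumes the pairing follows the order listed above (loudness to font-weight, pitch to baseline shift, duration to letter-spacing). The `Word` type, the 0 to 1 normalization, and every numeric range are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Word:
    """One caption word with prosody features (hypothetical 0..1 scale)."""
    text: str
    loudness: float
    pitch: float
    duration: float


def lerp(lo: float, hi: float, t: float) -> float:
    """Linearly interpolate between lo and hi, clamping t to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return lo + (hi - lo) * t


def style_for(word: Word) -> str:
    """Build an inline CSS string for one word (ranges are assumptions)."""
    weight = round(lerp(200, 900, word.loudness))   # louder -> heavier
    shift = lerp(-0.3, 0.3, word.pitch)             # higher pitch -> raised baseline
    spacing = lerp(0.0, 0.3, word.duration)         # longer -> wider tracking
    return (
        f"font-weight:{weight};"
        f"vertical-align:{shift:.2f}em;"
        f"letter-spacing:{spacing:.2f}em"
    )


def render_caption(words: list[Word]) -> str:
    """Render a caption as HTML spans, one styled span per word."""
    return " ".join(
        f'<span style="{style_for(w)}">{escape(w.text)}</span>' for w in words
    )


if __name__ == "__main__":
    caption = [
        Word("I", 0.3, 0.4, 0.2),
        Word("said", 0.5, 0.5, 0.3),
        Word("NO", 1.0, 0.9, 0.8),
    ]
    print(render_caption(caption))
```

A variable font with a continuous weight axis would let `font-weight` take any of these intermediate values; with a static font family the browser snaps to the nearest available weight, so the loudness cue becomes coarser.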
