Hasty Briefs (beta)

StutterZero: Speech Conversion for Stuttering Transcription and Correction

15 days ago
  • #end-to-end-models
  • #speech-processing
  • #stuttering-correction
  • Introduces StutterZero and StutterFormer, the first end-to-end waveform-to-waveform models for stuttering transcription and correction.
  • StutterZero uses a convolutional-bidirectional LSTM encoder-decoder with attention.
  • StutterFormer integrates a dual-stream Transformer with shared acoustic-linguistic representations.
  • Both models were trained on paired stuttered-fluent data synthesized from the SEP-28K and LibriStutter corpora.
  • Both were evaluated on unseen speakers from the FluencyBank dataset.
  • StutterZero achieved a 24% decrease in Word Error Rate (WER) and a 31% improvement in semantic similarity (BERTScore) compared to Whisper-Medium.
  • StutterFormer performed better still, with a 28% decrease in WER and a 34% improvement in BERTScore against the same Whisper-Medium baseline.
  • Results validate the feasibility of direct end-to-end stutter-to-fluent speech conversion.
  • Potential applications include inclusive human-computer interaction, speech therapy, and accessibility-oriented AI systems.
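
To make the WER figures above concrete, here is a minimal sketch of how word error rate and a percentage decrease against a baseline could be computed. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and assumes the reported decreases are relative to the baseline; the summary does not state either detail. The function names `word_error_rate` and `relative_reduction`, the sample sentences, and the numbers in the usage lines are illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional decrease of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Illustrative only: a transcript with word repetitions versus a fluent one.
fluent = "i want to go home"
stuttered_transcript = "i i want to to go home"
print(word_error_rate(fluent, stuttered_transcript))  # two extra words over five reference words
print(word_error_rate(fluent, fluent))

# Hypothetical WER values chosen to illustrate a 24% relative decrease.
print(round(relative_reduction(0.50, 0.38), 2))
```

Repeated words count as insertions under this definition, which is why a transcription system that reproduces disfluencies verbatim scores a higher WER against a fluent reference than one that removes them.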