StutterZero: Speech Conversion for Stuttering Transcription and Correction
15 days ago
- #end-to-end-models
- #speech-processing
- #stuttering-correction
- Introduces StutterZero and StutterFormer, the first end-to-end waveform-to-waveform models for stuttering transcription and correction.
- StutterZero uses a convolutional-bidirectional LSTM encoder-decoder with attention.
- StutterFormer integrates a dual-stream Transformer with shared acoustic-linguistic representations.
- Both models were trained on paired stuttered-fluent data synthesized from the SEP-28K and LibriStutter corpora.
- Both were evaluated on unseen speakers from the FluencyBank dataset.
- StutterZero showed a 24% decrease in Word Error Rate (WER) and a 31% improvement in semantic similarity (BERTScore) compared to Whisper-Medium.
- StutterFormer achieved better results, with a 28% decrease in WER and a 34% improvement in BERTScore.
- The results support the feasibility of direct end-to-end stutter-to-fluent speech conversion.
- Potential applications include inclusive human-computer interaction, speech therapy, and accessibility-oriented AI systems.
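
For readers unfamiliar with the WER metric cited above, here is a minimal sketch of how it is typically computed: the word-level edit distance between a reference transcript and a system transcript, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` and the toy sentences are illustrative assumptions, not taken from the paper. The summary also does not say whether the reported percentages are relative or absolute changes; the helper below assumes relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Fractional decrease of `system` relative to `baseline` (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - system) / baseline


# Toy example (hypothetical sentences, not data from the paper):
# a transcript that keeps word repetitions vs. one that removes most of them.
reference = "i want to go home"
stuttered_transcript = "i i want to to go home"
corrected_transcript = "i want to to go home"

baseline_wer = word_error_rate(reference, stuttered_transcript)
system_wer = word_error_rate(reference, corrected_transcript)
print(f"baseline WER: {baseline_wer:.2f}")
print(f"system WER:   {system_wer:.2f}")
print(f"relative WER reduction: {relative_reduction(baseline_wer, system_wer):.0%}")
```

Repeated words in a stuttered transcript count as insertions against the fluent reference, which is why removing disfluencies lowers WER even when every intended word was recognized correctly.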