How AI hears accents: An audible visualization of accent clusters
12 hours ago
- #speech-technology
- #language-learning
- #accent-clustering
- BoldVoice, an American accent training app, helps users from over 200 language backgrounds speak English clearly and confidently.
- A finetuned HuBERT model was used for accent identification, trained on 30 million speech recordings (25,000 hours) from BoldVoice's dataset.
- The model clusters accents in a 3D latent space using UMAP dimensionality reduction, preserving relative distances between clusters.
- Accent clustering is influenced more by geographic proximity, immigration, and colonialism than by language taxonomy.
- Examples include the Australian and Vietnamese clusters being close, and the French/Nigerian/Ghanaian grouping.
- The Indian subcontinent cluster shows regional grouping (e.g., southern vs. northwestern languages).
- The Mongolian and Korean clusters are near each other, possibly reflecting phonetic similarities.
- Voice standardization anonymizes speakers and highlights accent differences but introduces some artifacts.