How AI hears accents: An audible visualization of accent clusters

12 hours ago

Copy Link

BoldVoice, an American accent training app, helps users from over 200 language backgrounds speak English clearly and confidently.
A finetuned HuBERT model was used for accent identification, trained on 30 million speech recordings (25,000 hours) from BoldVoice's dataset.
The model clusters accents in a 3D latent space using UMAP dimensionality reduction, preserving relative distances between clusters.
Accent clustering is influenced more by geographic proximity, immigration, and colonialism than by language taxonomy.
Examples include the Australian and Vietnamese clusters being close, and the French/Nigerian/Ghanaian grouping.
The Indian subcontinent cluster shows regional grouping (e.g., southern vs. northwestern languages).
The Mongolian and Korean clusters are near each other, possibly reflecting phonetic similarities.
Voice standardization anonymizes speakers and highlights accent differences but introduces some artifacts.

Hasty Briefsbeta