Hasty Briefsbeta

How AI hears accents: An audible visualization of accent clusters

19 hours ago
  • #speech-technology
  • #language-learning
  • #accent-clustering
  • BoldVoice, an American accent training app, helps users from over 200 language backgrounds speak English clearly and confidently.
  • A finetuned HuBERT model was used for accent identification, trained on 30 million speech recordings (25,000 hours) from BoldVoice's dataset.
  • The model clusters accents in a 3D latent space using UMAP dimensionality reduction, preserving relative distances between clusters.
  • Accent clustering is influenced more by geographic proximity, immigration, and colonialism than by language taxonomy.
  • Examples include the Australian and Vietnamese clusters being close, and the French/Nigerian/Ghanaian grouping.
  • The Indian subcontinent cluster shows regional grouping (e.g., southern vs. northwestern languages).
  • The Mongolian and Korean clusters are near each other, possibly reflecting phonetic similarities.
  • Voice standardization anonymizes speakers and highlights accent differences but introduces some artifacts.