Hasty Briefs (beta)

Stable Audio 3

7 hours ago
  • #diffusion models
  • #machine learning
  • #audio generation
  • Stable Audio 3 is a family of fast latent diffusion models of varying sizes (small, medium, large) for variable-length audio generation and editing.
  • It supports variable-length generation, which cuts compute for short sounds, as well as inpainting for targeted edits and continuation of existing recordings.
  • The models use a novel semantic-acoustic autoencoder to project audio into a compact latent space, balancing efficiency, fidelity, and semantic structure.
  • Adversarial post-training reduces the number of sampling steps, which speeds up inference, while also improving fidelity and prompt adherence.
  • Trained on licensed and Creative Commons data, the models generate music and sounds in under 2 seconds on an H200 GPU, or in a few seconds on a MacBook Pro M4.
  • Weights for the small and medium models, which target consumer-grade hardware, are released along with the training and inference pipelines.
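
To make the variable-length and inpainting points concrete, here is a minimal Python sketch of the general idea, not Stable Audio 3's actual code. It assumes a hypothetical latent frame rate (`ASSUMED_LATENT_RATE_HZ`, chosen only for illustration) and uses plain lists in place of latent tensors. The function names are also hypothetical. It shows why a short sound needs fewer latent frames than a long one, and how a time-range mask can mark which frames get regenerated while the rest are kept from the original recording. Continuation is then the special case where the mask covers the tail of the clip.

```python
import math

# Hypothetical value for illustration only; the summary does not state
# the real latent frame rate of the autoencoder.
ASSUMED_LATENT_RATE_HZ = 25.0


def latent_frames(duration_s: float, rate_hz: float = ASSUMED_LATENT_RATE_HZ) -> int:
    """Number of latent frames needed to cover a clip of the given duration."""
    return max(1, math.ceil(duration_s * rate_hz))


def inpaint_mask(n_frames: int, start_s: float, end_s: float,
                 rate_hz: float = ASSUMED_LATENT_RATE_HZ) -> list[int]:
    """Per-frame mask: 1 = regenerate this frame, 0 = keep the original."""
    first = max(0, math.floor(start_s * rate_hz))
    last = min(n_frames, math.ceil(end_s * rate_hz))
    return [1 if first <= i < last else 0 for i in range(n_frames)]


def blend(original: list[float], generated: list[float], mask: list[int]) -> list[float]:
    """Take generated frames where mask is 1 and original frames elsewhere."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]


# A 2-second sound needs far fewer latent frames than a 30-second track,
# which is where the compute saving for short clips comes from.
short_len = latent_frames(2.0)
long_len = latent_frames(30.0)

# Regenerate only the span from 0.5 s to 1.0 s of the 2-second clip.
mask = inpaint_mask(short_len, 0.5, 1.0)
edited = blend([0.0] * short_len, [1.0] * short_len, mask)
```

In a real diffusion sampler a mask like this would typically be applied during denoising or passed to the model as conditioning, rather than used for a single blend at the end; the one-shot `blend` here only illustrates which frames are affected.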