A Step Towards Music Generation Foundation Model
- #AI
- #music-generation
- #open-source
- ACE-Step is an open-source foundation model for music generation that integrates diffusion-based generation with a Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer.
- The model achieves state-of-the-art performance, synthesizing up to 4 minutes of music in about 20 seconds on an NVIDIA A100 GPU, roughly 15× faster than LLM-based baselines.
- ACE-Step supports 19 languages, various music styles, and advanced control mechanisms like voice cloning, lyric editing, and track generation.
- The model includes features like Variations Generation, Flow-Edit, Lyric2Vocal, and StemGen for localized modifications and creative enhancements.
- Performance benchmarks show high throughput on various GPUs, with detailed installation and training instructions provided.
- The project is licensed under Apache License 2.0 and emphasizes responsible use, encouraging originality and cultural sensitivity.
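As a quick sanity check on the throughput figures above, the quoted numbers imply a real-time factor of about 12× (a minimal sketch using only the 4-minute and 20-second values from the summary; the 15× baseline comparison is taken at face value):

```python
# Illustrative arithmetic only; the 4-minute / 20-second figures
# come from the summary above, not from a live benchmark.
audio_seconds = 4 * 60        # up to 4 minutes of generated music
synthesis_seconds = 20        # reported wall-clock time on an A100

# Real-time factor: seconds of audio produced per second of compute.
rtf = audio_seconds / synthesis_seconds
print(f"Real-time factor on A100: {rtf:.0f}x")

# If LLM-based baselines are ~15x slower, they would need roughly:
baseline_seconds = synthesis_seconds * 15
print(f"Estimated baseline time: {baseline_seconds / 60:.0f} minutes")
```

In other words, a 12× real-time factor means every second of GPU compute yields about 12 seconds of audio, which is what makes near-interactive full-song generation feasible.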