GitHub - jamiepine/voicebox: The open-source voice synthesis studio
- #local-first
- #voice-synthesis
- #open-source
- Voicebox is an open-source, local-first voice cloning studio that operates entirely on your machine, offering an alternative to services like ElevenLabs.
- It supports cloning voices from short audio samples, generating speech in 23 languages using 5 TTS engines, and applying post-processing audio effects.
- Features include complete privacy (data stays local), expressive speech tags (e.g., [laugh], [sigh]), unlimited text length with auto-chunking, a multi-voice timeline editor for stories, and a REST API for integration.
- Available for macOS, Windows, and Linux (with Docker), with support for several hardware backends (MLX, CUDA, ROCm, DirectML, CPU).
- Includes 8 audio effects (pitch shift, reverb, delay, etc.), generation versioning with provenance tracking, non-blocking generation queues, voice profile management, and in-app recording/transcription.
- Built with Tauri (Rust), React, and FastAPI, using models such as Qwen3-TTS, LuxTTS, Chatterbox, and TADA.
- Future plans include real-time streaming, voice design from text, more models, plugin architecture, and a mobile companion app.
- Contributions are welcome on GitHub; the repository provides detailed setup instructions (commands are run with 'just') and is released under the MIT license.
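
The "unlimited text length with auto-chunking" feature mentioned above can be illustrated with a minimal sketch. This is not Voicebox's actual implementation: the function name `chunk_text`, the `max_chars` parameter, and the sentence-boundary strategy are all assumptions chosen for illustration. The idea is to split long input at sentence boundaries so each piece fits within a TTS engine's input limit.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries and falls back to a
    crude fixed-width split for a single sentence that is too long.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit is cut at fixed width
        # (this may split mid-word; a real system would be more careful).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Pack sentences together while the combined length fits.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("Hello there. How are you? I am fine.", max_chars=25):
        print(repr(piece))
```

Each chunk could then be synthesized separately and the resulting audio concatenated; how Voicebox actually chooses boundaries and joins segments is not stated in this summary.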