GitHub - jamiepine/voicebox: The open-source voice synthesis studio

a month ago

Voicebox is an open-source, local-first voice cloning studio that operates entirely on your machine, offering an alternative to services like ElevenLabs.
It supports cloning voices from short audio samples, generating speech in 23 languages using 5 TTS engines, and applying post-processing audio effects.
Features include complete privacy (data stays local), expressive speech tags (e.g., [laugh], [sigh]), unlimited text length with auto-chunking, a multi-voice timeline editor for stories, and a REST API for integration.
It is available for macOS, Windows, and Linux (with Docker), and supports several hardware backends (MLX, CUDA, ROCm, DirectML, CPU).
It includes 8 audio effects (pitch shift, reverb, delay, etc.), generation versioning with provenance tracking, non-blocking generation queues, voice profile management, and in-app recording and transcription.
It is built with Tauri (Rust), React, and FastAPI, and uses models such as Qwen3-TTS, LuxTTS, Chatterbox, and TADA.
Future plans include real-time streaming, voice design from text, more models, plugin architecture, and a mobile companion app.
The project is open to contributions on GitHub, with detailed setup instructions (using 'just' for commands), and is released under the MIT license.
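
The "unlimited text length with auto-chunking" feature mentioned above can be illustrated with a minimal sketch. This is not Voicebox's actual implementation, which the summary does not describe; the function name `chunk_text` and the `max_chars` parameter are hypothetical. It assumes the general idea is to split long input at sentence boundaries so each piece fits within a per-request limit of a TTS engine, falling back to a hard wrap for oversized sentences.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration only; not taken from Voicebox.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space to break on, so cut mid-word
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        # Greedily pack sentences into the current chunk while they fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Hello there. [laugh] This is a test! Is it working? Yes."
    for piece in chunk_text(sample, max_chars=25):
        print(repr(piece))
```

Because the split happens only after sentence-ending punctuation, an expressive tag such as [laugh] stays attached to the sentence it introduces. A real pipeline would then synthesize each chunk and concatenate the audio, which this sketch leaves out.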
