Run Whisper audio transcriptions with one FFmpeg command
- #FFmpeg
- #Whisper.cpp
- #audio-transcription
- Whisper.cpp is a high-performance automatic speech recognition library implementing OpenAI’s Whisper model in C/C++.
- Integration with FFmpeg allows for simple audio transcription pipelines with a single shell command.
- Installation involves cloning the whisper.cpp repository, downloading a model, and building the library (build sketch after this list).
- GPU support can be enabled for faster processing by installing the NVIDIA CUDA toolkit and configuring whisper.cpp accordingly.
- FFmpeg must be built with whisper.cpp support to use the whisper audio filter (configure sketch below).
- The whisper filter in FFmpeg can be configured with parameters such as the model path, language, GPU usage, and output format (example command below).
- Transcriptions can be output in various formats, including plain text, SRT, and JSON.
- Live audio streams can be transcribed in real time, with the option to send output to external services over HTTP (streaming example below).
- Voice Activity Detection (VAD) can be used to improve transcription accuracy by splitting audio into chunks based on speech activity.
- Microphone audio can be transcribed live, using VAD to handle pauses and speech segments effectively (microphone example below).
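
A minimal build sketch for whisper.cpp, following the clone/download/build steps in the list. The chosen model (`base.en`) and the system-wide install are assumptions; adjust paths for your setup, and the install step presumes your whisper.cpp build produces a pkg-config file that FFmpeg's configure can later find.

```bash
# Clone whisper.cpp and fetch a model (base.en is just one choice of ggml model)
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
./models/download-ggml-model.sh base.en

# Build and install the library.
# Add -DGGML_CUDA=1 to the first cmake call to enable NVIDIA GPU acceleration
# (requires the CUDA toolkit to be installed).
cmake -B build
cmake --build build --config Release
sudo cmake --install build
```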
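FFmpeg itself then needs to be configured with whisper support. A sketch, assuming a recent FFmpeg source tree (the whisper filter is a recent addition) and that whisper.cpp is installed where pkg-config can see it; any extra configure flags depend on your build and `--enable-whisper` should be checked against `./configure --help` on your checkout.

```bash
# Configure and build FFmpeg with the whisper audio filter enabled.
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-whisper
make -j"$(nproc)"

# Confirm the filter is present in the resulting binary.
./ffmpeg -hide_banner -filters | grep whisper
```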
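An example of the single-command transcription the list describes. The option names shown (`model`, `language`, `queue`, `destination`, `format`) follow my reading of the whisper filter's documentation, while the file paths and the three-second queue are assumptions; check `ffmpeg -h filter=whisper` on your build for the exact options.

```bash
# Transcribe the audio track of a video into an SRT subtitle file.
# Swapping format=srt for format=text or format=json changes the output format.
ffmpeg -i input.mp4 -vn -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=3:destination=output.srt:format=srt' \
  -f null -
```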
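A sketch of the live-stream case: transcribing a stream in real time and sending JSON chunks to an HTTP endpoint through the `destination` option. The stream URL and the local listener on port 3000 are assumptions, and the doubled backslashes reflect filtergraph escaping of the colons in the URL, which may need adjusting for your shell.

```bash
# Transcribe a live stream and POST each transcribed chunk to an HTTP endpoint.
# A larger queue (10 seconds here) trades latency for transcription accuracy.
ffmpeg -i https://example.com/live/stream.m3u8 -vn -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=10:destination=http\\://localhost\\:3000:format=json' \
  -f null -
```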
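Finally, a sketch of live microphone transcription with VAD, assuming a Linux machine with ALSA input and a Silero VAD model in whisper.cpp's models directory; the device name, model filenames, queue length, and the `vad_model` option name are assumptions to verify against your build (macOS would use the avfoundation input instead). My reading of the filter documentation is that with no `destination` set, the transcript is written to ffmpeg's log output.

```bash
# Capture the default ALSA microphone and transcribe speech segments detected by VAD.
# The queue acts as an upper bound; VAD flushes a chunk when a pause in speech is detected.
# No destination is set, so the transcript appears in ffmpeg's log output.
ffmpeg -f alsa -i default -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=10:vad_model=../whisper.cpp/models/ggml-silero-v5.1.2.bin:format=text' \
  -f null -
```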