Run Whisper audio transcriptions with one FFmpeg command
- #FFmpeg
- #Whisper.cpp
- #audio-transcription
- Whisper.cpp is a high-performance automatic speech recognition library implementing OpenAI’s Whisper model in C/C++.
- Integration with FFmpeg allows for simple audio transcription pipelines with a single shell command.
- Installation involves cloning the whisper.cpp repository, downloading a model, and building the library (build sketch after this list).
- GPU support can be enabled for faster processing by installing the NVIDIA CUDA toolkit and configuring whisper.cpp accordingly.
- FFmpeg must be built with whisper.cpp support to use the whisper audio filter (configure sketch below).
- The whisper filter in FFmpeg can be configured with parameters such as the model path, language, GPU usage, and output format (example command below).
- Transcriptions can be output in various formats, including plain text, SRT, and JSON.
- Live audio streams can be transcribed in real time, with the option to send output to external services over HTTP (streaming example below).
- Voice Activity Detection (VAD) can be used to improve transcription accuracy by splitting audio into chunks based on speech activity.
- Microphone audio can be transcribed live, using VAD to handle pauses and speech segments effectively (microphone example below).
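
A minimal build sketch for whisper.cpp, following the clone/download/build steps in the list. The chosen model (`base.en`) and the system-wide install are assumptions; adjust paths for your setup, and the install step presumes your whisper.cpp build produces a pkg-config file that FFmpeg's configure can later find.

```bash
# Clone whisper.cpp and fetch a model (base.en is just one choice of ggml model)
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
./models/download-ggml-model.sh base.en

# Build and install the library.
# Add -DGGML_CUDA=1 to the first cmake call to enable NVIDIA GPU acceleration
# (requires the CUDA toolkit to be installed).
cmake -B build
cmake --build build --config Release
sudo cmake --install build
```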
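FFmpeg itself then needs to be configured with whisper support. A sketch, assuming a recent FFmpeg source tree (the whisper filter is a recent addition) and that whisper.cpp is installed where pkg-config can see it; any extra configure flags depend on your build and `--enable-whisper` should be checked against `./configure --help` on your checkout.

```bash
# Configure and build FFmpeg with the whisper audio filter enabled.
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-whisper
make -j"$(nproc)"

# Confirm the filter is present in the resulting binary.
./ffmpeg -hide_banner -filters | grep whisper
```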
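An example of the single-command transcription the list describes. The option names shown (`model`, `language`, `queue`, `destination`, `format`) follow my reading of the whisper filter's documentation, while the file paths and the three-second queue are assumptions; check `ffmpeg -h filter=whisper` on your build for the exact options.

```bash
# Transcribe the audio track of a video into an SRT subtitle file.
# Swapping format=srt for format=text or format=json changes the output format.
ffmpeg -i input.mp4 -vn -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=3:destination=output.srt:format=srt' \
  -f null -
```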
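A sketch of the live-stream case: transcribing a stream in real time and sending JSON chunks to an HTTP endpoint through the `destination` option. The stream URL and the local listener on port 3000 are assumptions, and the doubled backslashes reflect filtergraph escaping of the colons in the URL, which may need adjusting for your shell.

```bash
# Transcribe a live stream and POST each transcribed chunk to an HTTP endpoint.
# A larger queue (10 seconds here) trades latency for transcription accuracy.
ffmpeg -i https://example.com/live/stream.m3u8 -vn -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=10:destination=http\\://localhost\\:3000:format=json' \
  -f null -
```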
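Finally, a sketch of live microphone transcription with VAD, assuming a Linux machine with ALSA input and a Silero VAD model in whisper.cpp's models directory; the device name, model filenames, queue length, and the `vad_model` option name are assumptions to verify against your build (macOS would use the avfoundation input instead). My reading of the filter documentation is that with no `destination` set, the transcript is written to ffmpeg's log output.

```bash
# Capture the default ALSA microphone and transcribe speech segments detected by VAD.
# The queue acts as an upper bound; VAD flushes a chunk when a pause in speech is detected.
# No destination is set, so the transcript appears in ffmpeg's log output.
ffmpeg -f alsa -i default -af \
  'whisper=model=../whisper.cpp/models/ggml-base.en.bin:language=en:queue=10:vad_model=../whisper.cpp/models/ggml-silero-v5.1.2.bin:format=text' \
  -f null -
```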