Hasty Briefs (beta)

Run Whisper audio transcriptions with one FFmpeg command

12 days ago
  • #FFmpeg
  • #Whisper.cpp
  • #audio-transcription
  • Whisper.cpp is a high-performance C/C++ implementation of OpenAI’s Whisper automatic speech recognition (ASR) model.
  • Because FFmpeg integrates whisper.cpp as an audio filter, a complete transcription pipeline fits in a single shell command.
  • Installation involves cloning the whisper.cpp repository, downloading a model, and building the library.
  • GPU acceleration can be enabled for faster processing by installing the NVIDIA CUDA Toolkit and building whisper.cpp with CUDA support.
  • The whisper audio filter, added in FFmpeg 8.0, is only available when FFmpeg is compiled against whisper.cpp (configure option --enable-whisper).
  • The filter is configured through options such as the model path, language, GPU usage, and output format.
  • Transcriptions can be written as plain text, SRT subtitles, or JSON.
  • Live audio streams can be transcribed in real time, and the output can be sent to an external service over HTTP instead of a local file.
  • Voice Activity Detection (VAD) can be used to improve transcription accuracy by splitting audio into chunks based on speech activity.
  • Microphone input can be transcribed live, with VAD detecting pauses so that each transcribed segment follows natural breaks in speech.
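The installation and build steps summarized above can be sketched as a short Python script that only assembles and prints the commands rather than running them. It assumes the ggml-org/whisper.cpp repository URL, the base.en model, the models/download-ggml-model.sh helper, the GGML_CUDA CMake flag for CUDA builds, and FFmpeg's --enable-whisper configure switch; treat each of these as an assumption to check against the current whisper.cpp and FFmpeg documentation.

```python
"""Sketch: assemble (but do not run) the whisper.cpp + FFmpeg build steps.

Assumed, not taken from the brief: the repository URL, the base.en model
name, the GGML_CUDA CMake flag and FFmpeg's --enable-whisper switch.
"""
import shlex
from typing import List

WHISPER_REPO = "https://github.com/ggml-org/whisper.cpp.git"  # assumed upstream URL


def build_steps(model: str = "base.en", use_cuda: bool = False) -> List[List[str]]:
    """Return each build step as an argv list, in the order it should run."""
    cmake_configure = ["cmake", "-B", "build"]
    if use_cuda:
        # Requires the NVIDIA CUDA Toolkit to be installed beforehand.
        cmake_configure.append("-DGGML_CUDA=1")
    return [
        ["git", "clone", WHISPER_REPO],
        # The next four steps run inside the cloned whisper.cpp directory.
        ["sh", "./models/download-ggml-model.sh", model],
        cmake_configure,
        ["cmake", "--build", "build", "--config", "Release"],
        # Installing lets FFmpeg's configure find the library (assumed via
        # pkg-config); this may need a prefix or root privileges.
        ["cmake", "--install", "build"],
        # The last two steps run from the FFmpeg source tree.
        ["./configure", "--enable-whisper"],
        ["make"],
    ]


if __name__ == "__main__":
    for step in build_steps(use_cuda=True):
        print(shlex.join(step))
```

Printing rather than executing keeps the sketch free of side effects; each printed line can be run by hand from the directory noted in the comments.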
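The filter options, HTTP output and live microphone case described above can be sketched as a small Python helper that builds an ffmpeg argument list. The option names model, language, use_gpu, queue, format, destination and vad_model reflect our reading of FFmpeg's whisper filter documentation and should be confirmed with ffmpeg -h filter=whisper; the PulseAudio input (-f pulse -i default) assumes Linux, and the model paths and HTTP URL in the usage block are hypothetical. Option values are escaped at both levels FFmpeg's filtergraph parser applies, so a URL such as http://localhost:3000 survives the colon-separated option syntax.

```python
"""Sketch: build an ffmpeg argv that runs the whisper audio filter.

Option names are assumed from FFmpeg's whisper filter docs; confirm them
with `ffmpeg -h filter=whisper` on your build.
"""
import shlex
from typing import List, Optional


def _escape(value: str, special: str) -> str:
    # Prefix every special character with a backslash, in a single pass.
    return "".join("\\" + ch if ch in special else ch for ch in value)


def escape_option_value(value: str) -> str:
    # Level 1: the filter option parser treats \ ' and : as special.
    level1 = _escape(value, "\\':")
    # Level 2: the filtergraph parser treats \ ' [ ] , ; as special.
    return _escape(level1, "\\'[],;")


def whisper_filter(model: str, language: str = "en", use_gpu: bool = True,
                   queue: float = 3, fmt: str = "text",
                   destination: Optional[str] = None,
                   vad_model: Optional[str] = None) -> str:
    """Return a whisper=... filter description suitable for -af."""
    if fmt not in ("text", "srt", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    opts = {
        "model": model,
        "language": language,
        "use_gpu": "1" if use_gpu else "0",
        "queue": str(queue),  # assumed: seconds of audio queued per pass
        "format": fmt,
    }
    if destination is not None:
        opts["destination"] = destination  # file path or URL (e.g. http://...)
    if vad_model is not None:
        opts["vad_model"] = vad_model  # enables VAD-based segmentation
    return "whisper=" + ":".join(
        f"{key}={escape_option_value(val)}" for key, val in opts.items()
    )


def file_argv(input_path: str, audio_filter: str) -> List[str]:
    # -vn ignores any video stream; "-f null -" discards the decoded audio,
    # because the filter itself emits the transcription.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", audio_filter, "-f", "null", "-"]


def microphone_argv(audio_filter: str, device: str = "default") -> List[str]:
    # Assumes Linux with PulseAudio; other platforms need a different input device.
    return ["ffmpeg", "-f", "pulse", "-i", device, "-af", audio_filter, "-f", "null", "-"]


if __name__ == "__main__":
    # Hypothetical model paths and endpoint, for illustration only.
    af = whisper_filter("models/ggml-base.en.bin", fmt="json",
                        destination="http://localhost:3000/transcripts",
                        vad_model="models/ggml-silero-v5.1.2.bin")
    print(shlex.join(microphone_argv(af)))
```

Passing the list straight to subprocess.run avoids a third layer of shell quoting. With vad_model set, the filter should split the queued audio at detected speech boundaries rather than relying only on the fixed queue length.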
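Since one of the output formats listed above is SubRip (SRT), the transcript can be post-processed with ordinary code. The parser below is a minimal sketch that assumes conventional SubRip cues (an index line, a "start --> end" timing line, then one or more text lines, with cues separated by blank lines); it does not use any FFmpeg API.

```python
"""Minimal SubRip (SRT) parser for transcripts written with format=srt."""
import re
from typing import List, Tuple

_TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
# A cue timing line looks like "00:00:01,000 --> 00:00:03,500".
_TIMING = re.compile(_TIME + r"\s*-->\s*" + _TIME)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for each cue."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = [line.strip() for line in block.splitlines()]
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                groups = match.groups()
                start = _to_seconds(*groups[:4])
                end = _to_seconds(*groups[4:])
                # Multi-line cue text is joined with single spaces.
                body = " ".join(part for part in lines[i + 1:] if part)
                cues.append((start, end, body))
                break  # one timing line per cue
    return cues


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nHello world\n"
    print(parse_srt(sample))
```

Returning timestamps as seconds makes it straightforward to align transcript segments with other timed data.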