erm is a command-line tool that transcribes English speech with Whisper and automatically removes filler words using FFmpeg crossfades.
erm is an open-source command-line tool that transcribes audio using Whisper and cuts out disfluencies like "um" and "uh", joining the remaining audio with FFmpeg crossfades. It uses faster-whisper for speech-to-text and runs multiple detection passes (gap analysis, duration-based spotting, and embedded filler detection) to locate filler words. To keep the edits sounding natural, the tool aligns cuts to zero-crossings, applies adaptive crossfades, and matches room tone to prevent audible clicks or abrupt shifts in background noise.
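The detection passes described above can be illustrated with a small sketch. This is not erm's actual code: the `Word` type, the `FILLERS` set, the `find_cuts` function, and the 0.6 second gap threshold are all hypothetical, and the input is assumed to be word-level timestamps of the kind a Whisper-style transcriber can produce. It shows only two of the ideas, a text match on transcribed fillers and a gap check for long pauses between words.

```python
from dataclasses import dataclass

# Assumed filler vocabulary for illustration; erm's real list is not known here.
FILLERS = {"um", "uh", "erm", "er", "hmm"}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def find_cuts(words, min_gap=0.6):
    """Return (start, end, reason) spans that are candidates for removal."""
    cuts = []
    prev_end = 0.0
    for w in words:
        # Gap analysis: a long pause between words may hide an untranscribed filler.
        if w.start - prev_end >= min_gap:
            cuts.append((prev_end, w.start, "gap"))
        # Text match: the transcriber emitted the filler as its own token.
        if w.text.strip(" ,.?!").lower() in FILLERS:
            cuts.append((w.start, w.end, "filler"))
        prev_end = w.end
    return cuts


words = [Word("So", 0.0, 0.3), Word("um,", 0.4, 0.8), Word("hello", 1.6, 2.0)]
cuts = find_cuts(words)
```

In a real tool each candidate span would still need to be verified and then trimmed so that the cut does not clip neighboring words.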
While cloud-based editors like Descript have popularized automated filler word removal, erm offers a free, local-first CLI alternative for users who want to script their audio workflows or keep their files private.
* Local-first execution: Runs transcription and audio editing on the user's machine without external APIs.
* High-quality audio editing: Employs zero-crossing cuts, room tone matching, and adaptive crossfades to ensure smooth transitions between edits.
* Whisper-powered accuracy: Utilizes faster-whisper to detect filler words, combined with gap and duration analysis.
* Easy developer integration: Runs as a standalone CLI tool or as a step in automated media processing pipelines.
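To make the zero-crossing and crossfade ideas from the feature list concrete, here is a minimal sketch on plain lists of float samples. It is an illustration under stated assumptions, not erm's implementation: erm performs its edits through FFmpeg, while the helper names `nearest_zero_crossing` and `crossfade_join`, the search radius, and the equal-power fade curve are all choices made for this example. The fade length here is only clamped to the clip lengths, which is far simpler than a truly adaptive crossfade.

```python
import math


def nearest_zero_crossing(samples, index, radius=200):
    """Return the index closest to `index` where the signal changes sign.

    Falls back to `index` itself if no crossing lies within `radius` samples.
    """
    best, best_dist = index, None
    lo = max(1, index - radius)
    hi = min(len(samples) - 1, index + radius)
    for i in range(lo, hi + 1):
        rising = samples[i - 1] <= 0.0 < samples[i]
        falling = samples[i - 1] >= 0.0 > samples[i]
        if rising or falling:
            dist = abs(i - index)
            if best_dist is None or dist < best_dist:
                best, best_dist = i, dist
    return best


def crossfade_join(left, right, fade_len):
    """Join two clips, overlapping `fade_len` samples with an equal-power fade."""
    fade_len = min(fade_len, len(left), len(right))
    if fade_len == 0:
        return list(left) + list(right)
    out = list(left[:len(left) - fade_len])
    for k in range(fade_len):
        t = (k + 0.5) / fade_len
        gain_out = math.cos(t * math.pi / 2)  # fades the left clip down
        gain_in = math.sin(t * math.pi / 2)   # fades the right clip up
        out.append(left[len(left) - fade_len + k] * gain_out + right[k] * gain_in)
    out.extend(right[fade_len:])
    return out


signal = [1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0]
cut_at = nearest_zero_crossing(signal, 3, radius=3)
joined = crossfade_join([1.0] * 4, [0.0] * 4, fade_len=2)
```

Cutting where the waveform crosses zero avoids a sudden jump in amplitude at the splice, and the short overlap smooths whatever discontinuity remains.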
DISCOVERED
2026-06-13
PUBLISHED
2026-06-13
AUTHOR
Github Awesome