erm is a command-line tool that transcribes English speech with Whisper and automatically removes filler words using FFmpeg crossfades.

// OPEN SOURCE RELEASE · 51d ago

erm is an open-source command-line tool that transcribes audio with Whisper and cuts out disfluencies such as "um" and "uh" using clean FFmpeg crossfades. It uses faster-whisper for speech-to-text and runs several detection passes (gap analysis, duration-based spotting, and embedded filler detection) to locate filler words. To keep the edits sounding natural, the tool aligns cuts to zero-crossings, applies adaptive crossfades, and matches room tone to prevent audible clicks or abrupt shifts in background noise.
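As a rough illustration of what a word-list pass combined with gap analysis could look like, here is a minimal Python sketch. It is not erm's actual code: the `Word` class, the `FILLERS` set, the `find_cut_spans` function, and the `min_gap` and `pad` thresholds are all hypothetical names and values chosen for the example. It assumes word-level timestamps are already available, for instance from faster-whisper's `transcribe(..., word_timestamps=True)`.

```python
from dataclasses import dataclass

# Hypothetical filler list; the list erm actually uses is not stated in the source.
FILLERS = {"um", "uh", "erm", "er", "hmm"}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def find_cut_spans(words, min_gap=0.75, pad=0.25):
    """Return merged (start, end) spans that are candidates for removal.

    Pass 1 flags words whose text matches a known filler.
    Pass 2 flags long gaps between words, where a filler may have been
    dropped from the transcript; `pad` seconds are kept on each side.
    """
    spans = []
    for i, w in enumerate(words):
        token = w.text.strip().lower().strip(".,!?-")
        if token in FILLERS:
            spans.append((w.start, w.end))
        if i + 1 < len(words):
            gap = words[i + 1].start - w.end
            if gap >= min_gap and gap > 2 * pad:
                spans.append((w.end + pad, words[i + 1].start - pad))

    # Merge overlapping spans so each region is cut only once.
    spans.sort()
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    demo = [Word("So", 0.0, 0.25), Word("um,", 0.5, 1.0), Word("hello", 2.5, 3.0)]
    print(find_cut_spans(demo))
```

A real tool would likely treat the gap spans as candidates to verify rather than cut them outright, since a long pause can also be intentional.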

// ANALYSIS

While cloud-based editors like Descript have popularized automated filler word removal, erm offers a free, local-first CLI alternative for users who want to script their audio workflows or keep their files private.

* Local-first execution: Runs transcription and audio editing on the user's machine without external APIs.

* High-quality audio editing: Employs zero-crossing cuts, room tone matching, and adaptive crossfades to ensure smooth transitions between edits.

* Whisper-powered accuracy: Utilizes faster-whisper to detect filler words, combined with gap and duration analysis.

* Easy developer integration: Runs as a standalone CLI tool or as a step in automated media processing pipelines.
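The zero-crossing cuts and crossfades mentioned above can be sketched in a few lines of plain Python. This is an illustrative assumption, not erm's implementation: erm performs its crossfades through FFmpeg, while this sketch works directly on a list of samples, uses a fixed fade length rather than an adaptive one, and skips room tone matching. The function names `snap_to_zero_crossing`, `crossfade_join`, and `remove_span` are hypothetical.

```python
import math


def snap_to_zero_crossing(samples, index, search=200):
    """Return the sign-change position nearest to `index`, or `index` if none is in range."""
    lo = max(1, index - search)
    hi = min(len(samples) - 1, index + search)
    crossings = [i for i in range(lo, hi + 1) if samples[i - 1] * samples[i] <= 0]
    if not crossings:
        return index
    return min(crossings, key=lambda i: abs(i - index))


def crossfade_join(head, tail, fade_len):
    """Join two sample lists, overlapping them by `fade_len` samples
    with an equal-power (cosine/sine) crossfade."""
    fade_len = min(fade_len, len(head), len(tail))
    if fade_len == 0:
        return list(head) + list(tail)
    keep = len(head) - fade_len
    out = list(head[:keep])
    for k in range(fade_len):
        t = (k + 0.5) / fade_len
        gain_out = math.cos(t * math.pi / 2)  # fades the head down
        gain_in = math.sin(t * math.pi / 2)   # fades the tail up
        out.append(head[keep + k] * gain_out + tail[k] * gain_in)
    out.extend(tail[fade_len:])
    return out


def remove_span(samples, start, end, fade_len=64):
    """Cut samples[start:end], snapping both edges to zero-crossings first."""
    start = snap_to_zero_crossing(samples, start)
    end = snap_to_zero_crossing(samples, end)
    return crossfade_join(samples[:start], samples[end:], fade_len)
```

Cutting where the waveform crosses zero avoids a sudden jump in amplitude at the splice, which is what produces an audible click; the short crossfade then smooths over any remaining mismatch between the two sides.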

// TAGS

audio, whisper, cli, ffmpeg, speech-processing, open-source, python

DISCOVERED: 2026-06-13 (51d ago)

PUBLISHED: 2026-06-13 (51d ago)

RELEVANCE: 7/10

AUTHOR: Github Awesome
