easyaligner ships GPU alignment, text normalization
easyaligner is an open-source forced-alignment library for speech-text workflows, built to handle messy real-world transcripts with GPU acceleration and reversible text normalization. It targets long audio, partial transcript coverage, and Hugging Face Wav2Vec2 models without requiring manual chunking.
This is the kind of infrastructure release that matters more than a flashy demo: it focuses on the pain points people hit when aligning large speech datasets in production.
- GPU Viterbi alignment keeps long-form audio feasible in one pass, which is the real bottleneck for large preprocessing jobs
- Reversible normalization is a strong differentiator because it preserves original formatting instead of forcing a lossy preprocessing step
- Automatic handling of missing transcript coverage and extra leading/trailing speech makes it more practical than many “clean data only” aligners
- Compatibility with essentially any HF Hub Wav2Vec2 CTC model broadens the usable language/model surface area
- The companion `easytranscriber` angle is a good sign this is meant as a pipeline primitive, not a one-off toolkit
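
To make the reversible-normalization point concrete, here is a minimal sketch of the general idea, not easyaligner's actual API: normalize each word for the aligner while recording its character span in the original text, so timings can later be attached to the untouched spelling. The names `Token`, `normalize_with_spans`, and `restore` are hypothetical, as are the timings, which stand in for whatever an aligner would return.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    normalized: str  # form that would be passed to the aligner
    start: int       # character offset of the source word in the original text
    end: int         # exclusive end offset in the original text


def normalize_with_spans(text: str) -> list[Token]:
    """Lowercase each whitespace-delimited word and strip punctuation, keeping offsets."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        normalized = re.sub(r"[^\w']", "", match.group().lower())
        if normalized:  # skip words that were pure punctuation
            tokens.append(Token(normalized, match.start(), match.end()))
    return tokens


def restore(text: str, tokens: list[Token], timings: list[tuple[float, float]]):
    """Pair each timing (hypothetical aligner output) with the original spelling."""
    return [(text[t.start:t.end], begin, end) for t, (begin, end) in zip(tokens, timings)]


original = "Hello, World! It's 3 p.m. -- really?"
tokens = normalize_with_spans(original)

# Made-up timings in seconds, one per normalized token, for illustration only.
timings = [(0.0, 0.4), (0.5, 0.9), (1.2, 1.4), (1.5, 1.7), (1.8, 2.2), (2.6, 3.0)]
aligned = restore(original, tokens, timings)
```

Because the offsets index into the original string, nothing about the source formatting is lost: the aligner only ever sees the normalized forms, while the output carries the original punctuation and casing.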
DISCOVERED: 2026-04-18
PUBLISHED: 2026-04-18
AUTHOR: mLalush