REDDIT · 12d ago · OPEN-SOURCE RELEASE
YouTube RAG Scraper turns channels into RAG datasets
A developer built youtube-rag-scraper, a CLI that pulls a channel's videos, extracts their transcripts, and cleans them into embedding-ready chunks. It started as the data layer for a coffee coaching app and ended up being the part that got the most attention.
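The clean-and-chunk step can be pictured roughly like this. It is a minimal sketch, not the repo's code: it assumes transcripts arrive as caption segments with a `text` field, assumes bracketed non-speech tags such as `[Music]` should be dropped, and uses a naive regex sentence splitter. The helper names and the `max_chars`/`overlap` defaults are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: auto-generated captions contain bracketed non-speech tags
# such as "[Music]" or "[Applause]" that carry no retrievable content.
NON_SPEECH = re.compile(r"\[[^\]]*\]")
# Naive sentence boundary: whitespace preceded by ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_transcript(segments: List[Dict[str, str]]) -> str:
    """Hypothetical helper: join caption segments into one string,
    drop non-speech tags, and collapse runs of whitespace."""
    text = " ".join(seg["text"] for seg in segments)
    text = NON_SPEECH.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> List[str]:
    """Hypothetical helper: greedily pack whole sentences into chunks of at
    most max_chars, carrying the last `overlap` sentences into the next chunk.
    A single sentence longer than max_chars becomes its own chunk."""
    sentences = [s for s in SENTENCE_END.split(text) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Repeat tail sentences for context, but only if they still fit.
            tail = current[-overlap:] if overlap > 0 else []
            if len(" ".join(tail + [sentence])) > max_chars:
                tail = []
            current = tail
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Calling `chunk_sentences(clean_transcript(segments))` yields chunks that never cut a sentence in half, and the one-sentence overlap means a retrieved chunk rarely starts mid-thought. A real pipeline would likely swap the regex for a proper sentence tokenizer.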
// ANALYSIS
The app is the demo; the data pipeline is the product. For niche RAG, the hard part isn’t the model; it’s turning noisy creator video into something stable enough to retrieve.
- The repo covers the annoying middle layer most demos skip: channel, playlist, and video scraping; transcript extraction; cleanup; chunking; rate-limit handling; resume support; and export.
- Sentence-aware chunking plus FAISS makes it immediately useful as an offline knowledge base or semantic search index.
- A CLI is the right shape here because it stays composable for teams that already have their own embedding stack or vector database.
- Coffee education content is a strong source for this kind of assistant: it’s dense, expert-driven, and much richer than most written guides.
- The post is also a reminder that useful AI products often start as internal data plumbing before they become standalone apps.
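Rate-limit handling and resume support usually come down to two small mechanisms: retrying throttled requests with exponential backoff, and recording finished items so a rerun skips them. The sketch below is illustrative only and does not reflect how youtube-rag-scraper implements either. `RateLimited`, the `fetch` callable, and the one-ID-per-line checkpoint file are hypothetical stand-ins, and no real YouTube or transcript API is called.

```python
import os
import time
from typing import Callable, Iterable, List, Set


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the source throttles requests."""


def load_done(checkpoint_path: str) -> Set[str]:
    """Read IDs finished in a previous run (one per line); empty if no file."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def fetch_with_backoff(fetch: Callable[[str], str], video_id: str,
                       retries: int = 5, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(video_id); after each RateLimited error, wait
    base_delay * 2**(n-1) seconds and retry, re-raising after `retries` failures."""
    attempt = 0
    while True:
        try:
            return fetch(video_id)
        except RateLimited:
            attempt += 1
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def process_all(video_ids: Iterable[str], fetch: Callable[[str], str],
                checkpoint_path: str,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch every ID not already in the checkpoint, appending each success
    to the checkpoint immediately so an interrupted run can resume."""
    done = load_done(checkpoint_path)
    results: List[str] = []
    with open(checkpoint_path, "a", encoding="utf-8") as ckpt:
        for vid in video_ids:
            if vid in done:
                continue  # finished in an earlier run
            results.append(fetch_with_backoff(fetch, vid, sleep=sleep))
            ckpt.write(vid + "\n")
            ckpt.flush()
            done.add(vid)
    return results
```

Writing each ID right after its fetch succeeds, and flushing, means a crash loses at most the item in flight. Passing `sleep` in as a parameter keeps the backoff schedule testable without real waiting.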
// TAGS
youtube-rag-scraper · rag · cli · data-tools · embedding · search · open-source
DISCOVERED
2026-03-30 (12d ago)
PUBLISHED
2026-03-30 (12d ago)
RELEVANCE
8/10
AUTHOR
ravann4