YouTube RAG Scraper turns channels into RAG datasets
REDDIT · 12d ago · OPENSOURCE RELEASE

A developer built youtube-rag-scraper, a CLI that pulls channel videos, extracts transcripts, and cleans them into embeddings-ready chunks. It started as the data layer for a coffee coaching app and ended up being the part that got the most attention.

// ANALYSIS

The app is the demo; the data pipeline is the product. For niche RAG, the hard part is not the model; it's turning noisy creator video into something stable enough to retrieve.

  • The repo covers the annoying middle layer most demos skip: channel, playlist, and video scraping, transcript extraction, cleanup, chunking, rate-limit handling, resume support, and export.
  • Sentence-aware chunking plus FAISS makes it immediately useful as an offline knowledge base or semantic search index.
  • A CLI is the right shape here because it stays composable for teams that already have their own embedding stack or vector DB.
  • Coffee education content is a strong source for this kind of assistant: it’s dense, expert-driven, and much richer than most written guides.
  • The post is also a reminder that useful AI products often start as internal data plumbing before they become a standalone app.
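
To make the "sentence-aware chunking" point concrete, here is a minimal sketch of the general technique in plain Python. It is not taken from the youtube-rag-scraper repo: the function names `split_sentences` and `chunk_sentences`, the `max_chars` and `overlap` parameters, and the naive punctuation-based splitter are all assumptions for illustration. The idea is that chunk boundaries only ever fall between sentences, so no chunk starts or ends mid-thought.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(text, max_chars=200, overlap=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    overlap is the number of trailing sentences repeated at the start of the
    next chunk. max_chars is a soft limit: a single long sentence, or the
    carried-over sentences plus a new one, may exceed it.
    """
    chunks = []
    current = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk at a sentence boundary.
            chunks.append(" ".join(current))
            # Carry the last few sentences forward for context.
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    transcript = (
        "Grind finer for espresso. Use a scale. Water should be hot! "
        "Is the bloom done? Pour slowly."
    )
    for chunk in chunk_sentences(transcript, max_chars=40, overlap=1):
        print(repr(chunk))
```

In a pipeline like the one described, chunks of this kind would then be embedded and added to a vector index such as FAISS; that step is left out here because it depends on the embedding model in use.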

// TAGS

youtube-rag-scraper · rag · cli · data-tools · embedding · search · open-source

DISCOVERED

12d ago

2026-03-30

PUBLISHED

12d ago

2026-03-30

RELEVANCE

8/10

AUTHOR

ravann4