Local RAG Jarvis hits Wikipedia ingest wall
A developer building a personal Jarvis-style RAG chatbot on an AMD RX 7900 XT is facing a 3-year timeline for a full English Wikipedia ingest. The project focuses on verifiable, sourceable information to combat misinformation, but is bottlenecked by a high-overhead extraction pipeline that builds complex metadata layers—including claims, entities, and provenance—far beyond simple vector embeddings.
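The metadata layers described above could be represented with a record per chunk. A minimal sketch of such a schema is below; all field names are illustrative assumptions, not taken from the project itself:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedChunk:
    # Plain vector RAG stores only text + embedding; a verifiable pipeline
    # layers extraction metadata on top (all field names are hypothetical).
    chunk_id: str
    text: str
    embedding: list[float]                            # output of the embedder
    contextual_header: str = ""                       # e.g. article title + section path
    entities: list[str] = field(default_factory=list)
    claims: list[str] = field(default_factory=list)   # atomic, checkable statements
    provenance: dict = field(default_factory=dict)    # source URL, revision, timestamp

chunk = ExtractedChunk(
    chunk_id="enwiki:12345:0",
    text="The Eiffel Tower is 330 metres tall.",
    embedding=[0.01, -0.42, 0.17],
    contextual_header="Eiffel Tower > Dimensions",
    entities=["Eiffel Tower"],
    claims=["The Eiffel Tower is 330 metres tall."],
    provenance={"source": "enwiki", "rev": "1234567890"},
)
```

Each of the non-embedding fields requires an LLM inference pass over the chunk, which is where the extraction overhead comes from.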
This project is a reality check for "local-first" AI enthusiasts aiming for high-fidelity knowledge graphs on consumer hardware. The user's discovery that extraction logic is an order of magnitude more expensive than simple embedding underscores the "hidden tax" of verifiable RAG.
- Leveraging the brand-new Qwen 3.6 (released April 2026) for bulk extraction shows the project is at the bleeding edge of agentic AI integration.
- The throughput bottleneck isn't the 4B embedder but the logic required for "claims, entities, and contextual headers," which effectively turns a retrieval task into a massive inference job.
- Using IQ3_XXS quants for bulk extraction is a pragmatic choice to maximize the 7900 XT's 20GB VRAM, yet the pipeline still hits a wall at 13.5k chunks/hr.
- The lack of multi-GPU tensor-split support in the current ROCm/Vulkan stack remains a primary architectural hurdle for local scaling on AMD hardware.
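A back-of-envelope check connects the two headline numbers. The 13.5k chunks/hr throughput and the 3-year timeline are from the post; the implied total chunk count below is derived from them, not stated in the source:

```python
def ingest_years(total_chunks: int, chunks_per_hour: float) -> float:
    """Wall-clock years to process total_chunks at a fixed throughput."""
    return total_chunks / chunks_per_hour / (24 * 365)

CHUNKS_PER_HOUR = 13_500  # throughput reported in the post

# A 3-year timeline at that rate implies roughly this many chunks:
implied_chunks = 3 * 365 * 24 * CHUNKS_PER_HOUR  # ~355 million

print(f"{implied_chunks:,}")                              # → 354,780,000
print(ingest_years(implied_chunks, CHUNKS_PER_HOUR))      # → 3.0
```

The takeaway is scale, not precision: at hundreds of millions of inference calls, even a 10x speedup from better quants or batching still leaves months of wall-clock time on a single consumer GPU.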
Discovered: 2026-04-19 (4h ago)
Published: 2026-04-19 (5h ago)
Author: vick2djax