Local RAG Jarvis hits Wikipedia ingest wall
OPEN_SOURCE
REDDIT // 4h ago · NEWS

A developer building a personal Jarvis-style RAG chatbot on an AMD RX 7900 XT is facing a 3-year timeline for a full English Wikipedia ingest. The project focuses on verifiable, sourceable information to combat misinformation, but is bottlenecked by a high-overhead extraction pipeline that builds complex metadata layers—including claims, entities, and provenance—far beyond simple vector embeddings.
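The cost gap the summary describes can be made concrete with a sketch of what a metadata-rich chunk record looks like next to a plain embedding. The field names below are illustrative, not the author's actual schema; the point is that everything past `embedding` requires additional LLM inference per chunk.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an enriched chunk record like the one the post
# describes. Field names are illustrative assumptions, not the real schema.
@dataclass
class EnrichedChunk:
    text: str                    # raw chunk text
    embedding: list[float]       # cheap: one embedder forward pass
    # Everything below needs extra generative LLM passes per chunk,
    # which is where the order-of-magnitude cost over embedding comes from:
    claims: list[str] = field(default_factory=list)    # extracted factual claims
    entities: list[str] = field(default_factory=list)  # named entities
    contextual_header: str = ""                        # LLM-written chunk summary
    provenance: str = ""                               # source article / revision
```

A plain vector-RAG pipeline stops at `embedding`; the verifiability goal is what forces the remaining fields, and with them the inference bill.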

// ANALYSIS

This project is a reality check for "local-first" AI enthusiasts aiming for high-fidelity knowledge graphs on consumer hardware. The user's discovery that extraction logic is an order of magnitude more expensive than simple embedding underscores the "hidden tax" of verifiable RAG.

  • Leveraging the brand-new Qwen 3.6 (released April 2026) for bulk extraction shows the project is at the bleeding edge of agentic AI integration.
  • The throughput bottleneck isn't the 4B embedder but the extraction logic required for "claims, entities, and contextual headers," which effectively turns a retrieval task into a massive inference job.
  • Using IQ3_XXS quants for bulk extraction is a pragmatic choice to maximize the 7900 XT's 20GB VRAM, yet still hits a wall at 13.5k chunks/hr.
  • The lack of multi-GPU tensor-split support in the current ROCm/Vulkan stack remains a primary architectural hurdle for local scaling on AMD hardware.
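The two figures in the post (13.5k chunks/hr, a 3-year timeline) imply a total workload that can be checked with simple arithmetic. The result, roughly 355M chunk-passes, far exceeds a naive single-pass chunking of English Wikipedia, which is consistent with the multi-layer extraction the summary describes. This is a back-of-envelope sketch from the reported numbers only:

```python
# Back-of-envelope: what total workload does a 3-year ingest at
# 13,500 chunks/hr imply? Both inputs come from the post; the
# "chunk-passes" framing is an interpretation, not the author's.
CHUNKS_PER_HOUR = 13_500
HOURS_PER_YEAR = 24 * 365
TIMELINE_YEARS = 3

chunks_per_year = CHUNKS_PER_HOUR * HOURS_PER_YEAR
implied_workload = chunks_per_year * TIMELINE_YEARS

print(f"{chunks_per_year:,} chunks/year")        # 118,260,000
print(f"{implied_workload:,} chunk-passes total")  # 354,780,000
```

Cutting that timeline means either raising throughput (faster quants, more GPUs, which runs into the ROCm/Vulkan tensor-split limitation above) or doing less inference per chunk.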
// TAGS
llm · rag · gpu · self-hosted · inference · qwen-3-6 · personal-rag-jarvis

DISCOVERED

4h ago

2026-04-19

PUBLISHED

5h ago

2026-04-19

RELEVANCE

8/10

AUTHOR

vick2djax