llama.cpp 1M-token brute force falters
OPEN_SOURCE
REDDIT // 20d ago · INFRASTRUCTURE


A LocalLLaMA user tried dumping an 800K-token org-mode archive and a 100K-message maildir into several 1M-context local models through llama.cpp. The models could ingest the data, but factual recall was brittle, with similar notes getting conflated and answers drifting into hallucination.

// ANALYSIS

This is a retrieval problem, not a raw context problem. Adjusting `--temp` changes how random or conservative the sampled answer is, but it cannot fix the model's inability to reliably surface the right evidence from a huge, noisy prompt.
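The limits of `--temp` are easy to demonstrate: temperature only rescales the model's existing distribution over candidate tokens, so it can make an answer sharper or flatter but never change which candidate ranks highest. A minimal sketch in plain Python with made-up logits (no llama.cpp involved):

```python
import math

def softmax_with_temp(logits, temp):
    """Temperature-scaled softmax: temp < 1 sharpens, temp > 1 flattens."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for three candidate answers; if the fact the model
# failed to retrieve is not among the high-logit candidates, no temperature
# setting will put it there.
logits = [2.0, 1.5, 0.5]

cold = softmax_with_temp(logits, 0.2)
hot = softmax_with_temp(logits, 1.5)

# Lowering temperature concentrates mass on the top candidate...
assert cold[0] > 0.9
# ...but the ranking of candidates never changes, cold or hot.
assert sorted(cold, reverse=True) == cold
assert sorted(hot, reverse=True) == hot
```

Sampling knobs reshape confidence, not knowledge: if retrieval from the prompt failed, the error is already baked into the logits before sampling runs.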

  • Long-context models still suffer from lost-in-the-middle and positional bias, so facts buried deep in the prompt are easy to miss.
  • Semi-structured org-mode data is a bad fit for brute-force prompting because hierarchy, links, and near-duplicate events need explicit indexing.
  • Maildir is better treated as a searchable corpus with metadata, threading, and citations, not as a text blob for attention to sort out on its own.
  • A hybrid pipeline with heading-based chunking, keyword search, embeddings, reranking, and short-context synthesis will be much more reliable.
  • If the goal is transparency and operability, add search and retrieval tools around the archive instead of trying to make the model remember everything.
// TAGS
llama-cpp · llm · rag · search · data-tools · self-hosted · inference

DISCOVERED

20d ago

2026-03-22

PUBLISHED

20d ago

2026-03-22

RELEVANCE

8/10

AUTHOR

phwlarxoc