OPEN_SOURCE ↗
REDDIT // 3h ago // NEWS
GLM-4.7 Flash ignores story-edit prompt
A user running an uncensored Q8 build of GLM-4.7-Flash on a 5090 says the model can handle normal chats but keeps reproducing an attached 8.5k-token story nearly verbatim instead of inserting a new scene. The thread reads like a long-context failure, but it is more likely a prompt-structure and document-editing problem than a raw context-window limit.
// ANALYSIS
The hot take: 200K context does not mean 200K of reliable instruction-following. This looks less like the model “forgetting” the prompt and more like a weakly framed edit task, especially when the source text arrives via PDF ingestion.
- The model’s advertised long context can retain the story, but that does not guarantee it will prioritize a vague “rewrite with changes” instruction over faithfully reconstructing the source.
- PDF attachments often worsen this kind of task because extraction can flatten structure, erase section boundaries, and make the model treat the input as a document to continue rather than a text to surgically modify.
- Better results usually come from explicit edit framing: identify the insertion point, request a diff or the revised scene only, and anchor the change to a concrete section or paragraph.
- For an 8.5k-token story, chunking or retrieval is often more reliable than asking for a full end-to-end rewrite with one added scene.
- Community uncensored builds may be strong at chat and generation, but they are not automatically optimized for precise editorial transformations.
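A minimal sketch of the edit framing described above. Everything here is hypothetical (the function name, delimiters, and wording are not from the thread); the point is the structure: name a concrete anchor passage, ask for only the new scene, and explicitly forbid restating the source.

```python
def build_edit_prompt(story: str, anchor: str, scene_brief: str) -> str:
    """Build a prompt that requests only the inserted scene, never a rewrite.

    Hypothetical helper: the anchor is a verbatim passage from the story
    that marks where the new scene should go.
    """
    if anchor not in story:
        raise ValueError("anchor text not found in story")
    return (
        "You are editing the story below. Do NOT reproduce or rewrite it.\n"
        "Insert ONE new scene immediately after this passage:\n"
        f"<<<{anchor}>>>\n"
        f"The new scene should: {scene_brief}\n"
        "Output ONLY the new scene text, nothing else.\n\n"
        "--- STORY ---\n"
        f"{story}\n"
        "--- END STORY ---"
    )

story = "Chapter 1. The door closed.\nChapter 2. Morning came."
prompt = build_edit_prompt(story, "The door closed.",
                           "the narrator hears footsteps in the hall")
```

Because the model only ever emits the new scene, it never has to echo 8.5k tokens faithfully, which is exactly the failure mode the thread describes.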
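The chunking point can be taken further: do the splice locally instead of asking the model for an end-to-end rewrite. A sketch, assuming the model has already returned the new scene as plain text (the function name and sample strings are illustrative):

```python
def splice_scene(story: str, anchor: str, new_scene: str) -> str:
    """Insert new_scene immediately after the first occurrence of anchor.

    The surrounding story text is passed through untouched, so nothing
    outside the insertion point can drift or be paraphrased.
    """
    idx = story.find(anchor)
    if idx == -1:
        raise ValueError("anchor not found")
    cut = idx + len(anchor)
    return story[:cut] + "\n\n" + new_scene + "\n\n" + story[cut:]

story = "Scene A ends here. Scene B begins."
result = splice_scene(story, "Scene A ends here.", "[NEW SCENE]")
```

This keeps the model's job small (generate one scene) and the risky job deterministic (reassembling the 8.5k-token document), which is usually the more reliable division of labor.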
// TAGS
glm-4.7-flash · llm · prompt-engineering · inference · self-hosted · open-weights
DISCOVERED
3h ago
2026-04-16
PUBLISHED
22h ago
2026-04-16
RELEVANCE
8/10
AUTHOR
NeuroPalooza