GLM-4.7 Flash ignores story-edit prompt
OPEN_SOURCE · REDDIT · 3h ago · NEWS

A user running an uncensored Q8 build of GLM-4.7-Flash on a 5090 says the model can handle normal chats but keeps reproducing an attached 8.5k-token story nearly verbatim instead of inserting a new scene. The thread reads like a long-context failure, but it is more likely a prompt-structure and document-editing problem than a raw context-window limit.

// ANALYSIS

The hot take: 200K context does not mean 200K of reliable instruction-following. This looks less like the model “forgetting” the prompt and more like a weak edit task setup, especially when the source text comes in through PDF ingestion.

  • The model’s advertised long context can retain the story, but that does not guarantee it will prioritize a vague “rewrite with changes” instruction over faithfully reconstructing the source.
  • PDF attachments often worsen this kind of task because extraction can flatten structure, erase section boundaries, and make the model treat the input as a document to continue rather than a text to surgically modify.
  • Better results usually come from explicit edit framing: identify the insertion point, request a diff or revised scene only, and anchor the change to a concrete section or paragraph.
  • For an 8.5k-token story, chunking or retrieval is often more reliable than asking for a full end-to-end rewrite with one added scene.
  • Community uncensored builds may be strong at chat and generation, but they are not automatically optimized for precise editorial transformations.
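The edit framing in the bullets above can be sketched in a few lines. This is a minimal illustration, not anything from the thread: the function names, the anchor-paragraph convention, and the plain-text story format are all assumptions. The idea is to ask the model for only the new scene and splice it in locally, so it never has to echo the 8.5k-token source.

```python
# Illustrative sketch of explicit edit framing for a scene-insertion task.
# All names here are hypothetical, not from the original thread.

def build_edit_prompt(anchor_excerpt: str, instruction: str) -> str:
    """Frame the task as a targeted edit: name the insertion point and
    request ONLY the new scene, never a full rewrite of the story."""
    return (
        "You are editing an existing story. Do NOT reproduce the story.\n"
        "Insertion point: immediately after the paragraph that begins:\n"
        f'  "{anchor_excerpt}"\n'
        f"Task: {instruction}\n"
        "Output ONLY the new scene text, nothing else."
    )

def splice_scene(story: str, anchor_excerpt: str, new_scene: str) -> str:
    """Deterministically insert the model's reply after the anchor
    paragraph, client-side, so reassembly never depends on the model."""
    paragraphs = story.split("\n\n")
    for i, para in enumerate(paragraphs):
        if para.startswith(anchor_excerpt):
            paragraphs.insert(i + 1, new_scene)
            return "\n\n".join(paragraphs)
    raise ValueError("anchor paragraph not found")

# Usage: send build_edit_prompt(...) plus only the chunk around the
# anchor (not the whole story) to the model, then splice the reply.
story = "The ship left port.\n\nStorms came on the third day."
result = splice_scene(
    story, "The ship left port", "A stowaway was found below deck."
)
```

Because the splice happens in code rather than in the model's output, the failure mode from the thread (faithful reproduction of the source instead of a modification) is structurally impossible.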
// TAGS
glm-4.7-flash · llm · prompt-engineering · inference · self-hosted · open-weights

DISCOVERED

3h ago

2026-04-16

PUBLISHED

22h ago

2026-04-16

RELEVANCE

8/10

AUTHOR

NeuroPalooza