Rorschach tests expose LLM contamination limits
REDDIT // 3h ago // RESEARCH PAPER

The paper tests GPT-4o, Grok-3, and Gemini 2.0 Flash Thinking on the 10 standard Rorschach cards, then codes their outputs with Exner-style metrics and an AI self-analysis step. As an exploratory demo, it is interesting; as evidence about genuine perception, it is methodologically thin because the stimuli, scoring logic, and many “expected” interpretations are almost certainly part of the models’ training exposure.

// ANALYSIS

Hot take: this is much better at revealing how multimodal models reproduce culturally saturated response patterns than at revealing anything clean about visual cognition.

  • The strongest objection is contamination: the standard cards, manuals, and common interpretations are public and likely embedded in training data, so the task is partly retrieval and pattern completion, not fresh perception.
  • The setup is too loosely controlled to support strong inference: public web interfaces, default vendor behavior, and apparently single-pass administration make it hard to attribute any output to a specific cause.
  • Even so, the paper has some exploratory value: it shows that LLMs can generate coherent, human-like narratives around ambiguous images and that those narratives can be scored with psychometric frameworks, which is itself a useful stress test.
  • The scientifically stronger version would use novel or synthetic ambiguous stimuli, multiple repeated trials, API-level decoding control, preregistered coding, and baselines that separate image understanding from memorized psychometric language.
  • What it mostly demonstrates is advanced statistical pattern matching plus learned psychometric priors, not evidence that the models have an “inner world” in any human sense.
  • Papers like this get through peer review when they are framed as pilot or hypothesis-generating work, but the interpretation often outruns the evidential strength of the design.
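
Two of the controls named above, novel stimuli and repeated trials, can be sketched in a few lines. This is a minimal illustration, not the paper's method: `make_inkblot`, `render`, `jaccard`, and `mean_pairwise_agreement` are hypothetical names, the random-walk blot is only one assumed way to produce an unseen symmetric stimulus, and Jaccard overlap of coded response labels is an assumed stand-in for a real test-retest statistic.

```python
import random
from itertools import combinations


def make_inkblot(seed, size=16, steps=200):
    """Return a size x size grid of 0/1 cells, mirrored left to right.

    Hypothetical generator: a seeded random walk over the left half,
    so each seed gives a reproducible blot that is in no training set.
    """
    rng = random.Random(seed)
    half = size // 2
    grid = [[0] * size for _ in range(size)]
    row, col = size // 2, half - 1  # start beside the vertical midline
    for _ in range(steps):
        grid[row][col] = 1
        grid[row][size - 1 - col] = 1  # mirror onto the right half
        d_row, d_col = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        row = min(max(row + d_row, 0), size - 1)  # clamp to the grid
        col = min(max(col + d_col, 0), half - 1)  # stay in the left half
    return grid


def render(grid):
    """Render the grid as text, '#' for ink and '.' for background."""
    return "\n".join("".join("#" if c else "." for c in r) for r in grid)


def jaccard(a, b):
    """Jaccard overlap between two collections of coded response labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(trials):
    """Average Jaccard overlap over every pair of repeated trials."""
    pairs = list(combinations(trials, 2))
    if not pairs:
        raise ValueError("need at least two trials")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Calling `render(make_inkblot(seed=7))` yields one reproducible stimulus, and `mean_pairwise_agreement` takes a list of label sets, one per repeated administration of the same stimulus. Low agreement across runs would suggest sampling noise; high agreement on never-seen blots would be harder to explain as retrieval.
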
// TAGS
llm, rorschach, psychology, multimodal-ai, training-data-contamination, projective-tests, peer-review, ai-safety

DISCOVERED: 2026-04-28 (3h ago)
PUBLISHED: 2026-04-28 (3h ago)
RELEVANCE: 7/10
AUTHOR: Impossible_Echo4029