GPT-5.4 regresses on prompt injection
OpenAI's GPT-5.4 safety materials show improved resistance to prompt injection through email connectors but a slight regression on attacks targeting function cells. Third-party testing from AgentSeal also ranks GPT-5.4 near the top overall, yet gives it only 50% injection resistance, reinforcing that prompt injection remains a live weakness in agent-style workflows.
The story here is not that GPT-5.4 is uniquely unsafe; it is that frontier models still crack where tools, connectors, and external content can smuggle instructions back into the loop. Better reasoning helps, but it does not solve agent security by itself.
- OpenAI's own system card explicitly says GPT-5.4 improved on email-connector prompt injection and regressed slightly on function-cell attacks
- AgentSeal ranks GPT-5.4 second overall with an 87.5 trust score, but its injection-resistance slice is only 50%, far below its perfect extraction and boundary scores
- Agent builders should treat tool output, MCP responses, spreadsheets, and retrieved content as untrusted input rather than model-readable truth
- The practical takeaway for developers is still the same: sanitize tool outputs, scope permissions tightly, and require confirmation before high-impact actions
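
The last two points can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from OpenAI or AgentSeal: the names `wrap_untrusted`, `is_allowed`, `HIGH_IMPACT_TOOLS`, and the regex patterns are all hypothetical, and pattern matching alone would not stop a determined attacker. It shows tool output being labeled as data and flagged, plus a gate that checks scope and requires confirmation for high-impact or tainted calls.

```python
import re
from dataclasses import dataclass
from typing import Set

# Hypothetical phrases that often appear in injected instructions.
# A real detector would need to be far broader than this.
SUSPICIOUS = re.compile(
    r"(ignore (all |any )?(previous|prior) instructions|system prompt|you are now)",
    re.IGNORECASE,
)

# Hypothetical tool names treated as high impact in this sketch.
HIGH_IMPACT_TOOLS = {"send_email", "delete_file", "transfer_funds"}


@dataclass
class ToolResult:
    tool: str
    text: str
    flagged: bool


def wrap_untrusted(tool: str, raw: str) -> ToolResult:
    """Label tool output as data and flag instruction-like phrasing."""
    flagged = bool(SUSPICIOUS.search(raw))
    # The delimiters are a prompt convention, not a security boundary.
    text = f"<untrusted_tool_output tool={tool!r}>\n{raw}\n</untrusted_tool_output>"
    return ToolResult(tool=tool, text=text, flagged=flagged)


def is_allowed(tool: str, allowed: Set[str], confirmed: bool, tainted: bool) -> bool:
    """Permit a call only if it is in scope; risky calls also need confirmation."""
    if tool not in allowed:
        return False  # outside the permissions granted to this agent
    if tool in HIGH_IMPACT_TOOLS or tainted:
        return confirmed  # a human must approve the action
    return True
```

Here `tainted` would be set once any flagged tool result has entered the context, so that even a low-impact call needs confirmation after suspicious content has been read.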
DISCOVERED 2026-03-10
PUBLISHED 2026-03-07
AUTHOR MeetReady6307

