XBOW says GPT-5.5 lifts white-box pentests

// 90d ago BENCHMARK RESULT

XBOW’s April 23, 2026 blog post argues that GPT-5.5 is a real step up for offensive security workflows, not just another incremental model bump. On XBOW’s internal benchmark of real, previously found vulnerabilities, the company reports a 10% miss rate for GPT-5.5, compared with 40% for GPT-5 and 18% for Opus 4.6. The standout claim is that GPT-5.5 beats older models even in black-box mode, and that in white-box testing it effectively overwhelms the benchmark. XBOW also says the model logs in faster, fails faster when access is blocked, and is better at deciding when to persist versus pivot.
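To put the reported miss rates in relative terms, here is a minimal sketch of the arithmetic. It assumes "miss rate" means the fraction of known benchmark vulnerabilities a model failed to find, which this summary does not spell out. The helper names are illustrative, not anything from XBOW.

```python
# Miss rates as reported in the summary above. Assumption: each value is the
# fraction of known benchmark vulnerabilities the model failed to find.
miss_rates = {"GPT-5": 0.40, "Opus 4.6": 0.18, "GPT-5.5": 0.10}


def relative_reduction(old: float, new: float) -> float:
    """Fraction by which the miss rate shrinks when moving from old to new."""
    return (old - new) / old


def detection_rate(miss: float) -> float:
    """Complement of the miss rate: the share of vulnerabilities found."""
    return 1.0 - miss


for model in ("GPT-5", "Opus 4.6"):
    drop = relative_reduction(miss_rates[model], miss_rates["GPT-5.5"])
    print(f"{model} -> GPT-5.5: {drop:.0%} fewer misses")
```

Under that assumption, going from 40% to 10% is a 75% relative reduction in misses, and going from 18% to 10% is roughly a 44% reduction.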

// ANALYSIS

Hot take: this reads less like model hype and more like a concrete signal that agentic pentesting is getting operationally useful, especially when source code is available.

–XBOW’s framing matters: they measure full offensive workflows, not isolated prompt tasks.
–The biggest claim is that GPT-5.5 in black-box mode already beats older models, while in white-box mode it effectively overwhelms the benchmark.
–The “persist or pivot” behavior is probably the most practical improvement for real security automation.
–If these numbers hold outside XBOW’s benchmark, this could change how teams think about automated pentest coverage.

// TAGS

xbow, gpt-5.5, pentesting, cybersecurity, autonomous-agents, benchmark, ai-security

DISCOVERED

90d ago

2026-04-27

PUBLISHED

90d ago

2026-04-27

RELEVANCE

9/10

AUTHOR

elemental-mind
