Researcher tests LLM pentesting on BookNook

// 45d ago · BENCHMARK RESULT

Security researcher Kasra Rahjerdi evaluated the penetration testing capabilities of 14 large language models using a deliberately vulnerable React Native app called BookNook. The experiment showed that GPT-5.5 achieved the highest success rate at 7/10 solves, while cheaper models like DeepSeek V4 Pro succeeded at a fraction of the cost and several models failed due to late-stage security refusals.

// ANALYSIS

Guardrails in several mainstream LLMs undercut their usefulness for legitimate penetration testing, while cheaper or less restricted models are becoming viable, cost-effective security auditing agents.

– GPT-5.5 showed the strongest strategic focus, skipping minor API vulnerabilities to exploit exposed Firebase configurations directly.
– High cost and late-stage security refusals (e.g., in Claude Opus and Gemini 3.5 Flash) are major bottlenecks for developers using LLMs for authorized vulnerability scanning.
– DeepSeek V4 Pro has a far lower cost per solve ($0.62) than Claude Sonnet 4.6 ($45.75), signaling that the economics of automated vulnerability exploitation favor smaller or open-weights providers.
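
To put the two per-solve figures above side by side, here is a minimal sketch of the arithmetic. It assumes "cost per solve" means total spend divided by the number of successful solves; the source does not state how the metric was computed. The helper name `cost_per_solve` and the constant names are hypothetical, and only the two dollar figures come from the summary.

```python
def cost_per_solve(total_cost_usd: float, solves: int) -> float:
    """Assumed definition: total spend divided by successful solves."""
    if solves <= 0:
        raise ValueError("cost per solve is undefined with zero solves")
    return total_cost_usd / solves


# Per-solve figures quoted in the analysis above (USD).
DEEPSEEK_V4_PRO_PER_SOLVE = 0.62
CLAUDE_SONNET_4_6_PER_SOLVE = 45.75

# How many times more expensive one solve is on the pricier model.
ratio = CLAUDE_SONNET_4_6_PER_SOLVE / DEEPSEEK_V4_PRO_PER_SOLVE
print(f"Claude Sonnet 4.6 costs about {ratio:.0f}x more per solve")
```

On the quoted figures the gap works out to roughly 74x. A model with zero solves has no defined cost per solve, which is why the helper raises an error instead of returning a number.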

// TAGS

security, llm-benchmarking, penetration-testing, firebase, api-security, artificial-intelligence, vulnerability-exploitation

DISCOVERED

45d ago

2026-06-04

PUBLISHED

45d ago

2026-06-04

RELEVANCE

8/10

AUTHOR

jc4p
