Socket: malware exploits AI safety to evade scanners
Socket has identified malicious npm packages designed to bypass AI-powered scanners by exploiting their safety guardrails. By embedding text that references biological or nuclear weapons in the malicious code, attackers trigger safety refusals that stop the scanner from inspecting the payload.
Attackers are turning the safety guardrails of LLMs into an evasion tool, highlighting a structural vulnerability in AI security pipelines that treat a safety refusal as the end of the analysis rather than as a high-risk indicator.
* AI safety alignment (specifically guardrails against discussing WMDs) creates a novel attack vector (adversarial refusal) that bypasses automated code reviews.
* Security scanners that rely solely on first-order LLM analysis, with no fallback to traditional static analysis, are highly vulnerable to this evasion technique.
* A robust AI malware analysis pipeline must detect refusals and treat any safety-induced refusal as a red flag or an automatic quarantine, rather than letting the code pass.
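The fail-closed behavior described in the points above can be sketched roughly as follows. This is a minimal illustration, not Socket's implementation: the `ask_llm` callable, the `VERDICT:` reply format, the refusal phrasings, and the static indicators are all hypothetical assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    QUARANTINE = "quarantine"


@dataclass
class ScanResult:
    verdict: Verdict
    reason: str


# Hypothetical refusal phrasings. A real pipeline would prefer a structured
# refusal signal from its model provider if one is available.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:cannot|can't|won't|am unable to)\b", re.IGNORECASE),
    re.compile(r"\bunable to (?:assist|help|analy[sz]e)\b", re.IGNORECASE),
]

# Hypothetical static indicators, deliberately tiny for illustration.
STATIC_INDICATORS = [
    re.compile(r"\beval\s*\("),
    re.compile(r"child_process"),
    re.compile(r"Buffer\.from\([^)]*base64"),
]


def looks_like_refusal(response: str) -> bool:
    """Return True if the model reply matches a known refusal phrasing."""
    return any(p.search(response) for p in REFUSAL_PATTERNS)


def static_hits(source: str) -> list[str]:
    """Return the patterns of all static indicators found in the source."""
    return [p.pattern for p in STATIC_INDICATORS if p.search(source)]


def scan(source: str, ask_llm: Callable[[str], str]) -> ScanResult:
    """Scan source code, failing closed on refusals and unparseable replies.

    Assumes (hypothetically) that the model was prompted to start its reply
    with 'VERDICT: MALICIOUS' or 'VERDICT: BENIGN'.
    """
    response = ask_llm(source).strip()
    hits = static_hits(source)  # fallback analysis always runs
    upper = response.upper()

    if upper.startswith("VERDICT: MALICIOUS"):
        return ScanResult(Verdict.MALICIOUS, "LLM flagged the code")
    if upper.startswith("VERDICT: BENIGN"):
        if hits:
            return ScanResult(
                Verdict.QUARANTINE,
                f"LLM passed the code but static indicators matched: {hits}",
            )
        return ScanResult(Verdict.CLEAN, "no findings")

    # Anything else is never treated as a pass.
    if looks_like_refusal(response):
        reason = f"LLM refused to analyze; static indicators: {hits}"
    else:
        reason = "LLM reply could not be parsed as a verdict"
    return ScanResult(Verdict.QUARANTINE, reason)
```

The design choice worth noting is that a refusal never maps to `CLEAN`: only an explicit benign verdict with no static hits passes, so text planted to provoke a refusal pushes the package into quarantine instead of past the scanner.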
DISCOVERED
2026-06-18
PUBLISHED
2026-06-18
AUTHOR
SocketSecurity