Needle 26M model brings fast tool use on-device
Needle is an ultra-compact 26M-parameter model distilled from Gemini for high-speed function calling on mobile and wearable devices. By replacing Feed-Forward Networks with Simple Attention Networks, it achieves high on-device throughput and outperforms larger models on specialized retrieval-and-assembly tasks.
Needle proves that massive "reasoning" models are overkill for tool orchestration, shifting the bottleneck from model size to architectural efficiency for on-device agents.
- Replaces FFNs with Simple Attention Networks, treating tool use as retrieval rather than memorization
- Delivers 6,000 tokens/sec prefill and 1,200 tokens/sec decode on standard consumer hardware
- Outperforms FunctionGemma-270M and Qwen-0.6B on function-calling accuracy at a fraction of the size
- Integrates with the Cactus engine to enable sub-150ms agentic latency for privacy-first, on-device apps
- Open-sourced under MIT license with full weights and a local fine-tuning playground available on GitHub
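The announcement does not detail how a Simple Attention Network is constructed, only that it stands in for the position-wise FFN. As a minimal sketch under that assumption, the block below (all names hypothetical: `needle_block`, `attention`) stacks a second attention sublayer where the FFN would normally sit, keeping the usual residual connections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def needle_block(x, attn_w, san_w):
    # Hypothetical block layout: self-attention sublayer, then a
    # "Simple Attention Network" sublayer in place of the usual
    # position-wise FFN. Both sublayers use residual connections.
    x = x + attention(x, *attn_w)
    x = x + attention(x, *san_w)  # SAN instead of FFN
    return x

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))            # 4 tokens, width 8
attn_w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
san_w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
out = needle_block(x, attn_w, san_w)
print(out.shape)  # (4, 8)
```

Dropping the FFN's wide hidden projection is plausibly where most of the parameter savings come from, since FFNs typically hold the majority of a transformer's weights.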
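The "tool use as retrieval rather than memorization" framing can be illustrated with a toy tool selector: rather than generating a call from memorized patterns, the system scores a query against tool descriptions and retrieves the best match. Everything here is a hypothetical sketch (the `TOOLS` registry, `retrieve_tool`, and the bag-of-words embedding are illustrative stand-ins, not Needle's actual mechanism):

```python
import numpy as np

# Hypothetical tool registry; a real system would embed tool schemas
# with learned representations rather than bag-of-words counts.
TOOLS = {
    "set_alarm": "set an alarm or timer at a given time",
    "send_message": "send a text message to a contact",
    "get_weather": "look up the weather forecast for a city",
}

def embed(text, vocab):
    # Toy bag-of-words embedding, L2-normalized for cosine scoring.
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve_tool(query):
    # Retrieve the tool whose description best matches the query.
    vocab = {w: i for i, w in enumerate(
        sorted({w for d in TOOLS.values() for w in d.split()}))}
    qv = embed(query, vocab)
    scores = {name: float(qv @ embed(desc, vocab))
              for name, desc in TOOLS.items()}
    return max(scores, key=scores.get)

print(retrieve_tool("what's the weather forecast in Lagos"))  # get_weather
```

Retrieval of this kind scales with the tool registry rather than with model capacity, which is consistent with the claim that a 26M model can match much larger ones on this task.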
DISCOVERED 2026-05-12
PUBLISHED 2026-05-12
AUTHOR HenryNdubuaku