OPEN_SOURCE
REDDIT // MODEL RELEASE
MII-LLM releases Zagreus and Nesso small models
MII-LLM’s report details how it trained a family of 0.4B bilingual LLMs from scratch, each pairing English with Italian, Spanish, French, or Portuguese. The release includes four base Zagreus checkpoints, three Nesso post-trained variants, and a fully open training recipe built around edge deployment.
// ANALYSIS
This is a strong example of small-model engineering done seriously: the value is not just the weights, but the full reproducible pipeline from tokenization to Slurm orchestration to post-training. In the sub-1B regime, disciplined data and training choices matter more than architectural novelty.
- Dense 0.4B is the sensible call here; MoE complexity is hard to justify when stability and hardware utilization are the bottlenecks.
- The bilingual English + target-language setup is a practical way to cover European languages without pretending a tiny model can be universal.
- Nesso-agentic is likely the most useful checkpoint in the set because structured output and function calling are where small models can still feel “product-ready.”
- The benchmark story is encouraging, but the real ceiling remains visible: arithmetic, factual recall, and repetition are still weak points.
- The open variant is the most interesting piece for the ecosystem, because reproducible small-model training is still rare and highly transferable.
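To illustrate the kind of structured-output workload an agentic checkpoint like Nesso-agentic targets, here is a minimal sketch of the consumer side: parsing and validating a JSON tool call out of raw model text. The tool name, schema, and output format are hypothetical, not MII-LLM's actual interface; the point is that small models often emit stray tokens around the payload, so robust extraction matters.

```python
import json

# Hypothetical tool registry a small agentic model would be prompted with.
TOOLS = {
    "get_weather": {
        "description": "Return the weather for a city",
        "parameters": {"city": "string"},
    }
}

def parse_tool_call(model_output: str) -> dict:
    """Extract and validate a JSON tool call from raw model text.

    Small models often wrap the JSON payload in extra tokens, so we
    locate the outermost braces before parsing rather than assuming
    the whole output is valid JSON.
    """
    start = model_output.find("{")
    end = model_output.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    call = json.loads(model_output[start : end + 1])
    if call.get("name") not in TOOLS:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    return call

# Example of the kind of output a post-trained checkpoint might produce.
raw = 'Sure. {"name": "get_weather", "arguments": {"city": "Rome"}}'
call = parse_tool_call(raw)
print(call["name"], call["arguments"]["city"])  # get_weather Rome
```

A validation layer like this is what makes sub-1B function calling usable in practice: the model only has to hit a narrow JSON target, and everything else is deterministic code.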
// TAGS
zagreus · nesso · mii-llm · llm · edge-ai · open-source · benchmark · mlops
DISCOVERED
2026-04-17
PUBLISHED
2026-04-17
RELEVANCE
9/10
AUTHOR
kazzus78