SmolLM2 hits 7 tok/s on Roblox Native
Developer u/antwon_dev has implemented Hugging Face's SmolLM2-135M (Q8-quantized) running natively inside Roblox's Luau engine. The implementation reaches 7 tokens per second, giving game developers on-platform AI inference with no network round-trips and none of the cost or privacy concerns of external API calls.
Running a transformer natively in Luau is a significant technical milestone that unlocks persistent, zero-API-cost AI for the platform's creator ecosystem. By removing external API calls entirely, the implementation shows that small, high-quality models like SmolLM2 can drive real-time NPC interactions. Further optimizations such as better parallelization could push performance past the current 7 tok/s, especially if the weights are serialized directly into the game to remove the GitHub dependency. This paves the way for autonomous agents and richer procedural generation inside the Roblox environment.
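The "Q8" in SmolLM2-135M-Q8 refers to 8-bit weight quantization, which is what makes a 135M-parameter model small enough to ship and run inside a game engine. A minimal sketch of symmetric Q8 quantization in Python, illustrating the general idea only; the function names and per-tensor scaling scheme here are illustrative assumptions, not the actual Luau port's format:

```python
def quantize_q8(weights):
    # Symmetric 8-bit quantization: store one float scale per tensor
    # plus the weights as signed bytes in [-127, 127].
    # The `or 1.0` guards against a zero scale for an all-zero tensor.
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    # Recover approximate float weights at inference time.
    return [v * scale for v in q]

q, s = quantize_q8([0.5, -1.27, 0.0, 1.27])
approx = dequantize_q8(q, s)
```

Quantizing to 8 bits cuts weight storage roughly 4x versus float32, at the cost of a small rounding error per weight, which is the usual trade-off that makes on-device inference of small models practical.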
DISCOVERED: 2026-04-18 (4h ago)
PUBLISHED: 2026-04-17 (7h ago)
AUTHOR: antwon_dev