Gemma 4 fine-tuning hits multimodal roadblocks
Google's Gemma 4 introduces architectural shifts that break standard fine-tuning tools like PEFT and DeepSpeed. Oxen.ai's detailed post-mortem reveals the manual workarounds needed for LoRA adaptation and deployment in the current ecosystem.
Gemma 4's custom linear layers and shared KV-cache architecture show how standard LLM tooling is struggling to keep pace with multimodal innovations. The new ClippableLinear modules must be manually unwrapped before PEFT can attach LoRA adapters, while silent training failures in SFTTrainer and adapter-saving bugs under DeepSpeed ZeRO-3 force either pinned library versions or alternative distribution strategies. And because major inference engines still lack runtime LoRA support, deployment requires a merge-then-remap pipeline, sketched below.
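The summary doesn't show the custom layers' interface, but the unwrapping workaround amounts to swapping each ClippableLinear for a plain `nn.Linear` that shares its parameters, so PEFT can inject LoRA adapters. A minimal sketch, assuming `ClippableLinear` mirrors `nn.Linear`'s `in_features`/`out_features`/`weight`/`bias` attributes; the model ID and `target_modules` names are placeholders, not confirmed by the post:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def unwrap_clippable_linears(model: nn.Module) -> nn.Module:
    """Swap each ClippableLinear for a plain nn.Linear that shares its
    parameters, so PEFT's LoRA injection recognizes it as a target."""
    for _, parent in model.named_modules():
        for child_name, child in parent.named_children():
            if type(child).__name__ == "ClippableLinear":
                plain = nn.Linear(child.in_features, child.out_features,
                                  bias=child.bias is not None)
                plain.weight = child.weight            # reuse, don't copy
                if child.bias is not None:
                    plain.bias = child.bias
                setattr(parent, child_name, plain)     # replace in place
    return model

model = AutoModelForCausalLM.from_pretrained("google/gemma-4")  # placeholder ID
model = unwrap_clippable_linears(model)
peft_model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32,
               target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]),
)
```

Any clipping state the custom layer carries would be dropped by this swap, which is one reason deployment needs a remap step afterward.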
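On the deployment side, the merge half of the pipeline is standard PEFT usage (`merge_and_unload`); the remap half depends on Gemma 4's serving format, which the summary doesn't reproduce, so the key rename below is a hypothetical stand-in:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Fold the LoRA deltas into the base weights so inference engines without
# runtime adapter support can load a single standalone checkpoint.
base = AutoModelForCausalLM.from_pretrained("google/gemma-4",   # placeholder ID
                                            torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "out/lora-adapter").merge_and_unload()

# Remap state-dict keys to the names the serving stack expects for its
# custom layers; this rename rule is illustrative only.
state = {k.replace("model.layers.", "transformer.blocks."): v
         for k, v in merged.state_dict().items()}
torch.save(state, "gemma-4-merged.pt")
```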
PUBLISHED
2026-04-18
AUTHOR
FallMindless3563