Qwen3.6 heretic v2 keeps MTP, slashes refusals
LLMFan46 released an uncensored fork of Qwen3.6-27B that preserves all 15 native MTP heads while shipping safetensors, GGUF, and NVFP4 builds. The repo claims a 0.0021 KL divergence from the base model and a 6/100 refusal rate, positioning the fork as a local-serving variant aimed at keeping Qwen3.6’s speed and flexibility intact.
The interesting part here is not the “uncensored” label; it’s that the fork tries to keep speculative decoding and quantization-friendly deployment at the same time. That is the difference between a novelty model and something people will actually run.
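Why retained MTP heads matter: they let the model draft several tokens ahead, which the main forward pass then verifies, so most steps emit multiple tokens for one target-model pass. A minimal sketch of the standard greedy verification step (illustrative toy, not the repo’s code; real stacks verify against sampled distributions, not exact matches):

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy speculative-decoding verification (toy version).

    Accept the longest prefix of the draft that matches what the target
    model would emit. At the first mismatch, the target's own token
    replaces the bad draft token; if every draft token matches, the
    target contributes one bonus token for free.
    """
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)          # draft token confirmed
        else:
            accepted.append(t)          # target's correction, stop here
            break
    else:
        # all draft tokens matched; append the target's next token
        if len(target_tokens) > len(draft_tokens):
            accepted.append(target_tokens[len(draft_tokens)])
    return accepted


# A good draft (e.g. from well-trained MTP heads) yields several tokens
# per verification pass; a bad one degrades to one token per pass.
print(verify_draft([5, 7, 9], [5, 7, 2]))   # mismatch at position 2
print(verify_draft([5, 7], [5, 7, 4]))      # full acceptance + bonus token
```

The acceptance rate of the draft heads is exactly what post-processing can silently damage, which is why "all 15 MTP heads retained" is a throughput claim, not just a checkbox.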
- The release says all 15 MTP heads are retained, which matters for throughput on local inference stacks that can exploit speculative decoding
- Multiple packaging formats widen the audience: safetensors for standard serving, GGUF for llama.cpp-style local use, and NVFP4 for smaller-footprint GPU deployment
- The reported 0.0021 KL divergence and 6/100 refusal rate are attractive, but they are self-reported model-card metrics, not independent evals
- This is a derivative of Qwen3.6-27B, so the underlying appeal is still Qwen’s strong multimodal and agentic-coding base rather than a brand-new model family
- For AI devs, the practical question is whether preserved MTP outweighs the usual quality hit from aggressive post-processing; that is the real tradeoff here
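On the self-reported 0.0021 figure: a KL divergence that low says the fork’s token distributions barely moved from the base model’s. The metric itself is easy to reproduce locally if you want to verify the claim (minimal sketch; assumes per-token KL in nats averaged over a prompt set, which the model card does not specify):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    p: next-token distribution from the base model
    q: next-token distribution from the modified fork
    Terms with p_i == 0 contribute nothing by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Identical distributions diverge by zero; a fully shifted one-hot
# distribution against uniform over 2 tokens diverges by log(2).
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
```

Averaging this per-token quantity over held-out prompts for both models is a cheap independent check before trusting the model card.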
DISCOVERED: 2026-05-07
PUBLISHED: 2026-05-07
AUTHOR: LLMFan46