GPT-OSS-20B beats Qwen3.6 in coding

// 90d agoBENCHMARK RESULT

GPT-OSS-20B beats Qwen3.6 in coding

A Reddit user compares GPT-OSS-20B and Qwen3.6-35B-A3B on TypeScript and Rust prompts and says Claude Sonnet 4.6 rated the OpenAI model higher. The thread asks whether that reflects real quality, prompt sensitivity, or judge bias.

// ANALYSIS

This reads less like a definitive model ranking and more like a noisy local eval where output style, sampling, and the judge model all matter. GPT-OSS-20B is not ancient either: OpenAI introduced it in August 2025 and says it was trained with a coding- and STEM-heavy focus.

–OpenAI positions gpt-oss-20b as a local-friendly open-weight model optimized for reasoning, tool use, and coding, with 3.6B active parameters and 16 GB memory targets.
–Qwen3.6-35B-A3B is also a sparse MoE model aimed at agentic coding, so the gap is more about tuning, prompting, and inference settings than raw parameter count.
–LLM judges tend to reward clean structure, obvious type safety, and compile-looking code; that can favor one model’s writing style over another’s true correctness.
–Repeated trials plus “pick the best score” selection makes the comparison shakier, because it amplifies variance instead of measuring central tendency.
–The useful lesson is that code evals should include compilation, runtime tests, and many runs; single-judge subjective ratings are a weak proxy for actual coding ability.

// TAGS

gpt-oss-20bqwenllmai-codingreasoningbenchmarkopen-source

DISCOVERED

90d ago

2026-04-19

PUBLISHED

90d ago

2026-04-19

RELEVANCE

9/ 10

AUTHOR

kaisellgren

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

MODEL17m ago

NVIDIA open-sources Nemotron 3 Embed

NVIDIA has released Nemotron 3 Embed, a family of high-performance text embedding models designed to power semantic search and Retrieval-Augmented Generation (RAG) workflows. The release features a flagship 8B parameter model ranking highly on embedding benchmarks alongside an efficient 1B variant optimized for resource-constrained environments.

MODEL17m ago

NVIDIA launches Ardy real-time motion model

NVIDIA's Spatial Intelligence Lab has developed Ardy, an autoregressive diffusion model for real-time, interactive 3D human motion generation. The model supports online text prompting and flexible kinematic constraints at inference time without requiring retraining, making it suitable for animation, gaming, and robotics.

POLICY2h ago

US weighs FINRA-style AI safety regulator

The Trump administration is evaluating a proposal to create an independent, industry-funded AI safety regulator modeled after the Financial Industry Regulatory Authority (FINRA). The self-regulatory watchdog would establish standardized deployment guidelines for frontier models under SEC oversight to replace ad-hoc federal interventions.