OPEN_SOURCE
REDDIT // 6h ago // INFRASTRUCTURE
Step 3.5 Flash Undercuts Qwen Serve Costs
This Reddit thread examines why StepFun's Step 3.5 Flash is cheaper to serve on DeepInfra than Qwen3.6-35B-A3B. The key factors are active compute per token, memory pressure, context length, quantization, throughput, and provider economics rather than total parameter count alone.
// ANALYSIS
Hot take: total parameters are a bad proxy for API price; the bill is mostly about what has to stay hot per token and how efficiently the provider can run it.
- Step 3.5 Flash is a sparse MoE model with 196B total parameters but only 11B active per token, and StepFun describes it as designed around inference cost and speed.
- Qwen3.6-35B-A3B has 35B total parameters and 3B activated per token, but its serving stack includes a 256-expert MoE design, a vision encoder, and a long 256K native context, all of which affect deployment economics.
- DeepInfra’s listed pricing is $0.10 input / $0.30 output per 1M tokens for Step 3.5 Flash versus $0.19 input / $1.00 output for Qwen3.6-35B-A3B, so the provider is clearly pricing for more than parameter count alone.
- The output-token gap is the bigger tell: vendors usually charge more where decode-time throughput, long-context KV cache pressure, and demand are harsher.
- In plain English, “4x bigger” by headline size does not mean 4x more expensive to serve; sparse activation can flip that intuition.
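The pricing gap compounds on real workloads. A minimal sketch of blended per-request cost using the DeepInfra rates quoted above; the 2M-input / 1M-output workload is an assumed example, not from the thread:

```python
# Listed DeepInfra rates from the thread, in USD per 1M tokens.
PRICES = {
    "step-3.5-flash": {"input": 0.10, "output": 0.30},
    "qwen3.6-35b-a3b": {"input": 0.19, "output": 1.00},
}

def blended_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a workload of input/output tokens at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Assumed example workload: 2M input tokens, 1M output tokens.
step = blended_cost("step-3.5-flash", 2_000_000, 1_000_000)   # $0.50
qwen = blended_cost("qwen3.6-35b-a3b", 2_000_000, 1_000_000)  # $1.38
print(f"Step 3.5 Flash: ${step:.2f}, Qwen3.6-35B-A3B: ${qwen:.2f}")
```

On this mix the nominally "4x bigger" model comes out roughly 2.8x cheaper, and the more output-heavy the workload, the wider the gap, since the output rates differ by over 3x.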
// TAGS
mixture-of-experts · inference · pricing · api · stepfun · qwen · llm-economics · deepinfra
DISCOVERED
6h ago
2026-05-01
PUBLISHED
8h ago
2026-04-30
RELEVANCE
7/10
AUTHOR
urarthur