Step 3.5 Flash Undercuts Qwen Serve Costs

// 47d ago · INFRASTRUCTURE

This Reddit thread asks why StepFun's Step 3.5 Flash is priced lower on DeepInfra than Qwen3.6-35B-A3B. The explanation lies in active compute, memory pressure, context length, quantization, throughput, and provider economics rather than in total parameter count alone.

// ANALYSIS

Hot take: total parameters are a bad proxy for API price; the bill is mostly about what has to stay hot per token and how efficiently the provider can run it.

  • Step 3.5 Flash is a sparse MoE model with 196B total parameters but only 11B active per token, and StepFun describes it as designed around inference cost and speed.
  • Qwen3.6-35B-A3B has 35B total parameters and 3B activated per token, but its serving stack includes a 256-expert MoE design, a vision encoder, and a long 256K native context, which affects deployment economics.
  • DeepInfra’s listed pricing is $0.10 input / $0.30 output per 1M tokens for Step 3.5 Flash versus $0.19 input / $1.00 output for Qwen3.6-35B-A3B. The model with fewer total parameters costs about 1.9x more on input and about 3.3x more on output, so the provider is clearly pricing for more than parameter count alone.
  • The output-token gap is the bigger tell: vendors usually charge more where decode-time throughput, long-context KV cache pressure, and demand are harsher.
  • In plain English, “4x bigger” by headline size does not mean 4x more expensive to serve; sparse activation can flip that intuition.
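
To make the arithmetic behind these bullets concrete, here is a minimal Python sketch that uses only the prices and parameter counts quoted above. The helper name `request_cost` and the 8,000-input / 1,000-output workload are hypothetical and chosen for illustration; they do not come from the thread or from DeepInfra.

```python
# Prices in USD per 1M tokens, as quoted in the summary above.
PRICES = {
    "Step 3.5 Flash": {"input": 0.10, "output": 0.30},
    "Qwen3.6-35B-A3B": {"input": 0.19, "output": 1.00},
}

# Parameter counts in billions, as quoted in the summary above.
PARAMS = {
    "Step 3.5 Flash": {"total": 196, "active": 11},
    "Qwen3.6-35B-A3B": {"total": 35, "active": 3},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices.

    Hypothetical helper for illustration only.
    """
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption, not from the source):
# 8,000 input tokens and 1,000 output tokens per request.
step_cost = request_cost("Step 3.5 Flash", 8_000, 1_000)
qwen_cost = request_cost("Qwen3.6-35B-A3B", 8_000, 1_000)

step, qwen = PARAMS["Step 3.5 Flash"], PARAMS["Qwen3.6-35B-A3B"]
total_ratio = step["total"] / qwen["total"]    # Step vs Qwen, total parameters
active_ratio = step["active"] / qwen["active"]  # Step vs Qwen, active parameters
price_ratio = qwen_cost / step_cost             # Qwen vs Step, cost per request

print(f"Step 3.5 Flash cost per request:  ${step_cost:.6f}")
print(f"Qwen3.6-35B-A3B cost per request: ${qwen_cost:.6f}")
print(f"Step has {total_ratio:.1f}x the total and {active_ratio:.1f}x the active parameters")
print(f"Qwen costs {price_ratio:.2f}x as much for this workload")
```

On these figures Step 3.5 Flash is larger on both total and active parameters yet cheaper per request, which is the gap the thread is asking about. The sketch only reproduces list prices; it says nothing about the provider's underlying serving costs.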
// TAGS
mixture-of-experts, inference, pricing, api, stepfun, qwen, llm-economics, deepinfra

DISCOVERED

47d ago

2026-05-01

PUBLISHED

47d ago

2026-04-30

RELEVANCE

7/10

AUTHOR

urarthur