OPEN_SOURCE
REDDIT // 22d ago · INFRASTRUCTURE
llama.cpp DeepSeek DSA branch seeks 768 GB of VRAM
The author is looking for access to a very large multi-GPU machine to benchmark a proof-of-concept DeepSeek Sparse Attention (DSA) branch of llama.cpp. The goal is to verify dense-vs-sparse behavior on DeepSeek V3.2 Speciale using lineage-bench, since the differences only show up on harder reasoning tasks.
// ANALYSIS
This is less a launch than a correctness hunt for an inference kernel that only really proves itself under brutal benchmark conditions.
- The 768 GB VRAM ask puts this well past normal workstation territory and into shared-cluster or proxy-runner territory.
- lineage-bench is a smart choice here because it stresses reasoning behavior, not just token throughput.
- The failed 8x RTX PRO 6000 run suggests the bottleneck is memory layout for indexer tensors, not just raw compute.
- Comparing against prior sglang fp8 runs gives a useful cross-framework sanity check of whether the sparse-attention patch is doing the right thing.
- If the sparse branch matches the expected quality deltas, that would be a strong signal that llama.cpp can support DeepSeek V3.2 Speciale without silently flattening its behavior.
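The verification logic described above can be sketched as a small comparison script. Everything here is hypothetical: the task names, scores, and the `flag_regressions` helper are illustrative stand-ins, not real lineage-bench output or part of the author's branch.

```python
# Hypothetical sketch: compare per-task scores from a dense baseline run
# against a sparse-attention run and flag any task where the sparse run
# trails by more than a tolerance. Scores and task names are made up.

DENSE = {"lineage-8": 0.96, "lineage-16": 0.91, "lineage-32": 0.78}
SPARSE = {"lineage-8": 0.95, "lineage-16": 0.90, "lineage-32": 0.62}

def flag_regressions(dense, sparse, tol=0.05):
    """Return tasks where the sparse run trails the dense run by more than tol."""
    return {
        task: round(dense[task] - sparse[task], 4)
        for task in dense
        if dense[task] - sparse[task] > tol
    }

if __name__ == "__main__":
    # Only the hardest task exceeds the tolerance in this toy data.
    print(flag_regressions(DENSE, SPARSE))
```

The point of such a check is exactly the one the post makes: small deltas on easy tasks are expected noise, while a large drop on the hardest tier would indicate the sparse kernel is flattening the model's reasoning behavior.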
// TAGS
llm · benchmark · testing · gpu · inference · deepseek · llama-cpp
DISCOVERED
2026-03-20
PUBLISHED
2026-03-20
RELEVANCE
8/10
AUTHOR
fairydreaming