REDDIT · 27d ago · INFRASTRUCTURE

Old Titan X Pascal hits 25 tok/s

A developer dusted off an NVIDIA Titan X Pascal GPU and added it to a local server for running LLMs, achieving roughly 500 tokens/sec prompt processing and 25 tokens/sec generation with llama.cpp and OpenCode. Prompt processing roughly matches a modern AMD 9070 XT; generation runs at about half that card's speed.
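To put those measured rates in perspective, here is a back-of-the-envelope sketch of what they mean for a single task. The prompt and output token counts below are illustrative assumptions, not figures from the post:

```python
# Rough throughput arithmetic using the rates reported in the post.
# The task sizes below are illustrative assumptions.
PROMPT_TOK_PER_S = 500   # prompt processing (prefill)
GEN_TOK_PER_S = 25       # token generation (decode)

def task_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated wall-clock time: prefill time plus decode time."""
    return prompt_tokens / PROMPT_TOK_PER_S + output_tokens / GEN_TOK_PER_S

# e.g. a hypothetical 2,000-token prompt with a 500-token answer:
secs = task_seconds(2000, 500)
print(f"{secs:.0f} s")  # 2000/500 + 500/25 = 4 + 20 = 24 s
```

At these rates the prefill is cheap; decode dominates, which is why generation speed is the number that matters for interactive use.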

// ANALYSIS

Old consumer-grade Pascal hardware still punches above its weight for local inference, which matters as more developers look to repurpose aging GPUs rather than buy new.

  • 25 tok/s generation on decade-old hardware is genuinely usable for background coding agents or overnight batch tasks
  • llama.cpp's CPU+GPU offloading means even cards with limited VRAM can contribute meaningfully to inference throughput
  • The ~4x speedup over CPU-only (6 tok/s → 25 tok/s) shows the GPU floor is low but real
  • This is a practical benchmark for the "basement server" crowd increasingly running local AI pipelines
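The offloading point above comes down to one knob: llama.cpp's `-ngl` / `--n-gpu-layers` flag, which controls how many transformer layers live in VRAM while the rest run on the CPU. A minimal sketch of picking that value from a VRAM budget; the per-layer size and overhead figures are illustrative assumptions, not measurements for any specific model:

```python
# Sketch: choose an -ngl (n-gpu-layers) value for llama.cpp by fitting
# layers into a VRAM budget. All sizes here are illustrative
# assumptions, not measured values for a specific GGUF model.

def layers_that_fit(vram_gb: float, n_layers: int,
                    gb_per_layer: float, overhead_gb: float) -> int:
    """How many transformer layers fit after reserving overhead
    (KV cache, compute buffers)."""
    usable = max(0.0, vram_gb - overhead_gb)
    return min(n_layers, int(usable // gb_per_layer))

# Titan X Pascal has 12 GB; assume a hypothetical 32-layer 7B-class
# quantized model at ~0.25 GB/layer with ~2 GB reserved:
ngl = layers_that_fit(12.0, 32, 0.25, 2.0)
print(ngl)  # 32 — every layer fits in this hypothetical case
```

The result would be passed as `-ngl 32` on the llama.cpp command line; on a smaller card the same arithmetic yields a partial offload, and the remaining layers run on the CPU.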
// TAGS
llm · inference · gpu · open-source · self-hosted

DISCOVERED

2026-03-16 (27d ago)

PUBLISHED

2026-03-16 (27d ago)

RELEVANCE

5/10

AUTHOR

Lazy-Routine-Handler