OPEN_SOURCE
REDDIT · 27d ago · INFRASTRUCTURE
Old Titan X Pascal hits 25 tok/s
A developer dusted off an NVIDIA Titan X Pascal GPU and added it to a local server to run LLMs, achieving ~500 tokens/sec prompt processing and 25 tokens/sec generation with llama.cpp and OpenCode, roughly matching a modern AMD RX 9070 XT on prompt processing while generating at about half its speed.
// ANALYSIS
Decade-old Pascal hardware still punches above its weight for local inference, which matters as more developers repurpose aging GPUs rather than buy new.
- 25 tok/s generation on decade-old hardware is genuinely usable for background coding agents or overnight batch tasks
- llama.cpp's CPU+GPU offloading means even cards with limited VRAM can contribute meaningfully to inference throughput
- The ~4x speedup over CPU-only (6 tok/s → 25 tok/s) shows the GPU floor is low but real
- This is a practical benchmark for the "basement server" crowd increasingly running local AI pipelines
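The partial-offload setup described above can be sketched with llama.cpp's CLI. This is a minimal sketch, not the poster's exact configuration: the model path is a placeholder and the layer count is an assumption to be tuned against the Titan X Pascal's 12 GB of VRAM.

```shell
# Minimal llama.cpp invocation with partial GPU offload.
# -ngl: number of transformer layers placed on the GPU
#       (raise until the card's 12 GB of VRAM is full)
# -t:   CPU threads for the layers left on the host
# -c:   context length in tokens
# The model path is hypothetical; any GGUF quantization that fits works.
llama-cli -m ./models/model.gguf -ngl 20 -t 8 -c 4096 \
  -p "Write a unit test for the parser."
```

Even when only some layers fit on the GPU, llama.cpp splits the work so the card accelerates its share while the CPU handles the rest, which is how a limited-VRAM card still lifts overall throughput.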
// TAGS
llm · inference · gpu · open-source · self-hosted
DISCOVERED
2026-03-16
PUBLISHED
2026-03-16
RELEVANCE
5/10
AUTHOR
Lazy-Routine-Handler