OPEN_SOURCE
REDDIT · 12d ago · NEWS
Ollama copies GGUF files, lacks in-place mode
A Reddit user wants to benchmark Ollama's tokens-per-second and time-to-first-token (TTFT) against a llama.cpp server without keeping a second copy of the GGUF on disk. Ollama's docs show GGUF import via a Modelfile and `ollama create`, and an open GitHub issue confirms that path currently makes a regular copy into Ollama's storage.
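The documented import path can be sketched as follows; the model name and GGUF path here are placeholders, not values from the post:

```shell
# Write a minimal Modelfile pointing at an existing GGUF on disk
# (path and model name are illustrative placeholders).
cat > Modelfile <<'EOF'
FROM /models/llama-3.1-8b-q4.gguf
EOF

# Import the model. Per the GitHub issue, this step currently makes
# a full regular copy of the GGUF into Ollama's blob store.
if command -v ollama >/dev/null 2>&1; then
  ollama create bench-model -f Modelfile
else
  echo "ollama not installed; skipping import"
fi
```

After import, the original GGUF and Ollama's internal copy both occupy disk space, which is exactly the duplication the post is trying to avoid.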
// ANALYSIS
Ollama's simplicity is a feature until you care about storage semantics. For local benchmarking, the missing zero-copy path is a real papercut because it adds disk overhead without changing the model itself.
- The official import docs support `FROM /path/to/file.gguf` plus `ollama create`, but they do not describe a first-class in-place serving mode.
- An open GitHub issue says `ollama create` currently does a regular copy of the `.gguf` file and asks for copy-on-write or reflink behavior instead.
- For teams comparing Ollama with llama.cpp, that means duplicate model storage becomes part of the workflow even if runtime throughput is the only metric under test.
- The issue points to APFS, Btrfs, and ZFS as the kinds of filesystems where reflinks could help, which shows the current copy behavior is an implementation choice, not a GGUF limitation.
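What the issue asks for already exists at the filesystem level: on reflink-capable filesystems, `cp --reflink` (GNU coreutils) creates a copy-on-write clone that shares data blocks with the original, and the APFS analogue is `cp -c`. A minimal sketch, using a dummy file in place of a real GGUF; `--reflink=auto` falls back to a plain copy where the filesystem lacks reflink support:

```shell
# Create a dummy "model" file standing in for a GGUF (8 MiB of zeros).
dd if=/dev/zero of=model.gguf bs=1M count=8 status=none

# Reflink copy: on Btrfs, XFS, and recent ZFS this shares extents with
# the original instead of duplicating data; elsewhere it degrades to a
# normal copy. This is the copy-on-write behavior the issue requests.
cp --reflink=auto model.gguf model-copy.gguf

# Either way, the two files are byte-identical.
cmp model.gguf model-copy.gguf && echo "identical"
```

The point for Ollama is that adopting this call in `ollama create` would keep the current storage layout intact while making the "copy" nearly free on supported filesystems.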
// TAGS
ollama · llm · inference · benchmark · self-hosted · cli
DISCOVERED
2026-03-30 (12d ago)
PUBLISHED
2026-03-30 (12d ago)
RELEVANCE
7/10
AUTHOR
Adorable_Weakness_39