Ollama copies GGUF files, lacks in-place mode
OPEN_SOURCE
REDDIT // 12d ago // NEWS


A Reddit user wants to benchmark Ollama's tokens-per-second (tok/s) and time-to-first-token (TTFT) against a llama.cpp server without keeping a second copy of the GGUF file on disk. Ollama's docs cover GGUF import via a Modelfile and `ollama create`, and an open GitHub issue confirms that this path currently makes a regular copy of the file into Ollama's storage.
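The documented import path looks like the following minimal sketch; the GGUF path and model name are placeholders, and the `ollama create` step (which performs the copy in question) is left commented out:

```shell
# Minimal Modelfile-based GGUF import, following the Ollama docs.
# /path/to/file.gguf and "my-model" are placeholders.
mkdir -p /tmp/ollama-import
cat > /tmp/ollama-import/Modelfile <<'EOF'
FROM /path/to/file.gguf
EOF
# The create step is what currently makes a full byte copy of the GGUF
# into Ollama's own storage (requires a local ollama install):
# ollama create my-model -f /tmp/ollama-import/Modelfile
cat /tmp/ollama-import/Modelfile
```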

// ANALYSIS

Ollama's simplicity is a feature until you care about storage semantics. For local benchmarking, the missing zero-copy path is a real papercut because it adds disk overhead without changing the model itself.
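The two metrics under test are straightforward to derive from timing counters. A sketch, assuming the nanosecond-granularity fields Ollama's generate API reports (`eval_count`, `eval_duration`, `load_duration`, `prompt_eval_duration`); the numbers in the example are made up, not measurements:

```python
def tok_per_s(eval_count: int, eval_duration_ns: int) -> float:
    """Decode throughput: generated tokens divided by decode wall time."""
    return eval_count / (eval_duration_ns / 1e9)

def ttft_s(load_duration_ns: int, prompt_eval_duration_ns: int) -> float:
    """Approximate time-to-first-token: model load plus prompt processing.
    A streaming client can instead measure TTFT directly on the wire."""
    return (load_duration_ns + prompt_eval_duration_ns) / 1e9

# Illustrative numbers only:
print(tok_per_s(256, 4_000_000_000))    # 256 tokens over 4 s -> 64.0
print(ttft_s(900_000_000, 350_000_000)) # 0.9 s load + 0.35 s prompt -> 1.25
```

Because both numbers come from the runtime, not the stored weights, they should be unaffected by whether the GGUF on disk is a real copy or a reflink.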

  • The official import docs support `FROM /path/to/file.gguf` plus `ollama create`, but they do not describe a first-class in-place serving mode.
  • An open GitHub issue says `ollama create` currently does a regular copy of the `.gguf` file and asks for copy-on-write or reflink behavior instead.
  • For teams comparing Ollama with llama.cpp, that means duplicate model storage becomes part of the workflow even if runtime throughput is the only metric under test.
  • The issue points to APFS, Btrfs, and ZFS as the kinds of filesystems where reflinks could help, which shows the current copy behavior is an implementation choice, not a GGUF limitation.
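The behavior the issue asks for can be sketched with GNU `cp`, which already exposes reflink semantics on filesystems that support them (Btrfs, XFS; on macOS/APFS the analogue is `cp -c`). A throwaway-file demonstration:

```shell
# Create a dummy "model" file, then clone it.
# --reflink=auto shares extents (copy-on-write) where the filesystem
# supports it and silently falls back to a regular copy elsewhere --
# essentially the fallback behavior the issue proposes for `ollama create`.
dd if=/dev/zero of=/tmp/model.gguf bs=1M count=4 status=none
cp --reflink=auto /tmp/model.gguf /tmp/model-clone.gguf
cmp -s /tmp/model.gguf /tmp/model-clone.gguf && echo "byte-identical clone"
```

On a reflink-capable filesystem the clone consumes no additional data blocks until one of the files is modified.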
// TAGS
ollama · llm · inference · benchmark · self-hosted · cli

DISCOVERED

2026-03-30 (12d ago)

PUBLISHED

2026-03-30 (12d ago)

RELEVANCE

7 / 10

AUTHOR

Adorable_Weakness_39