YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Ollama copies GGUF files, lacks in-place mode

// 58d ago | NEWS

A Reddit user wants to benchmark Ollama's tokens per second (tok/s) and time to first token (TTFT) against a llama.cpp server without storing a second copy of the GGUF file. Ollama's docs show GGUF import via a Modelfile and `ollama create`, and an open GitHub issue confirms that this path currently makes a regular copy of the file into Ollama's storage.
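For readers who want the two metrics pinned down, here is a minimal Python sketch of how TTFT and decode tok/s could be computed from timestamps collected while streaming a response. The function name `stream_metrics` and the choice to exclude the first token from throughput are assumptions for illustration, not something the Reddit post or either project prescribes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # decode throughput measured after the first token


def stream_metrics(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute TTFT and decode tok/s from monotonic timestamps (seconds).

    request_start: time the request was sent.
    token_times:   arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - request_start
    decode_window = token_times[-1] - token_times[0]
    # Count only tokens after the first, so prompt-processing time
    # does not dilute the decode rate.
    decoded = len(token_times) - 1
    tps = decoded / decode_window if decode_window > 0 else 0.0
    return StreamMetrics(ttft_s=ttft, tokens_per_s=tps)
```

Applying the same function to timestamps from both servers keeps the comparison consistent regardless of how each server reports its own timings.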

// ANALYSIS

Ollama's simplicity is a feature until you care about storage semantics. For local benchmarking, the missing zero-copy path is a real papercut because it adds disk overhead without changing the model itself.

  • The official import docs support `FROM /path/to/file.gguf` plus `ollama create`, but they do not describe a first-class in-place serving mode.
  • An open GitHub issue says `ollama create` currently does a regular copy of the `.gguf` file and asks for copy-on-write or reflink behavior instead.
  • For teams comparing Ollama with llama.cpp, that means duplicate model storage becomes part of the workflow even if runtime throughput is the only metric under test.
  • The issue points to APFS, Btrfs, and ZFS as the kinds of filesystems where reflinks could help, which shows the current copy behavior is an implementation choice, not a GGUF limitation.
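To make the reflink idea concrete, here is a minimal Python sketch of "clone if the filesystem allows it, otherwise copy". It is not Ollama's code and does not describe how Ollama would implement the request; `clone_or_copy` is a hypothetical helper, and the sketch assumes Linux, where the `FICLONE` ioctl asks the filesystem to share data blocks between two files.

```python
import shutil
import sys

# Linux FICLONE ioctl request number from <linux/fs.h>.
# Assumption: this value (0x40049409) is the encoding used on x86-64 and arm64.
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> str:
    """Try a reflink clone of src to dst; fall back to a regular copy.

    Returns "reflink" or "copy" to report which path was taken.
    """
    if sys.platform.startswith("linux"):
        import fcntl  # Unix-only module, imported lazily

        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                # Ask the filesystem to share src's blocks with dst.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return "reflink"
            except OSError:
                # Filesystem does not support cloning (or files are on
                # different filesystems); fall through to a plain copy.
                pass
    shutil.copyfile(src, dst)
    return "copy"
```

On a filesystem with reflink support, the clone takes almost no additional space until one of the files is modified; elsewhere the helper behaves like the regular copy the issue describes. APFS exposes cloning through a different system call, which this Linux-only sketch does not cover.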
// TAGS
ollama, llm, inference, benchmark, self-hosted, cli

DISCOVERED

58d ago

2026-03-30

PUBLISHED

58d ago

2026-03-30

RELEVANCE

7/10

AUTHOR

Adorable_Weakness_39