OPEN_SOURCE
REDDIT · 12d ago · NEWS
Ollama copies GGUF files, lacks in-place mode
A Reddit user wants to benchmark Ollama's tokens-per-second and time-to-first-token (TTFT) against a llama.cpp server without keeping a second copy of the GGUF on disk. Ollama's docs show GGUF import via a Modelfile and `ollama create`, and an open GitHub issue confirms that path currently makes a regular copy into Ollama's storage.
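The documented import path can be sketched as follows; the model name and GGUF path here are placeholders, not values from the post:

```shell
# Write a minimal Modelfile pointing at an existing GGUF on disk
# (path and model name are illustrative placeholders).
cat > Modelfile <<'EOF'
FROM /models/llama-3.1-8b-q4.gguf
EOF

# Import the model. Per the GitHub issue, this step currently makes
# a full regular copy of the GGUF into Ollama's blob store.
if command -v ollama >/dev/null 2>&1; then
  ollama create bench-model -f Modelfile
else
  echo "ollama not installed; skipping import"
fi
```

After import, the original GGUF and Ollama's internal copy both occupy disk space, which is exactly the duplication the post is trying to avoid.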
// ANALYSIS
Ollama's simplicity is a feature until you care about storage semantics. For local benchmarking, the missing zero-copy path is a real papercut because it adds disk overhead without changing the model itself.
- The official import docs support `FROM /path/to/file.gguf` plus `ollama create`, but they do not describe a first-class in-place serving mode.
- An open GitHub issue says `ollama create` currently does a regular copy of the `.gguf` file and asks for copy-on-write or reflink behavior instead.
- For teams comparing Ollama with llama.cpp, that means duplicate model storage becomes part of the workflow even if runtime throughput is the only metric under test.
- The issue points to APFS, Btrfs, and ZFS as the kinds of filesystems where reflinks could help, which shows the current copy behavior is an implementation choice, not a GGUF limitation.
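What the issue asks for already exists at the filesystem level: on reflink-capable filesystems, `cp --reflink` (GNU coreutils) creates a copy-on-write clone that shares data blocks with the original, and the APFS analogue is `cp -c`. A minimal sketch, using a dummy file in place of a real GGUF; `--reflink=auto` falls back to a plain copy where the filesystem lacks reflink support:

```shell
# Create a dummy "model" file standing in for a GGUF (8 MiB of zeros).
dd if=/dev/zero of=model.gguf bs=1M count=8 status=none

# Reflink copy: on Btrfs, XFS, and recent ZFS this shares extents with
# the original instead of duplicating data; elsewhere it degrades to a
# normal copy. This is the copy-on-write behavior the issue requests.
cp --reflink=auto model.gguf model-copy.gguf

# Either way, the two files are byte-identical.
cmp model.gguf model-copy.gguf && echo "identical"
```

The point for Ollama is that adopting this call in `ollama create` would keep the current storage layout intact while making the "copy" nearly free on supported filesystems.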
// TAGS
ollama · llm · inference · benchmark · self-hosted · cli
DISCOVERED
2026-03-30 (12d ago)
PUBLISHED
2026-03-30 (12d ago)
RELEVANCE
7/10
AUTHOR
Adorable_Weakness_39