LocalLLaMA thread sketches dual-PC assistant stack
OPEN_SOURCE · REDDIT // 4h ago // INFRASTRUCTURE

A Reddit user asks how to split an Obsidian-backed life assistant across a Zephyrus G14 and an older GTX 1080 desktop. The thread’s first answer points to a pragmatic local setup: serve the main model on the laptop, push embeddings and memory work onto the desktop, and keep markdown files as the durable memory store.
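The "serve the main model on the laptop" half of that setup can be sketched with llama.cpp's bundled `llama-server`, which exposes an OpenAI-compatible API over the network. The model filename, hostname, and port below are placeholders, not details from the thread:

```shell
# On the laptop (5070 Ti): serve the main chat model to the LAN.
# Model path is a placeholder; pick whatever GGUF you actually run.
llama-server -m ./models/main-chat-model.gguf \
  --host 0.0.0.0 --port 8080 \
  -ngl 99 -c 8192   # offload all layers to the GPU, 8k context

# From the desktop (or any other client): the server speaks an
# OpenAI-compatible API, so standard tooling works against it.
curl http://laptop.local:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize my notes"}]}'
```

Binding to `0.0.0.0` is what makes the laptop model reachable from the desktop; on a trusted home LAN that is usually acceptable, but anything broader wants a reverse proxy or API key in front.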

// ANALYSIS

The interesting part is that the "memory model" idea matters less than the system split: in this setup, the laptop should be the brain and the desktop should be the plumbing.

  • The 5070 Ti laptop is the only machine here that can comfortably host the main chat model
  • The GTX 1080 box is better suited to embeddings, retrieval, file ingestion, and always-on services than to primary generation
  • Obsidian is a sensible memory backbone because markdown is portable, inspectable, and easy to automate
  • Mem0 and Letta/MemGPT-style systems are closer to memory layers than standalone magic models, so they still need an actual assistant model underneath
  • A local server setup like `llama.cpp --server` is a practical way to expose the laptop model over the network to the desktop and other clients
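The desktop's side of the split, embeddings and retrieval over markdown notes, reduces to "vectorize each file, rank by similarity to the query." The sketch below uses a toy bag-of-words vector and cosine similarity as a stand-in for a real embedding model, and the vault contents are hypothetical:

```python
import math
import re

def bow(text: str) -> dict[str, int]:
    # Toy bag-of-words "embedding"; a real desktop service would call
    # an actual embedding model instead of counting tokens.
    counts: dict[str, int] = {}
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    # Standard cosine similarity over sparse term-count vectors.
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_notes(query: str, notes: dict[str, str], k: int = 2) -> list[str]:
    # Rank markdown notes by similarity to the query, return top-k names.
    q = bow(query)
    ranked = sorted(notes, key=lambda name: cosine(q, bow(notes[name])),
                    reverse=True)
    return ranked[:k]

# Hypothetical vault contents, keyed by filename.
notes = {
    "gym.md": "squat bench press workout log",
    "groceries.md": "milk eggs bread shopping list",
    "projects.md": "home server gpu embedding service setup",
}
print(top_notes("how is my gpu server project going", notes, k=1))
# → ['projects.md']
```

Because the store is plain markdown, the desktop service only needs read access to the vault directory; the laptop model then receives the top-ranked note bodies as context.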
// TAGS
llm · agent · embedding · rag · self-hosted · local-llama

DISCOVERED

4h ago

2026-04-25

PUBLISHED

7h ago

2026-04-25

RELEVANCE

7 / 10

AUTHOR

Jordan-Vegas