OPEN_SOURCE
REDDIT // 3d ago · INFRASTRUCTURE
Hermes Agent users eye multimodal desktop
A Reddit user running Hermes Agent with Qwen 3.5-27B dense on an RTX 3090 and 64 GB RAM wants a Claude Desktop-style multimodal client for everyday admin work. The goal is a single local assistant that can talk to Anytype over MCP, handle screenshots, answer questions, and generate images.
// ANALYSIS
This is the right direction for local AI, but the stack is still split across too many pieces: model, client, vision, and tool integrations all need to line up before it feels like a true daily driver.
- Hermes Agent already supports multimodal vision, but it still needs a client that handles screenshots and MCP cleanly.
- Anytype’s MCP server makes the knowledge-base side viable; the gap is a polished desktop UX for non-coding workflows.
- On this hardware, the practical path is usually a strong text model plus a lighter vision model, not one giant local model doing everything.
- The real product opportunity is a “life admin” assistant with the friction hidden, not another coding-centric agent UI.
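The "strong text model plus a lighter vision model" split can be sketched as a small routing step in front of the models. This is a minimal illustration only: the model names, the `Request` type, and `pick_model` are hypothetical and are not part of Hermes Agent, Anytype, or any MCP client.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers; substitute whatever your local server exposes.
TEXT_MODEL = "local-text-model"
VISION_MODEL = "local-vision-model"


@dataclass
class Request:
    """A user request, optionally carrying screenshots or other images."""
    prompt: str
    image_paths: List[str] = field(default_factory=list)


def pick_model(request: Request) -> str:
    """Send the request to the vision model only when it carries images."""
    return VISION_MODEL if request.image_paths else TEXT_MODEL


if __name__ == "__main__":
    print(pick_model(Request("Summarize my notes from last week")))
    print(pick_model(Request("What does this invoice say?", ["invoice.png"])))
```

Routing on the presence of an image keeps the larger text model as the default path, so the lighter vision model is only loaded for requests that actually include a screenshot.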
// TAGS
hermes-agent, llm, agent, multimodal, mcp, self-hosted, automation
DISCOVERED
3d ago
2026-04-08
PUBLISHED
4d ago
2026-04-08
RELEVANCE
6/10
AUTHOR
CaptainD5