AMD unveils Instinct MI350P PCIe card
AMD is bringing its CDNA 4 data-center GPU to a PCIe add-in card with 144GB of HBM3E, 600W power draw, and support for air-cooled servers. It is aimed at on-prem inference and RAG workloads in existing infrastructure, but AMD has not shared pricing or availability.
This is a practical move from AMD: instead of chasing only rack-scale deployments, it finally serves the large enterprise install base that needs serious GPU memory without redesigning the datacenter.
- The 144GB of HBM3E makes the MI350P interesting for inference-heavy jobs that outgrow consumer cards but do not need OAM/SXM-style clusters
- Shipping this as a PCIe card rather than a rack-scale OAM platform positions it as a deployment-friendly inference part rather than hardware for sharding giant models
- The 600W default and 450W fallback are the real story: AMD is trying to fit current-gen accelerator performance into air-cooled enterprise servers
- Open ROCm and the broader AMD software stack matter here, because hardware alone will not move enterprise buyers
- The lack of pricing and availability keeps it from being immediately actionable, but the spec sheet is strong enough to pressure Nvidia in PCIe datacenter deployments
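To make the memory argument concrete, here is a back-of-envelope sketch of which dense models fit in a single card's 144GB of HBM3E. The 144GB figure comes from the announcement; the bytes-per-parameter values and the 10% framework overhead are illustrative assumptions, not AMD guidance.

```python
# Rough single-card fit check for dense LLM weights.
# 144GB HBM3E is from the spec sheet; everything else is an assumption.

GiB = 1024 ** 3
HBM_BYTES = 144 * GiB  # MI350P card capacity

def fits(params_billions: float, bytes_per_param: float,
         overhead: float = 0.10) -> bool:
    """True if the model weights (plus an assumed runtime overhead)
    fit within one card's HBM, leaving KV cache aside."""
    weights = params_billions * 1e9 * bytes_per_param
    return weights * (1 + overhead) <= HBM_BYTES

# A 70B model in fp16 (2 bytes/param) needs ~140GB of weights alone,
# so it is borderline on one card; in fp8 (1 byte/param) it fits with
# ample headroom for KV cache, while a 175B fp16 model does not fit.
for label, params, bpp in [("70B fp16", 70, 2),
                           ("70B fp8", 70, 1),
                           ("175B fp16", 175, 2)]:
    print(label, fits(params, bpp))
```

The point of the arithmetic is the bullet above: a single 144GB card covers quantized models well past what consumer cards hold, without stepping up to a multi-GPU OAM/SXM cluster.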
DISCOVERED
2026-05-07
PUBLISHED
2026-05-07
AUTHOR
Noble00_