Systems Blogs

Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Learn how a small-radius expert parallel design with prefill–decode disaggregation enables scalable, fault-isolated LLM inference on AMD Instinct™ MI300X clusters.

November 12, 2025 by Peng Sun, Andy Luo, Gilbert Lei, Lingpeng Jin, Carlus Huang, Duyi Wang, Mingzhi Liu, Di Tian, Bill He, Jun Chen, Yutong Wu, Jiahao Zhou, Niko Ma

GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator

What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features

October 01, 2025 by Alireza Sariaslani

Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.

September 30, 2025 by Amanzhol Salykov, Andy Luo, Carlus Huang, Peng Sun

ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity

Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.

September 16, 2025 by Liam Berry, Mohammed Faraaz Mustafa, Danny Guan, Saad Rahim, Aditya Bhattacharji, Marilyn Basanta

Ecosystems & Partners

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit

Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

July 03, 2025 by Abhishek Patil

ROCm Revisited: Getting Started with HIP

New to HIP? This blog introduces the HIP runtime API, its key concepts, and its installation, with practical code examples that showcase its functionality.

June 06, 2025 by Liam Berry, Mohammed Faraaz Mustafa, Saad Rahim

ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver

We introduce the new Instinct driver, a modular GPU driver with independent releases that simplifies workflows and system setup and improves compatibility across toolkit versions.

April 11, 2025 by Saad Rahim, Danny Guan

Stone Ridge Expands Reservoir Simulation Options with AMD Instinct™ Accelerators

Stone Ridge Technology (SRT) pioneered the use of GPUs for high-performance computing (HPC) reservoir simulation nearly a decade ago with ECHELON, its flagship software product. The first product of its kind, ECHELON was engineered from the outset to harness the full potential of massively parallel GPUs and stands apart in the industry for its power, efficiency, and accuracy. ECHELON now supports AMD Instinct™ accelerators in its simulation engine, offering new flexibility and optionality to its clients.

June 10, 2024

Applications & Models

Deploying Serverless AI Inference on AMD GPU Clusters

This blog helps readers set up serverless AI inference deployments in a Kubernetes cluster with AMD accelerators. It aims to provide a comprehensive guide to deploying and scaling AI inference workloads on serverless infrastructure.

February 25, 2025 by Rathnakara Malatesha

Software Tools & Optimizations

ROCm Runfile Installer Is Here!

An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which provides a single, complete package for installing the driver and ROCm without internet connectivity.

May 22, 2025 by Douglas Hamilton, Saad Rahim, Liam Berry

Installing ROCm from source with Spack

Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

April 14, 2025 by Garrett Byrd, Joseph Schoonover

What's New in the AMD GPU Operator v1.2.0 Release

This blog highlights the feature enhancements released in AMD GPU Operator v1.2.0, which improve the use of AMD Instinct GPUs on Kubernetes, including Automated Upgrades, Health Checks, and the open-sourcing of the codebase.

March 28, 2025 by Farshad Ghodsian

Announcing the AMD GPU Operator and Metrics Exporter

This post announces the AMD GPU Operator for Kubernetes and the Device Metrics Exporter, including instructions for getting started with these new releases.

January 29, 2025 by Farshad Ghodsian, Matt Elliott
