Systems Blogs


March 09, 2026

Agentic Diagnosis for LLM Training at Scale

Explore how AI agents diagnose LLM training incidents — from RCCL hangs to throughput regressions — in one prompt with MaxText-Slurm.

./software-tools-optimization/maxtext-slurm-agentic-diagnosis/README.html

March 02, 2026

MaxText-Slurm: Production-Grade LLM Training with Built-In Observability

MaxText-Slurm: A unified launch system for production-grade LLM training with observability on AMD GPU clusters.

./software-tools-optimization/maxtext-slurm/README.html

February 23, 2026

Primus-Pipeline: A More Flexible and Scalable Pipeline Parallelism Implementation

Learn how to use our flexible and scalable pipeline parallelism framework with Primus backend and AMD hardware.

./software-tools-optimization/primus-pipeline/README.html

February 09, 2026

Building Robotics Applications with Ryzen AI and ROS 2

This blog post walks through deploying a robotics application on an AI PC integrated with ROS, the Robot Operating System. We showcase the Ryzen AI CVML Library for perception tasks such as depth estimation, and develop a custom ROS 2 node that allows easy integration with the ROS ecosystem and standard components.

./ecosystems-and-partners/ryzenai-cvml-ros/README.html

January 30, 2026

Debugging NaN Results in CK Tile GEMM: A rocgdb Detective Story

Learn GPU kernel debugging with rocgdb through a real case: tracing NaN outputs to a one-character typo in CK Tile GEMM

./software-tools-optimization/rocgdb-ck-tile/README.html

January 22, 2026

ROCm 7.2: Smarter, Faster, and More Scalable for Modern AI Workloads

We highlight the latest ROCm 7.2 enhancements for AMD Instinct GPUs, designed to boost AI and HPC performance.

./software-tools-optimization/rocm7.2/README.html

January 21, 2026

ROCm Becomes a First-Class Platform in the vLLM Ecosystem

ROCm is now a first-class vLLM platform: official wheels + Docker, stronger CI, and faster LLM & multimodal inference on AMD Instinct GPUs.

./software-tools-optimization/vllm-omni/README.html

January 13, 2026

Reimagining GPU Allocation in Kubernetes: Introducing the AMD GPU DRA Driver

Explore how the AMD GPU DRA Driver brings declarative, attribute-aware GPU scheduling to Kubernetes — learn how to request and manage GPUs natively

./software-tools-optimization/dra-gpu/README.html

January 08, 2026

Introducing the AMD Network Operator v1.0.0: Simplifying High-Performance Networking for AMD Platforms

Introducing the AMD Network Operator for automating high-performance AI NIC networking in Kubernetes for AI and HPC workloads

./software-tools-optimization/amd-network-operator/README.html

November 12, 2025

Practical, Fault‑Robust Distributed Inference for DeepSeek on AMD MI300X

Learn how a small-radius expert parallel design with prefill–decode disaggregation enables scalable, fault-isolated LLM inference on AMD Instinct™ MI300X clusters.

./software-tools-optimization/wide-ep-deepseek/README.html

October 01, 2025

GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator

What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features

./software-tools-optimization/gpu-operator-partitioning/README.html

September 30, 2025

Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture

This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.

./software-tools-optimization/matrix-cores-cdna/README.html
