Software tools & optimizations - Page 3

Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 3

This blog is part 3 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

March 13, 2025 by Victor Robles

Optimized ROCm Docker for Distributed AI Training

AMD's updated Docker images incorporate torchtune fine-tuning, FP8 support, a single-node performance boost, bug fixes, and updated benchmarking for stable, efficient distributed training.

March 13, 2025 by Yao Fu, Anshul Gupta

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

This blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD Instinct MI300X.

March 02, 2025 by Jayacharan Kolla, Pedram Alizadeh, Gilbert Lee

Measuring Max-Achievable FLOPs – Part 2

AMD measures Max-Achievable FLOPs through controlled benchmarking with real-world data patterns, thermally stable devices, and cold-cache testing, revealing how actual performance differs from theoretical peaks.

February 28, 2025 by Ben Sander, Evan Masters, Babak Poursartip, Henry Ho

How to Build a vLLM Container for Inference and Benchmarking

This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.

February 21, 2025 by Matt Elliott

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2

This blog is part 2 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

February 14, 2025 by Victor Robles

Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

Understanding Peak, Max-Achievable & Delivered FLOPs

February 14, 2025 by Ben Sander

Deep dive into the MI300 compute and memory partition modes

This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

February 09, 2025 by Muhammad Osama, Ryan Swann, Karthik Sangaiah, Sonali Singh, Ganesh Dasika, Rajneesh Bhardwaj

MI300A - Exploring the APU advantage

This blog post introduces the MI300 APU hardware, how it differs from discrete systems, and how to leverage its GPU programming.

February 09, 2025 by Suyash Tandon, Justin Chang

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

This blog is part 1 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

February 07, 2025 by Victor Robles

Announcing the AMD GPU Operator and Metrics Exporter

This post announces the AMD GPU Operator for Kubernetes and the Device Metrics Exporter, including instructions for getting started with these new releases.

January 29, 2025 by Farshad Ghodsian, Matt Elliott

Getting started with AMD ROCm containers: from base images to custom solutions

This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.

January 16, 2025 by Matt Elliott
