Software tools & optimizations - Page 4
Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

Measuring Max-Achievable FLOPs – Part 2
AMD measures Max-Achievable FLOPs through controlled benchmarking (real-world data patterns, thermally stable devices, and cold-cache testing), revealing how actual performance differs from theoretical peaks.

How to Build a vLLM Container for Inference and Benchmarking
This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2
This blog is part 2 of a series that provides a comprehensive, step-by-step guide to deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

MI300A - Exploring the APU advantage
This blog post introduces the MI300 APU hardware, how it differs from other discrete systems, and how to leverage its GPU programming

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1
This blog is part 1 of a series that provides a comprehensive, step-by-step guide to deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

Announcing the AMD GPU Operator and Metrics Exporter
This post announces the AMD GPU Operator for Kubernetes and the Device Metrics Exporter, including instructions for getting started with these new releases.

Getting started with AMD ROCm containers: from base images to custom solutions

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexible programming interface.

Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces the AMD System Management Interface (amd-smi) and explains how you can use it to access your GPU’s performance and status data.

This post presents and demonstrates the ROCm Offline Installer Creator, a tool that simplifies deploying ROCm in disconnected environments such as high-security sites and air-gapped networks.