Software Tools and Optimizations - Page 2
Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

How to Build a vLLM Container for Inference and Benchmarking
This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 2
This blog is part 2 of a series providing a comprehensive, step-by-step guide to deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1
Understanding Peak, Max-Achievable & Delivered FLOPs

Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

MI300A - Exploring the APU advantage
This blog post introduces the MI300 APU hardware, how it differs from other discrete systems, and how to leverage its GPU programming

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1
This blog is part 1 of a series providing a comprehensive, step-by-step guide to deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform.

Announcing the AMD GPU Operator and Metrics Exporter
This post announces the AMD GPU Operator for Kubernetes and the Device Metrics Exporter, including instructions for getting started with these new releases.

Getting started with AMD ROCm containers: from base images to custom solutions
This post walks through getting started with AMD ROCm containers, from base images to custom solutions.

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexible programming interface.

Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces the AMD System Management Interface (amd-smi), explaining how you can use it to access your GPU’s performance and status data.

Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments such as high-security environments and air-gapped networks.

TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs
TensorFlow Profiler measures resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of the TensorFlow Profiler tool on AMD hardware.