Software tools & optimizations - Page 3#
Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1
This blog is part 1 of a series aimed at providing a comprehensive, step-by-step guide for deploying and scaling AI inference workloads with Kubernetes and the AMD GPU Operator on the AMD Instinct platform

Announcing the AMD GPU Operator and Metrics Exporter
This post announces the AMD GPU Operator for Kubernetes and and the Device Metrics Exporter, including instructions for getting started with these new releases.

Getting started with AMD ROCm containers: from base images to custom solutions
This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexible programming interface.

Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces AMD System Management Interface (amd-smi), explaining how you can use it to access your GPU’s performance and status data

Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments in high-security environments and air-gapped networks.
Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected environments in high-security environments and air-gapped networks.

TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs
TensorFlow Profiler measures resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of the TensorFlow Profiler tool on AMD hardware.

SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel
SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

AMD in Action: Unveiling the Power of Application Tracing and Profiling
AMD in Action: Unveiling the Power of Application Tracing and Profiling

C++17 parallel algorithms and HIPSTDPAR #
C++17 parallel algorithms and HIPSTDPAR