Software tools & optimizations - Page 2

Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

Unlock the full power of AMD GPUs—write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility

May 06, 2025 by Lei Zhang, George Wang, Fan Wu, Peng Sun, Kyle Wang, Anshul Gupta

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

Dive into kernel-level profiling of DeepseekV3 on SGLang—identify GPU bottlenecks and boost large language model performance using ROCm

May 01, 2025 by Liz Li, Shekhar Pandey, Seungrok Jung, Andy Luo

Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs

Learn how to boost your Llama 4 inference performance on AMD MI300X GPUs using AITER-optimized kernels and advanced vLLM techniques

April 28, 2025 by Liz Li, Seungrok Jung, Andy Luo, Shekhar Pandey

Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs

This blog shows you how to speed up your multimodal models with AMD’s open-source PyTorch tools for speculative decoding on MI300X GPUs

April 28, 2025 by Mohammad Mahdi Kamani, Parsa Fashi, Vikram Appia, Emad Barsoum

Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs

Build high-performance GEMM kernels using CK-Tile on AMD Instinct GPUs with vendor-optimized pipelines and policies for AI and HPC workloads

April 15, 2025 by David Li, George Wang

Installing ROCm from source with Spack

Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

April 14, 2025 by Garrett Byrd, Joseph Schoonover

Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations

Learn how Triton compiles and optimizes AI kernels on AMD GPUs, with deep dives into IR flows, hardware-specific passes, and performance tuning tips

April 10, 2025 by Ning Zhang, George Wang

What's New in the AMD GPU Operator v1.2.0 Release

This blog highlights the new features in the AMD GPU Operator v1.2.0 release that enhance the use of AMD Instinct GPUs on Kubernetes, including automated upgrades, health checks, and the open-sourcing of the codebase.

March 28, 2025 by Farshad Ghodsian