Software tools & optimizations

Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.

August 01, 2025 by Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Chao Xu, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum

Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

This blog shows how CK-Tile’s XOR-based swizzle optimizes shared memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.

July 25, 2025 by Haocong Wang, Clement Lin, Meng-Hsuan Yang, Yu-Chen Lin, Bobo Fang, Chun-Hung Wang, David Li, George Wang, Anshul Gupta

Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU

Fine-tune Llama 3.2 Vision models on a single AMD MI300X GPU using Torchtune, achieving 2.3× better accuracy with the 11B model than with the 90B model on chart-based tasks.

July 21, 2025 by Matthias Reso

Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

Accelerate life science and medical workloads with ROCm-LS, AMD's GPU-optimized toolkit for faster multidimensional image processing and vision.

July 18, 2025 by Soumitra Chatterjee, Karthik Kashyap Thatipamula, Deeksha Goplani, Ish Kool, Anik Chaudhuri, Vikas C Sajjan, Marco Grond

Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Fully utilize the power of AMD's Instinct GPUs to process and interpret detailed multidimensional images with lightning speed.

July 18, 2025 by Soumitra Chatterjee, Karthik Kashyap Thatipamula, Deeksha Goplani, Ish Kool, Anik Chaudhuri, Vikas C Sajjan, Marco Grond

vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance

vLLM v1 on AMD ROCm boosts LLM serving with faster TTFT, higher throughput, and optimized multimodal support—ready out of the box.

July 07, 2025 by Seungrok Jung, Hyukjoon Lee, Andy Luo

Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm

vLLM v0.9.x is here with major ROCm™ optimizations—boosting LLM performance, reducing latency, and expanding model support on AMD Instinct™ GPUs.

June 28, 2025 by Hongxia Yang, Peng Sun, Tun Jian Tan, Pin Siang Tan, Anshul Gupta

Performance Profiling on AMD GPUs – Part 1: Foundations

Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.

June 26, 2025 by Gina Sitaraman, Thomas Gibson, Luka Stanisic, Giacomo Capodaglio, Alessandro Fanfarillo, Asitav Mishra

Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm

Fine-tune LLMs with GRPO on AMD MI300X, leveraging ROCm, Hugging Face TRL, and vLLM for efficient reasoning and scalable RLHF.

June 18, 2025 by Zhu Shan, George Wang

ROCm Runfile Installer Is Here!

An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which provides a single self-contained package for installing the driver and ROCm without internet connectivity.

May 22, 2025 by Douglas Hamilton, Saad Rahim, Liam Berry

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs.

May 21, 2025 by Haocong Wang, Kevin Chang, David Li, George Wang

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH.

May 20, 2025 by Marco Grond, Saad Rahim
