Posts by YangWen Huang
Customizing Kernels with hipBLASLt TensileLite GEMM Tuning - Advanced User Guide
- 06 April 2026
Optimizing General Matrix Multiply (GEMM) operations is critical for maximizing the efficiency of AI models on AMD hardware. In previous blog posts, we explored offline tuning, a method for selecting the best-performing kernel from an existing solution pool: for detailed instructions on using hipblaslt-bench, see hipBLASLt offline tuning part 1 and part 2, and for a streamlined, one-click experience, see the Day 0 Developer Guide: hipBLASLt Offline GEMM Tuning Script. For scenarios requiring dynamic runtime adaptation, developers can also explore our recently published blog on hipBLASLt Online GEMM Tuning.
GEMM Tuning within hipBLASLt - Part 2
- 09 October 2025
This post continues from Part 1, where we introduced GEMM tuning concepts in hipBLASLt and explored the basics of solution search. In Part 2, we focus on offline tuning with the hipblaslt-bench tool. This workflow lets developers benchmark candidate GEMM kernels for specific problem shapes, capture the best-performing solutions, and reuse them at runtime without rebuilding or modifying the hipBLASLt library.
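As a rough sketch of that workflow, offline tuning boils down to two steps: sweep solutions for a given problem shape with hipblaslt-bench, then point the application at the captured results. The flag and environment-variable names below are assumptions based on common hipblaslt-bench usage and may differ between ROCm releases; consult `hipblaslt-bench --help` for your installation.

```shell
# Hypothetical sketch; verify flag and env-var names against your ROCm release.

# 1. Sweep candidate solutions for one problem shape and record the winner
#    to a tuning file (env-var name is an assumption).
HIPBLASLT_TUNING_FILE=tuning.txt \
hipblaslt-bench -m 4096 -n 4096 -k 4096 \
    --transA N --transB N \
    --a_type f16_r --b_type f16_r --c_type f16_r --d_type f16_r \
    --compute_type f32_r \
    --algo_method all

# 2. Reuse the captured solution at runtime, without rebuilding hipBLASLt
#    (override file mechanism is an assumption; ./my_app is a placeholder).
HIPBLASLT_TUNING_OVERRIDE_FILE=tuning.txt ./my_app
```

Note that nothing in this flow modifies the library itself: the tuning file only redirects hipBLASLt's solution selection for the benchmarked shapes.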
GEMM Tuning within hipBLASLt - Part 1
- 05 September 2025
When optimizing matrix operations on AMD GPUs using the ROCm platform, tuning specific problem sizes is essential for achieving maximum performance. The hipBLASLt library supports two official tuning mechanisms: offline tuning and online tuning.