Posts by Bill Ku
Building and Deploying Custom hipBLASLt Libraries on AMD Instinct GPUs
- 18 June 2026
General Matrix Multiply (GEMM) operations are a core component of many generative AI workloads. Whether you are running attention mechanisms in the prefill phase of a Large Language Model (LLM) or generating tokens sequentially during the decode phase, matrix multiplication performance has a direct impact on end-to-end latency and throughput.