Posts by Bill Ku

Building and Deploying Custom hipBLASLt Libraries on AMD Instinct GPUs

General Matrix Multiply (GEMM) operations are a core component of many generative AI workloads. Whether you are running attention mechanisms in the prefill phase of a Large Language Model (LLM) or generating tokens sequentially during the decode phase, matrix multiplication performance has a direct impact on end-to-end latency and throughput.
