Posts tagged Linear Algebra
Accelerating models on ROCm using PyTorch TunableOp
- 03 July 2024
In this blog post, we show how to use PyTorch TunableOp to accelerate models with ROCm on AMD GPUs. We discuss the basics of General Matrix Multiplications (GEMMs), walk through an example of tuning a single GEMM, and finally demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.
SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel
- 31 May 2024
The AMD ROCm™ Composable Kernel (CK) library provides a programming model for writing performance-critical kernels for machine learning workloads. It uses C++ templates to generate general-purpose kernels at compile time, enabling developers to fuse operations across different data precisions.
Jacobi Solver with HIP and OpenMP offloading
- 15 September 2023
By Asitav Mishra, Rajat Arora, and Justin Chang.
AMD matrix cores
- 14 November 2022
Matrix multiplication is a fundamental operation in linear algebra and a ubiquitous computation in high-performance computing (HPC) applications. Since the introduction of AMD’s CDNA architecture, General Matrix Multiplication (GEMM) computations have been hardware-accelerated through Matrix Core Processing Units. Matrix Core accelerated GEMM kernels lie at the heart of BLAS libraries like rocBLAS, but developers can also program them directly. Applications whose throughput is bound by GEMM computation can achieve additional speedups by using Matrix Cores.