Posts tagged Optimization
Accelerating models on ROCm using PyTorch TunableOp
- 03 July 2024
In this blog, we will show how to leverage PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We will discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally, demonstrate real-world performance gains on an LLM (gemma) using TunableOp.
Application portability with HIP
- 26 April 2024
Many scientific applications run on AMD-equipped computing platforms and supercomputers, including Frontier, the first Exascale system in the world. These applications, coming from a myriad of science domains, were ported to run on AMD GPUs using the Heterogeneous-compute Interface for Portability (HIP) abstraction layer. HIP enables these High-Performance Computing (HPC) facilities to transition their CUDA codes to run and take advantage of the latest AMD GPUs. The effort involved in porting these scientific applications varies from a few hours to a few weeks and largely depends on the complexity of the original source code. Figure 1 shows several examples of applications that have been ported and the corresponding porting effort.
Jacobi Solver with HIP and OpenMP offloading
- 15 September 2023
15 Sept, 2023 by Asitav Mishra, Rajat Arora, Justin Chang.
Finite difference method - Laplacian part 4
- 18 July 2023
18 Jul, 2023 by Justin Chang, Thomas Gibson, Sean Miller.
Register pressure in AMD CDNA™2 GPUs
- 17 May 2023
Note: This blog was previously part of the AMD lab notes blog series.
Finite difference method - Laplacian part 3
- 11 May 2023
11 May, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.
Finite difference method - Laplacian part 2
- 04 January 2023
4 Jan, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.
AMD matrix cores
- 14 November 2022
Note: This blog was previously part of the AMD lab notes blog series.