Posts tagged Optimization

Accelerating models on ROCm using PyTorch TunableOp

In this blog, we will show how to leverage PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We will discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally, demonstrate real-world performance gains on an LLM (gemma) using TunableOp.

Read more ...


Application portability with HIP

Many scientific applications run on AMD-equipped computing platforms and supercomputers, including Frontier, the first Exascale system in the world. These applications, coming from a myriad of science domains, were ported to run on AMD GPUs using the Heterogeneous-compute Interface for Portability (HIP) abstraction layer. HIP enables these High-Performance Computing (HPC) facilities to transition their CUDA codes to run and take advantage of the latest AMD GPUs. The effort involved in porting these scientific applications varies from a few hours to a few weeks and largely depends on the complexity of the original source code. Figure 1 shows several examples of applications that have been ported and the corresponding porting effort.

Read more ...


Jacobi Solver with HIP and OpenMP offloading

15 Sept, 2023 by Asitav Mishra, Rajat Arora, Justin Chang.

Read more ...


Finite difference method - Laplacian part 4

18 Jul, 2023 by Justin Chang, Thomas Gibson, Sean Miller.

Read more ...


Register pressure in AMD CDNA™2 GPUs

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Finite difference method - Laplacian part 3

11 May, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.

Read more ...


Finite difference method - Laplacian part 2

4 Jan, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.

Read more ...


AMD matrix cores

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...