Posts tagged Optimization

Seismic stencil codes - part 3

12 Aug, 2024 by Justin Chang and Ossian O’Reilly.


Seismic stencil codes - part 2

12 Aug, 2024 by Justin Chang and Ossian O’Reilly.


Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

In this blog we explore how to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa) large language model, with emphasis on PyTorch’s mixed precision capabilities. Specifically, we use AMD GPUs for mixed precision fine-tuning to train the model faster without any major impact on accuracy.


Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

This blog is a guide to measuring and comparing the performance of various algorithms in a JAX-implemented generative AI model. Using the JAX Profiler and statistical analysis, it demonstrates how to reliably evaluate key steps and compare algorithm performance on AMD GPUs.


Accelerating models on ROCm using PyTorch TunableOp

In this blog, we show how to leverage PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.


Application portability with HIP

Many scientific applications run on AMD-equipped computing platforms and supercomputers, including Frontier, the first exascale system in the world. These applications, which come from a myriad of science domains, were ported to run on AMD GPUs using the Heterogeneous-compute Interface for Portability (HIP) abstraction layer. HIP enables these High-Performance Computing (HPC) facilities to transition their CUDA codes to run on, and take advantage of, the latest AMD GPUs. The effort involved in porting these scientific applications varies from a few hours to a few weeks and largely depends on the complexity of the original source code. Figure 1 shows several examples of applications that have been ported and the corresponding porting effort.


Jacobi Solver with HIP and OpenMP offloading

15 Sept, 2023 by Asitav Mishra, Rajat Arora, Justin Chang.


Finite difference method - Laplacian part 4

18 Jul, 2023 by Justin Chang, Thomas Gibson, Sean Miller.


Register pressure in AMD CDNA™2 GPUs

Note: This blog was previously part of the AMD lab notes blog series.


Finite difference method - Laplacian part 3

11 May, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.


Finite difference method - Laplacian part 2

4 Jan, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.


AMD matrix cores

Note: This blog was previously part of the AMD lab notes blog series.
