Posted in 2023

Sparse matrix vector multiplication - part 1

3 Nov, 2023 by Paul Mullowney.

Read more ...


Jacobi Solver with HIP and OpenMP offloading

15 Sept, 2023 by Asitav Mishra, Rajat Arora, Justin Chang.

Read more ...


Creating a PyTorch/TensorFlow code environment on AMD GPUs

Goal: The machine learning ecosystem is quickly exploding and we aim to make porting to AMD GPUs simple with this series of machine learning blogposts.

Read more ...


Finite difference method - Laplacian part 4

18 Jul, 2023 by Justin Chang, Thomas Gibson, Sean Miller.

Read more ...


GPU-aware MPI with ROCm

MPI is the de facto standard for inter-process communication in High-Performance Computing. MPI processes compute on their local data while extensively communicating with each other. This enables MPI programs to be executed on systems with a distributed memory space e.g. clusters. There are different types of communications supported in MPI including point-to-point and collective communications. Point-to-point communication is the basic communication mechanism in which both the sending process and the receiving process take part in the communication. The sender has a buffer that holds the message and an envelope containing information that will be used by the receiver side (e.g., message tag, the sender rank number, etc.). The receiver uses the information in the envelope to select the specified message and stores it in its receiver buffer. In collective communication, messages can be exchanged among a group of processes rather than just two of them. Collective communication provides opportunities for processes to perform one-to-many and many-to-many communications in a convenient, portable and optimized way. Some examples of collective communications include broadcast, allgather, alltoall, and allreduce.

Read more ...


Register pressure in AMD CDNA™2 GPUs

Register pressure in GPU kernels has a tremendous impact on the overall performance of your HPC application. Understanding and controlling register usage allows developers to carefully design codes capable of maximizing hardware resources. The following blog post is focused on a practical demo showing how to apply the recommendations explained in this OLCF training talk presented on August 23rd 2022. Here is the training archive where you can also find the slides. We focus solely on the AMD CDNA™2 architecture (MI200 series GPUs) using ROCm 5.4.

Read more ...


Finite difference method - Laplacian part 3

11 May, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.

Read more ...


Introduction to profiling tools for AMD hardware

Getting a code to be functionally correct is not always enough. In many industries, it is also required that applications and their complex software stack run as efficiently as possible to meet operational demands. This is particularly challenging as hardware continues to evolve over time, and as a result codes may require further tuning. In practice, many application developers construct benchmarks, which are carefully designed to measure the performance, such as execution time, of a particular code within an operational-like setting. In other words: a good benchmark should be representative of the real work that needs to be done. These benchmarks are useful in that they provide insight into the characteristics of the application, and enables one to discover potential bottlenecks that could result in performance degradation during operational settings.

Read more ...


AMD Instinct™ MI200 GPU memory space overview

The HIP API supports a wide variety of allocation methods for host and device memory on accelerated systems. In this post, we will:

Read more ...


AMD ROCm™ installation

AMD ROCm™ is the first open-source software development platform for HPC/Hyperscale-class GPU computing. AMD ROCm™ brings the UNIX philosophy of choice, minimalism and modular software development to GPU computing. Please see the AMD Open Software Platform for GPU Compute and ROCm Informational Portal pages for more information.

Read more ...


Finite difference method - Laplacian part 2

4 Jan, 2023 by Justin Chang, Rajat Arora, Thomas Gibson, Sean Miller, Ossian O’Reilly.

Read more ...