Posts tagged Memory

Reading AMDGCN ISA

For an application developer, it is often helpful to read the Instruction Set Architecture (ISA) for the GPU architecture used to perform the application's computations. Understanding the instructions in the pertinent code regions can help in debugging and in optimizing the application's performance.

Read more ...


C++17 parallel algorithms and HIPSTDPAR

The C++17 standard added the concept of parallel algorithms to the pre-existing C++ Standard Library. The parallel versions of algorithms like std::transform maintain the same signature as the regular serial versions, except for the addition of an extra parameter specifying the execution policy to use. This flexibility allows users who are already using the C++ Standard Library algorithms to take advantage of multi-core architectures with only minimal changes to their code.
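
As a minimal illustration (not taken from the post itself), the sketch below shows that the only source change between the serial and parallel forms of std::transform is the execution-policy argument:

```cpp
// Illustrative sketch: serial vs. parallel std::transform in C++17.
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<double> in(1'000'000, 1.0);
    std::vector<double> out(in.size());

    // Serial form, as it existed before C++17.
    std::transform(in.begin(), in.end(), out.begin(),
                   [](double x) { return 2.0 * x; });

    // C++17 parallel form: the only change is the std::execution::par policy.
    // (Some standard library implementations require linking a parallel
    // backend, e.g. TBB with libstdc++.)
    std::transform(std::execution::par, in.begin(), in.end(), out.begin(),
                   [](double x) { return 2.0 * x; });
}
```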

Read more ...


Affinity part 2 - System topology and controlling affinity

In Part 1 of the Affinity blog series, we looked at the importance of setting affinity for High Performance Computing (HPC) workloads. In this blog post, our goals are the following:

Read more ...


Affinity part 1 - Affinity, placement, and order

Modern hardware architectures are increasingly complex, with multiple sockets, many cores in each Central Processing Unit (CPU), Graphics Processing Units (GPUs), memory controllers, Network Interface Cards (NICs), etc. Peripherals such as GPUs or memory controllers will often be local to a CPU socket. Such designs present interesting challenges in optimizing memory access times, data transfer times, etc. Depending on how the system is built, how its hardware components are connected, and which workload is being run, it may be advantageous to use the resources of the system in a specific way. In this article, we will discuss the role of affinity, placement, and order in improving performance for High Performance Computing (HPC) workloads. A short case study is also presented to familiarize you with performance considerations on a node in the Frontier supercomputer. In a follow-up article, we also aim to equip you with the tools you need to understand your system’s hardware topology and set up affinity for your application accordingly.
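
To make "setting affinity" concrete, here is a minimal Linux-specific sketch (not from the article, which covers topology and affinity tools more broadly) that pins the calling process to a single core with sched_setaffinity so the scheduler does not migrate it across sockets:

```cpp
// Minimal sketch: pin the calling process to logical core 0 on Linux.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);    // start with an empty CPU set
    CPU_SET(0, &mask);  // allow execution only on logical core 0

    // pid 0 means "the calling process"
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("Process pinned to core 0\n");
    return 0;
}
```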

Read more ...


Sparse matrix vector multiplication - part 1

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Finite difference method - Laplacian part 4

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Register pressure in AMD CDNA™2 GPUs

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Finite difference method - Laplacian part 3

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Introduction to profiling tools for AMD hardware

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


AMD Instinct™ MI200 GPU memory space overview

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


Finite difference method - Laplacian part 2

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...


AMD matrix cores

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...