HPC Blogs - Page 2

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X

This blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD MI300X.

March 02, 2025 by Jayacharan Kolla, Pedram Alizadeh, Gilbert Lee

Measuring Max-Achievable FLOPs – Part 2

AMD measures Max-Achievable FLOPS through controlled benchmarking: real-world data patterns, thermally stable devices, and cold cache testing—revealing how actual performance differs from theoretical peaks.

February 28, 2025 by Ben Sander, Evan Masters, Babak Poursartip, Henry Ho

Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1

Understanding Peak, Max-Achievable & Delivered FLOPs

February 14, 2025 by Ben Sander

Deep dive into the MI300 compute and memory partition modes

This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

February 09, 2025 by Muhammad Osama, Ryan Swann, Karthik Sangaiah, Sonali Singh, Ganesh Dasika, Rajneesh Bhardwaj

MI300A - Exploring the APU advantage

This blog post introduces the MI300 APU hardware, how it differs from other discrete systems, and how to leverage its GPU programming

February 09, 2025 by Suyash Tandon, Justin Chang

Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X

This blog introduces CFD Ansys Fluent benchmarks and provides a hands-on guide to installing and running four different Fluent models on AMD GPUs using ROCm.

January 14, 2025 by Martin Huarte

Introducing AMD's Next-Gen Fortran Compiler

In this post we present a brief preview of AMD's Next-Gen Fortran Compiler, our new open source Fortran compiler optimized for AMD GPUs using OpenMP offloading, offering a direct interface to ROCm and HIP.

November 13, 2024 by Justin Chang, Brian Cornille, Michael Klemm, Johanna Potyka

Getting to Know Your GPU: A Deep Dive into AMD SMI

This post introduces AMD System Management Interface (amd-smi), explaining how you can use it to access your GPU’s performance and status data.

September 17, 2024 by Matt Elliott

Presenting and demonstrating the use of the ROCm Offline Installer Creator, a tool enabling simple deployment of ROCm in disconnected settings such as high-security environments and air-gapped networks.

September 10, 2024 by Matt Elliott

Seismic stencil codes - part 1

Seismic Stencil Codes - Part 1: Seismic workloads in the HPC space have a long history of being powered by high-order finite difference methods on structured grids. This trend continues to this day.

August 29, 2024 by Justin Chang, Ossian O'Reilly

Seismic stencil codes - part 2

Seismic Stencil Codes - Part 2: In the previous post, recall that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.

August 29, 2024 by Justin Chang, Ossian O'Reilly

Seismic stencil codes - part 3

Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing the high-order finite differences commonly needed in seismic wave propagation.

August 29, 2024 by Justin Chang, Ossian O'Reilly
