HPC Blogs#

Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
Explore ROCm 6.4's key advancements: AI/HPC performance boosts, enhanced profiling tools, better Kubernetes support and modular drivers, accelerating AI and HPC workloads on AMD GPUs.

Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling
Discover ROCprofiler SDK – ROCm’s next-generation, unified, scalable, and high-performance profiling toolkit for AI and HPC workloads on AMD GPUs.

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X
The blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD MI300X

Boosting Computational Fluid Dynamics Performance with AMD Instinct™ MI300X
The blog introduces CFD Ansys Fluent benchmarks and provides hands-on guide on installing and running four different Fluent models on AMD GPUs using ROCm.

Introducing AMD's Next-Gen Fortran Compiler
In this post we present a brief preview of AMD's Next-Gen Fortran Compiler, our new open source Fortran complier optimized for AMD GPUs using OpenMP offloading, offering direct interface to ROCm and HIP.

Seismic stencil codes - part 1
Seismic Stencil Codes - Part 1: Seismic workloads in the HPC space have a long history of being powered by high-order finite difference methods on structured grids. This trend continues to this day.

Seismic stencil codes - part 2
Seismic Stencil Codes - Part 2: In the previous post, recall that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data to movement to global memory.

Seismic stencil codes - part 3
Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing high order finite differences commonly needed in seismic wave propagation.

Graph analytics on AMD GPUs using Gunrock
Graph analytics on AMD GPUs using Gunrock

Measuring Max-Achievable FLOPs – Part 2
AMD measures Max-Achievable FLOPS through controlled benchmarking: real-world data patterns, thermally stable devices, and cold cache testing—revealing how actual performance differs from theoretical peaks.

Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1
Understanding Peak, Max-Achievable & Delivered FLOPs

Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

MI300A - Exploring the APU advantage
This blog post introduces the MI300 APU hardware, how it differs from other discrete systems, and how to leverage its GPU programming
Stay informed
- Subscribe to our RSS feed (Requires an RSS reader available as browser plugins.)
- Signup for the ROCm newsletter
- View our blog statistics