HPC Blogs - Page 3

Seismic stencil codes - part 2
Seismic Stencil Codes - Part 2: In the previous post, the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.

Seismic stencil codes - part 3
Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing the high-order finite differences commonly needed in seismic wave propagation.

Graph analytics on AMD GPUs using Gunrock

Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

Mamba on AMD GPUs with ROCm
Best practices for using Mamba on AMD GPUs with ROCm

TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs
TensorFlow Profiler measures resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of the TensorFlow Profiler tool on AMD hardware.

AMD in Action: Unveiling the Power of Application Tracing and Profiling

C++17 parallel algorithms and HIPSTDPAR