Data Science Blogs - Page 2#

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X
The blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD MI300X

Seismic stencil codes - part 2
Seismic Stencil Codes - Part 2: In the previous post, recall that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data to movement to global memory.

Seismic stencil codes - part 3
Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing high order finite differences commonly needed in seismic wave propagation.

Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs
Using Statistical Methods to Reliably Compare Algorithm Performance in Large Generative AI Models with JAX Profiler on AMD GPUs

Accelerating models on ROCm using PyTorch TunableOp
Accelerating models on ROCm using PyTorch TunableOp

SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel
SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+
Siemens recently announced that its Simcenter STAR-CCM+ multi-physics computational fluid dynamics (CFD) software now supports AMD Instinct™ GPUs for GPU-native computation. This move addresses its users' needs for computational efficiency, reduced simulation costs and energy usage, and greater hardware choice.

Speech-to-Text on an AMD GPU with Whisper
Speech to Text on AMD with Whisper

Sparse matrix vector multiplication - part 1
Sparse matrix vector multiplication - Part 1