HPC - Software Tools & Optimizations - Page 2
Performance Profiling on AMD GPUs – Part 1: Foundations
Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.
Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS, AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH.
Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.
Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling
Discover ROCprofiler SDK – ROCm’s next-generation, unified, scalable, and high-performance profiling toolkit for AI and HPC workloads on AMD GPUs.
Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X
This blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD Instinct™ MI300X.
Measuring Max-Achievable FLOPs – Part 2
AMD measures Max-Achievable FLOPs through controlled benchmarking (real-world data patterns, thermally stable devices, and cold-cache testing), revealing how actual performance differs from theoretical peaks.
Understanding Peak, Max-Achievable & Delivered FLOPs, Part 1
Understanding Peak, Max-Achievable & Delivered FLOPs
MI300A - Exploring the APU advantage
This blog post introduces the MI300 APU hardware, explains how it differs from discrete GPU systems, and shows how to leverage its GPU programming
Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.
Getting to Know Your GPU: A Deep Dive into AMD SMI
This post introduces the AMD System Management Interface (amd-smi) and explains how you can use it to access your GPU’s performance and status data.
Presenting and demonstrating the ROCm Offline Installer Creator, a tool that simplifies deploying ROCm in disconnected settings such as high-security environments and air-gapped networks.
TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs
TensorFlow Profiler measures a model's resource use and performance, helping identify bottlenecks to optimize. This blog demonstrates how to use the TensorFlow Profiler on AMD hardware.