HPC Blogs#
 
Performance Profiling on AMD GPUs - Part 3: Advanced Usage
Part 3 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools
 
ROCm 7.9 Technology Preview: ROCm Core SDK and TheRock Build System
Introduce ROCm Core SDK, and learn to install and build ROCm components easily using TheRock.
 
Medical Imaging on MI300X: Optimized SwinUNETR for Tumor Detection
Learn how to setup, run and optimize SwinUNETR on AMD MI300X GPUs for fast medical imaging 3D segmentation of tumors using fast, large ROIs.
 
GPU Partitioning Made Easy: Pack More AI Workloads Using AMD GPU Operator
What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features
 
Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture
This blog post explains how to use Matrix Cores on CDNA3 and CDNA4 architecture, with a focus on low-precision data types such as FP16, FP8, and FP4
 
Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT
Learn how to set up, run, and optimize REINVENT4, a molecular design tool, on AMD MI300X GPUs for faster drug discovery workflows
 
ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools
 
Performance Profiling on AMD GPUs – Part 2: Basic Usage
Part 2 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools
 
Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.
 
Performance Profiling on AMD GPUs – Part 1: Foundations
Part 1 of our GPU profiling series introduces ROCm tools, setup steps, and key concepts to prepare you for deeper dives in the posts to follow.
 
AMD ROCm: Powering the World's Fastest Supercomputers
Discover how ROCm drives the world’s top supercomputers, from El Capitan to Frontier, and why its shaping the future of scalable, open and sustainable HPC
 
LLM Quantization with Quark on AMD GPUs: Accuracy and Performance Evaluation
Learn how to use Quark to apply FP8 quantization to LLMs on AMD GPUs, and evaluate accuracy and performance using vLLM and SGLang on AMD MI300X GPUs.