Data Science Blogs#

Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem from our flagship AMD InstinctTM MI355X and MI300X GPUs, AMD Radeon™ AI PRO R700 GPUs and AMD Ryzen™ AI Processors

Benchmarking Reasoning Models: From Tokens to Answers
Learn how to benchmark reasoning tasks. Use Qwen3 and vLLM to test true reasoning performance, not just how fast words are generated.

Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs
Accelerate life science and medical workloads with ROCm-LS, AMDs GPU-optimized toolkit for faster multidimensional image processing and vision.

Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing
Fully utilize the power of AMDs Instinct GPUs to process and interpret detailed multidimensional images with lightning speed.

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

Siemens taps AMD Instinct™ GPUs to expand high-performance hardware options for Simcenter STAR-CCM+
Siemens recently announced that its Simcenter STAR-CCM+ multi-physics computational fluid dynamics (CFD) software now supports AMD Instinct™ GPUs for GPU-native computation. This move addresses its users' needs for computational efficiency, reduced simulation costs and energy usage, and greater hardware choice.

Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang
Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs

Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG
Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.

Seismic stencil codes - part 2
Seismic Stencil Codes - Part 2: In the previous post, recall that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data to movement to global memory.

Seismic stencil codes - part 3
Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing high order finite differences commonly needed in seismic wave propagation.

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools
Dive into kernel-level profiling of DeepseekV3 on SGLang—identify GPU bottlenecks and boost large language model performance using ROCm

Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.

Understanding RCCL Bandwidth and xGMI Performance on AMD Instinct™ MI300X
The blog explains the reasons behind RCCL bandwidth limitations and xGMI performance constraints, and provides actionable steps to maximize link efficiency on AMD MI300X
Stay informed
- Subscribe to our RSS feed (Requires an RSS reader available as browser plugins.)
- Signup for the ROCm newsletter
- View our blog statistics