ROCm Blogs
Featured Posts

Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU
This blog introduces the key performance optimizations made to enable DeepSeek-R1 inference.

SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD Instinct GPUs
Discover SGLang, a fast serving framework designed for large language and vision-language models on AMD GPUs, supporting efficient runtime and a flexible programming interface.

Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs
Build high-performance GEMM kernels using CK-Tile on AMD Instinct GPUs with vendor-optimized pipelines and policies for AI and HPC workloads.

Deep dive into the MI300 compute and memory partition modes
This blog explains how to use the MI300 compute and memory partitioning modes to optimize your performance-critical applications.

Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG
Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.

DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs
This blog post demonstrates how hipDF significantly enhances and accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.

CuPy and hipDF on AMD: The Basics and Beyond
Learn how to deploy CuPy and hipDF on AMD GPUs. See their high-performance computing advantages, and use CuPy and hipDF in a detailed example of an investment portfolio allocation optimization using the Markowitz model.

Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs: write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility.

A Step-by-Step Guide On How To Deploy Llama Stack on AMD Instinct™ GPU
Learn how to use Meta’s Llama Stack with AMD ROCm and vLLM to scale inference, integrate APIs, and streamline production-ready AI workflows on AMD Instinct™ GPU.

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software
Explore ROCm 6.4's key advancements: AI/HPC performance boosts, enhanced profiling tools, better Kubernetes support and modular drivers, accelerating AI and HPC workloads on AMD GPUs.

ROCm Gets Modular: Meet the Instinct Datacenter GPU Driver
We introduce the new Instinct driver: a modular GPU driver with independent releases that simplifies workflows and system setup and enhances compatibility across toolkit versions.

AMD Advances Enterprise AI Through OPEA Integration
We announce AMD’s support for the Open Platform for Enterprise AI (OPEA), integrating OPEA’s enterprise GenAI framework with AMD computing hardware and ROCm software.

Power Up Qwen 3 with AMD Instinct: A Developer’s Day 0 Quickstart
Explore the power of Alibaba's Qwen3 models on AMD Instinct™ MI300X and MI325X GPUs, available from Day 0 with seamless SGLang and vLLM integration.

Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration
Deploy verl on AMD GPUs for fast, scalable RLHF training with ROCm optimization, Docker scripts, and impressive throughput and convergence results.

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel
Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use, shrinking model size, and enabling fast inference.

Power Up Llama 4 with AMD Instinct: A Developer’s Day 0 Quickstart
Explore the power of Meta’s Llama 4 multimodal models on AMD Instinct™ MI300X and MI325X GPUs, available from Day 0 with seamless vLLM integration.

Optimizing DeepSeek-V3 Inference on SGLang Using ROCm Profiling Tools
Dive into kernel-level profiling of DeepSeek-V3 on SGLang to identify GPU bottlenecks and boost large language model performance using ROCm.

Boosting Llama 4 Inference Performance with AMD Instinct MI300X GPUs
Learn how to boost your Llama 4 inference performance on AMD MI300X GPUs using AITER-optimized kernels and advanced vLLM techniques.

Beyond Text: Accelerating Multimodal AI Inference with Speculative Decoding on AMD Instinct™ MI300X GPUs
This blog shows you how to speed up your multimodal models with AMD’s open-source PyTorch tools for speculative decoding on MI300X GPUs.

Installing ROCm from source with Spack
Install ROCm and PyTorch from source using Spack. Learn how to optimize builds, manage dependencies, and streamline your GPU software stacks.
Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics