Developer Blogs

GEMM Tuning within hipBLASLt - Part 2
Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.

Running SwinUNETR on AMD MI300X GPUs
Learn how to set up, run, and optimize SwinUNETR on AMD MI300X GPUs for fast 3D medical-image segmentation of tumors using large ROIs.

Optimizing Drug Discovery Tools on AMD MI300s Part 2: 3D Molecular Generation with SemlaFlow
Learn how to set up, run, and optimize SemlaFlow, a molecular generation tool, on AMD MI300X GPUs for faster drug discovery workflows.

Elevating 3D Scene Rendering with GSplat
A ROCm port of GSplat: GPU-accelerated rasterization of Gaussian splatting.

ROCm 7.0: An AI-Ready Powerhouse for Performance, Efficiency, and Productivity
Discover how ROCm 7.0 integrates AI across every layer, combining hardware enablement, frameworks, model support, and a suite of optimized tools.

Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across our AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R700 GPUs and AMD Ryzen™ AI Processors.

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

ROCm Revisited: Getting Started with HIP
New to HIP? This blog introduces the HIP runtime API, its key concepts, installation steps, and practical code examples that showcase its functionality.

Optimizing Drug Discovery Tools on AMD MI300X Part 1: Molecular Design with REINVENT
Learn how to set up, run, and optimize REINVENT4, a molecular design tool, on AMD MI300X GPUs for faster drug discovery workflows.

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

Introducing Instella-Math: Fully Open Language Model with Reasoning Capability
Instella-Math is AMD’s 3B reasoning model, trained on 32 MI300X GPUs with open weights, optimized for logic, math, and chain-of-thought tasks.

AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation
We present AMD Hummingbird, a two-stage distillation framework for efficient, high-quality image-to-video generation using compact models.

Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture
This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.

GEMM Tuning within hipBLASLt - Part 1
We introduce a hipBLASLt tuning tool that lets developers tune kernels for specific GEMM problem sizes and integrate the tuned solutions into the library.

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs
AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths, and matrix absorption for 2× faster inference.

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin)
- Sign up for the ROCm newsletter
- View our blog statistics