Developer Blogs

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs
AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

Introducing Instella-Math: Fully Open Language Model with Reasoning Capability
Instella-Math is AMD’s 3B reasoning model, trained on 32 MI300X GPUs with open weights, optimized for logic, math, and chain-of-thought tasks.

Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across the AMD AI hardware ecosystem, from our flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R9700 GPUs and AMD Ryzen™ AI processors.

Unlocking GPU-Accelerated Containers with the AMD Container Toolkit
Simplify GPU acceleration in containers with the AMD Container Toolkit—streamlined setup, runtime hooks, and full ROCm integration.

ROCm Revisited: Getting Started with HIP
New to HIP? This blog introduces the HIP runtime API and its key concepts, walks through installation, and provides practical code examples that showcase its functionality.

ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem
Learn how ROCm evolved to support HPC, AI, and containerized workloads with modern tools, libraries, and deployment options.

AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation
We present AMD Hummingbird, a feedback-driven, two-stage distillation framework for efficient, high-quality image-to-video generation using compact models.

Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X
Learn LLM-powered game development using DeepSeek-R1 on AMD MI300X GPUs with iterative prompting, procedural generation, and VS Code AI tools.

Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot
Speed up robotics AI with AMD ROCm and LeRobot: fine-tune VLAs on Instinct GPUs and deploy on Ryzen AI. Follow the tutorial to get started.

Using Statistical Methods to Reliably Compare Algorithm Performance in Large Generative AI Models with JAX Profiler on AMD GPUs

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.

ROCm Runfile Installer Is Here!
An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which delivers a single self-contained package for installing the driver and ROCm without internet connectivity.

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs.

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH.

Stay informed
- Subscribe to our RSS feed (requires an RSS reader, available as a browser plugin).
- Sign up for the ROCm newsletter
- View our blog statistics