AI Blogs

AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation
We present AMD Hummingbird, a two-stage distillation framework for efficient, high-quality image-to-video generation using compact models.

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.

Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs
This blog provides a how-to guide on installing and programming with Taichi Lang on AMD Instinct GPUs.

Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware
Accelerate graph deep learning on AMD GPUs with DGL and ROCm: scale efficiently with open tools and optimized performance.

Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework
This blog shows how CK-Tile’s XOR-based swizzle optimizes shared memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.

Benchmarking Reasoning Models: From Tokens to Answers
Learn how to benchmark reasoning tasks. Use Qwen3 and vLLM to test true reasoning performance, not just how fast words are generated.

Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU
Fine-tune Llama 3.2 Vision models on an AMD MI300X GPU using Torchtune, achieving 2.3× better accuracy with 11B vs 90B model on chart-based tasks.

Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs
Explore Instella-T2I: AMD’s open-source text-to-image model, built on MI300X GPUs with a novel tokenizer and an LLM-based encoder for scalable image generation.

Fine-tuning Robotics Vision Language Action Models with AMD ROCm and LeRobot
Speed up robotics AI with AMD ROCm and LeRobot: fine-tune VLAs on Instinct GPUs and deploy on Ryzen AI. Follow the tutorial to get started.

Accelerating Video Generation on ROCm with Unified Sequence Parallelism: A Practical Guide
A practical guide for accelerating video generation with HunyuanVideo and Wan 2.1 using Unified Sequence Parallelism on AMD GPUs.

Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day
Nitro-T is a family of text-to-image diffusion models developed by AMD to demonstrate efficient large-scale training on Instinct™ MI300X GPUs, trained from scratch in under 24 hours.

vLLM V1 Meets AMD Instinct GPUs: A New Era for LLM Inference Performance
vLLM V1 on AMD ROCm boosts LLM serving with faster TTFT, higher throughput, and optimized multimodal support, ready out of the box.