AI - Applications & Models

QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang
QuickReduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting communication overhead by up to 3×.

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

DGL in the Real World: Running GNNs on Real Use Cases
We walk through four advanced GNN workloads, from heterogeneous e-commerce graphs to neuroscience applications, that we successfully ran using our DGL implementation.

All-in-One Video Editing with VACE on AMD Instinct GPUs
This blog showcases AMD hardware powering cutting-edge text-driven video editing models through an all-in-one solution.

Accelerating FastVideo on AMD GPUs with TeaCache
Enabling ROCm support for FastVideo inference using TeaCache on AMD Instinct GPUs, accelerating video generation with optimized backends.

Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU
Fine-tune Wan2.2 for video generation on a single AMD Instinct MI300X GPU with ROCm and DiffSynth.

Introducing Instella-Math: Fully Open Language Model with Reasoning Capability
Instella-Math is AMD’s 3B reasoning model, trained on 32 MI300X GPUs with open weights, optimized for logic, math, and chain-of-thought tasks.

AMD Hummingbird Image to Video: A Lightweight Feedback-Driven Model for Efficient Image-to-Video Generation
We present AMD Hummingbird, a two-stage distillation framework for efficient, high-quality image-to-video generation using compact models.

Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs
This blog provides a how-to guide on installing and programming with Taichi Lang on AMD Instinct GPUs.

Graph Neural Networks at Scale: DGL with ROCm on AMD Hardware
Accelerate graph deep learning on AMD GPUs with DGL and ROCm, scaling efficiently with open tools and optimized performance.

Benchmarking Reasoning Models: From Tokens to Answers
Learn how to benchmark reasoning tasks: use Qwen3 and vLLM to measure true reasoning performance, not just token generation speed.

Instella-T2I: Open-Source Text-to-Image with 1D Tokenizer and 32× Token Reduction on AMD GPUs
Explore Instella-T2I: AMD’s open-source text-to-image model, built on MI300X GPUs with a novel 1D tokenizer and an LLM-based encoder for scalable image generation.