Recent Posts#

QuickReduce: Up to 3x Faster All-reduce for vLLM and SGLang
Quick Reduce speeds up LLM inference on AMD Instinct™ MI300X GPUs with inline-compressed all-reduce, cutting comms overhead by up to 3×

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs
AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning
A novel approach that replaces visual tokens with perception-conditioned weights, reducing compute while maintaining strong vision-language performance.

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs
Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

DGL in the Real World: Running GNNs on Real Use Cases
We walk through four advanced GNN workloads from heterogeneous e-commerce graphs to neuroscience applications that we successfully ran using our DGL implementation.

All-in-One Video Editing with VACE on AMD Instinct GPUs
This blog showcases AMD hardware powering cutting-edge text-driven video editing models through an all-in-one solution.

Accelerating FastVideo on AMD GPUs with TeaCache
Enabling ROCm support for FastVideo inference using TeaCache on AMD Instinct GPUs, accelerating video generation with optimized backends

Wan2.2 Fine-Tuning: Tailoring an Advanced Video Generation Model on a Single GPU
Fine-tune Wan2.2 for video generation on a single AMD Instinct MI300X GPU with ROCm and DiffSynth.

Running ComfyUI on AMD Instinct
This blog shows how to deploy ComfyUI on AMD Instinct GPUs. The blog explains what ComfyUI is and how it works.

Performance Profiling on AMD GPUs – Part 2: Basic Usage
Part 2 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools

Introducing Instella-Math: Fully Open Language Model with Reasoning Capability
Instella-Math is AMD’s 3B reasoning model, trained on 32 MI300X GPUs with open weights, optimized for logic, math, and chain-of-thought tasks.

Running ComfyUI in Windows with ROCm on WSL
Run ComfyUI on Windows with ROCm and WSL to harness Radeon GPU power for local AI tasks like Stable Diffusion—no dual-boot needed