Recent Posts
GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs
Introducing the GEAK family: AI-driven agents that automate GPU kernel optimization for AMD Instinct GPUs with hardware-aware feedback.
GEAK HIP: Expanding GEAK for HIP Code Optimization
Explore the GEAK framework's AI-driven HIP code optimization for improved performance on AMD GPUs, including speedup examples and benefits for AI workloads.
Getting Started with AMD AI Workbench: Deploying and Managing AI Workloads
Learn how to deploy and manage AI workloads with AMD AI Workbench, a low-code interface that lets developers manage AI inference deployments.
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs
Learn how to train LLMs across decentralized clusters on AMD Instinct MI300 GPUs with DiLoCo and Prime—scale beyond one datacenter.
MoE Training Best Practices on AMD GPUs
Learn how to optimize Mixture-of-Experts (MoE) model training on AMD Instinct GPUs with ROCm. Maximize your AI training performance now!
3D Scene Reconstruction from the Inside: Explore the Mathematics Behind gsplat
Accelerating llama.cpp on AMD Instinct MI300X
Learn more about the superior performance of llama.cpp on Instinct platforms.
Medical Imaging on MI300X: SwinUNETR Inference Optimization
A practical guide to optimizing SwinUNETR inference on AMD Instinct™ MI300X GPUs for fast 3D segmentation of tumors in medical imaging.
Accelerating Autonomous Driving Model Training on AMD ROCm™ Software
Learn how to use AMD GPUs for high-performance training of autonomous-driving models with ROCm optimizations.
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs
Explore how MI355X performs against B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B and Llama-3.3-70B.
Building a State-of-the-Art 32 Billion Reasoning Model with Only Synthetic Data on AMD GPUs
Learn how to build a state-of-the-art reasoning model that beats Qwen3-32B using only synthetic data and SFT on AMD Instinct™ GPUs: fast, simple, and scalable.
DGL in Depth: SE(3)-Transformer on ROCm 7
Learn how to run SE(3)-Transformer with DGL on AMD Instinct platforms using ROCm 7.