Posts by Ziqiong Liu

Streamlining Recommendation Model Training on AMD Instinct™ GPUs

Recommendation model training and inference workloads represent a significant portion of computational requirements across industries including e-commerce, social media, and content streaming platforms. Unlike LLMs, recommendation models produce complex and often imbalanced communication patterns across GPUs, along with a higher load on the CPU-GPU interconnect. The ROCm training Docker image [1] now includes essential libraries for recommendation model training. This blog demonstrates the functionality and ease of training recommendation models using ROCm, along with suggestions for better configuring these workloads. We also highlight the inherent benefits of the large HBM capacity of AMD Instinct™ GPUs for recommendation workloads.

Read more ...


GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs

Optimizing GPU kernels is a formidable task, traditionally requiring deep domain expertise and hours of manual tuning. At AMD, we are expanding our GEAK (Generating Efficient AI-centric GPU Kernels) family to automate this entire workflow, from initial code generation to deep performance optimization.

Read more ...


GEAK HIP: Expanding GEAK for HIP Code Optimization

This blog discusses the use of the Generating Efficient AI-centric Kernels (GEAK) agent for automated HIP code optimization, demonstrating how GEAK’s agentic pipelines can elevate customer and developer code and boost AI performance on AMD platforms.

Read more ...


GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

At AMD, we are pioneering ways to accelerate AI development using AI itself, by generating accurate and efficient GPU kernels. Specifically, we are starting with the automatic generation of kernels in Triton, an open-source, Python-like language for writing parallel GPU code. Today, AMD is excited to announce (a) Generating Efficient AI-centric Kernels (GEAK) for AMD GPUs, and (b) results on two Triton kernel evaluation benchmarks, where we show how AI agents can perform inference-time scaling with frontier LLMs to generate accurate and efficient kernels for AMD Instinct™ GPUs such as the MI250X and MI300X.

Read more ...