Posts by Ziqiong Liu
Streamlining Recommendation Model Training on AMD Instinct™ GPUs
- 02 March 2026
Recommendation model training and inference workloads represent a significant portion of computational requirements across industries including e-commerce, social media, and content streaming platforms. Unlike LLMs, recommendation models produce complex and often imbalanced communication across GPUs, along with a higher load on the CPU-GPU interconnect. The ROCm training Docker image [1] now includes essential libraries for recommendation model training. This blog demonstrates the functionality and ease of training recommendation models using ROCm, along with suggestions for improved configuration of these workloads. We also highlight the inherent benefits of the large HBM capacity of AMD Instinct™ GPUs for recommendation workloads.
GEAK-Triton v2 Family of AI Agents: Kernel Optimization for AMD Instinct GPUs
- 23 December 2025
Optimizing GPU kernels is a formidable task, traditionally requiring deep domain expertise and hours of manual tuning. At AMD, we are expanding our GEAK (Generating Efficient AI-centric GPU Kernels) family to automate this entire workflow, from initial code generation to deep performance optimization.
GEAK HIP: Expanding GEAK for HIP Code Optimization
- 23 December 2025
This blog discusses the use of the Generating Efficient AI-centric Kernels (GEAK) agent for automated HIP code optimization, demonstrating how GEAK’s agentic pipelines can elevate customer and developer code and boost AI performance on AMD platforms.
GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
- 01 August 2025
At AMD, we are pioneering ways to accelerate AI development using AI itself, by generating accurate and efficient GPU kernels. Specifically, we are starting with the automatic generation of kernels in Triton, an open-source Python-like language for writing parallel GPU code. Today, AMD is excited to announce (a) Generating Efficient AI-centric Kernels (GEAK) for AMD GPUs, and results on (b) two Triton kernel evaluation benchmarks, where we show how AI agents can perform inference-time scaling with frontier LLMs to generate accurate and efficient kernels for AMD Instinct™ GPUs such as the MI250X and MI300X.