George Wang

George Wang is Director of AI Software Product Engineering in the AI Group at AMD, where he leads a talented team responsible for AI software solutions, product management, and end-to-end performance optimization across Data Center, Client, and Edge/Endpoint applications, driving cutting-edge AI capabilities for AMD’s customers, developers, and the broader community.
George has over 20 years of experience in the technology industry and holds a master’s degree from UESTC.
Posts by George Wang

Kimi-K2-Instruct: Enhanced Out-of-the-Box Performance on AMD Instinct MI355 Series GPUs
Learn how AMD Instinct MI355 Series GPUs deliver competitive Kimi-K2 inference with faster TTFT, lower latency, and strong throughput.

Optimizing FP4 Mixed-Precision Inference with Petit on AMD Instinct MI250 and MI300 GPUs: A Developer’s Perspective
Learn how FP4 mixed-precision on AMD GPUs boosts inference speed and integrates seamlessly with SGLang.

Step-3 Deployment Simplified: A Day 0 Developer’s Guide on AMD Instinct™ GPUs
Learn how to deploy Step-3, a 321B-parameter VLM with MFA & AFD, on AMD Instinct™ GPUs to cut decoding costs and boost long-context reasoning

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs
AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

Day 0 Developer Guide: Running the Latest Open Models from OpenAI on AMD AI Hardware
Day 0 support across the AMD AI hardware ecosystem, from the flagship AMD Instinct™ MI355X and MI300X GPUs to AMD Radeon™ AI PRO R9700 GPUs and AMD Ryzen™ AI Processors

Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework
This blog shows how CK-Tile’s XOR-based swizzle optimizes shared memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts

Vibe Coding Pac-Man Inspired Game with DeepSeek-R1 and AMD Instinct MI300X
Learn LLM-powered game dev using DeepSeek-R1 on AMD MI300X GPUs with iterative prompting, procedural generation, and VS Code AI tools

Fine-Tuning LLMs with GRPO on AMD MI300X: Scalable RLHF with Hugging Face TRL and ROCm
Fine-tune LLMs with GRPO on AMD MI300X—leverage ROCm, Hugging Face TRL, and vLLM for efficient reasoning and scalable RLHF

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs

Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang
Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs

Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs
Learn how to accelerate text-to-video generation using Step-Video-T2V, a 30B parameter T2V model, on AMD MI300X GPUs with ROCm—enabling scalable, high-fidelity video generation from text

Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed
Unlock the full power of AMD GPUs—write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility

Hands-On with CK-Tile: Develop and Run Optimized GEMM on AMD GPUs
Build high-performance GEMM kernels using CK-Tile on AMD Instinct GPUs with vendor-optimized pipelines and policies for AI and HPC workloads

Unlock Peak Performance on AMD GPUs with Triton Kernel Optimizations
Learn how Triton compiles and optimizes AI kernels on AMD GPUs, with deep dives into IR flows, hardware-specific passes, and performance tuning tips

GEMM Kernel Optimization For AMD GPUs
A guide to tuning GEMM kernels for optimal AI model performance on AMD GPUs