Haocong Wang



Haocong is a member of the Composable Kernel team, where he is the technical lead for kernel performance optimization. Since 2022 he has contributed to AMDGPU and ROCm across GPU generations, from RDNA2 and CDNA1 through RDNA4 and CDNA4.

Haocong specializes in developing AMD GPU operators and optimizing their performance. He led the development and optimization of multi-precision matrix multiplication operators on the MI300 platform, significantly boosting AMD GPU performance for AI inference workloads. Dedicated to unlocking the hardware's potential through high-performance operators built with Composable Kernel (CK), Haocong accelerates user workloads and delivers tangible value to customers. His research interests focus on creating sustainable and versatile GPU programming paradigms, abstracting complex hardware concepts, and optimizing interactions with compilers.
