Posts by Christian Gilli
Deep Dive Into 4-Wave Interleave FP8 GEMM
- 27 May 2026
Our previous two posts in this GEMM optimization series covered Matrix Core instructions and the 8-wave ping-pong FP8 GEMM design. Here we discuss another algorithm design introduced by HipKittens, 4-wave interleave, which further improves on the performance of the 8-wave ping-pong implementation. For the most complete understanding, we recommend reading this post alongside the source code.