Posts by Eveline Chen

Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 & W8A8 Quantization with AMD Quark

In our previous blog post [7], we demonstrated how to accelerate Kimi-K2.5 [1] inference on AMD Instinct™ GPUs. We profiled the model, identified fused_moe as the dominant bottleneck (88–90% of GPU time), and replaced the default Triton-based kernel with a FlyDSL [2]-powered mixed-precision (BF16 + W4A16) fused MoE implementation.

Read more ...


Accelerating Kimi-K2.5 on AMD Instinct™ MI300X: Optimizing Fused MoE with FlyDSL

With the recent surge in popularity of OpenClaw [1], its officially recommended model, Kimi-K2.5 [2], has taken the AI community by storm. As developers and researchers flock to this powerful Mixture-of-Experts (MoE) LLM, high-performance inference on cutting-edge hardware has become a pressing need.

Read more ...