Posts by Shreyas Atre

MXFP6 and MXFP4 Mixed Precision for Accelerating Dense LLMs on AMD Instinct MI355X

In this blog, you will learn how pairing MXFP6-E2M3 activations with MXFP4 weights can meaningfully recover accuracy lost to pure 4-bit MXFP4 quantization in specific workloads and configurations, while staying within 2–3% of MXFP4 throughput. You will see measured offline throughput, serving latency, and benchmark accuracy for BF16, FP8, MXFP4, and W_MXFP4_A_MXFP6 (MXFP4 weights with MXFP6 activations) on Llama-3.1-8B and Qwen3.6-27B, all running on AMD Instinct MI355X.
