The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism
- 24 November 2025
Deploying large Mixture-of-Experts (MoE) models like DeepSeek-R1 efficiently isn't just about having enough GPUs: it's about choosing the right parallelism strategy. The wrong choice can leave KV caches duplicated across ranks, consuming 8× the memory, or add communication overhead that cuts throughput in half. The right choice unlocks substantially better throughput and latency for your specific workload.