The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism
- 24 November 2025
Deploying large Mixture-of-Experts (MoE) models like DeepSeek-R1 efficiently isn't just about having enough GPUs: it's about choosing the right parallelism strategy. The wrong choice can leave KV caches duplicated across ranks, consuming 8× the memory, or add communication overhead that cuts throughput in half. The right choice unlocks substantially better throughput and latency for your specific workload.