Posts tagged Inference
DP Attention and TBO for DeepSeek-V4 on MI355X
- 24 June 2026
Running DeepSeek-V4 efficiently requires solving two intertwined problems: how to parallelize MoE communication across GPUs, and how to hide that communication behind useful compute. The dominant approach is Expert Parallel with all2all backends like DeepEP. This solves both problems, but it also requires specialized kernels, topology assumptions, and careful expert placement.