Posts by Tun Jian Tan
ROCm Becomes a First-Class Platform in the vLLM Ecosystem
- 21 January 2026
As the generative AI landscape matures, vLLM is embracing a multi-vendor hardware ecosystem, and the quality of support across platforms has become a defining priority: developers expect consistent, high-performance behavior no matter which GPU they choose. Today, we are proud to announce a major realization of that vision: AMD ROCm™ is now a first-class platform in the vLLM ecosystem.
Accelerating Multimodal Inference in vLLM: The One-Line Optimization for Large Multimodal Models
- 02 January 2026
Deploying multimodal models like Qwen3-VL or InternVL at scale reveals a hidden bottleneck. While Tensor Parallelism (TP) is essential for massive language decoders, it is often overkill for vision encoders. These encoders are typically small, often just 1-5% of total model size, so there is limited compute benefit from sharding them. However, they still incur expensive all-reduce communication costs after every single layer.
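The fix this excerpt hints at is to keep TP for the language decoder while running the small vision encoder data-parallel (replicated per GPU), which removes the per-layer all-reduces. A minimal launch sketch, assuming the `--mm-encoder-tp-mode` flag from recent vLLM releases and an illustrative model id; verify both against your installed version with `vllm serve --help`:

```shell
# Sketch: TP=8 shards the language decoder, while the vision encoder runs
# data-parallel (one full copy per GPU) instead of being sharded.
# ASSUMPTION: flag name and model id are illustrative, not verified here.
vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data
```

Because the encoder is only a few percent of total parameters, replicating it costs little memory while eliminating its communication overhead.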
The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism
- 24 November 2025
Deploying large Mixture-of-Experts (MoE) models like DeepSeek-R1 efficiently isn’t just about having enough GPUs; it’s about choosing the right parallelism strategy. The wrong choice can leave duplicated KV caches consuming 8× your memory, or communication overhead that cuts throughput in half. The right choice unlocks significantly better performance for your specific workload.
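As a hedged illustration of how these strategies are selected on the vLLM command line (flag names from vLLM's standard engine arguments; the sizes and model id are examples under assumed 8-GPU hardware, not recommendations):

```shell
# Sketch: three ways to place an MoE model on 8 GPUs with vLLM.

# Pure tensor parallelism: weights and KV cache sharded across all 8 GPUs.
vllm serve deepseek-ai/DeepSeek-R1 --tensor-parallel-size 8

# TP combined with expert parallelism: MoE experts are distributed across
# ranks instead of replicated inside each TP shard.
vllm serve deepseek-ai/DeepSeek-R1 --tensor-parallel-size 8 --enable-expert-parallel

# TP=4 within each replica, two data-parallel replicas: each replica holds
# its own full KV cache, which is where the duplication cost above comes from.
vllm serve deepseek-ai/DeepSeek-R1 --tensor-parallel-size 4 --data-parallel-size 2
```

Which layout wins depends on the workload: DP helps throughput at high request rates, EP helps when expert weights dominate memory.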
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm
- 28 June 2025
AMD is pleased to announce the release of vLLM 0.9.x, delivering significant advances in LLM inference performance through ROCm™ software and AITER integration. This release brings a range of powerful optimizations and new capabilities to the AMD ROCm software ecosystem, as shown in Figure 1 below. Whether you are a developer or a researcher, it is designed to help you unlock new levels of performance and broader model support on AMD Instinct™ GPUs.