Posts by Kenny Roche
ROCm Becomes a First-Class Platform in the vLLM Ecosystem
- 21 January 2026
As the generative AI ecosystem matures, vLLM has embraced a multivendor approach, and the quality of support across hardware platforms has become a defining priority: developers expect consistent, high-performance behavior no matter which GPU they choose. Today, we are proud to announce a major realization of that vision: AMD ROCm™ is now a first-class platform in the vLLM ecosystem.
AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving
- 20 May 2025
AMD has successfully deployed the open-source llm-d framework on AMD Kubernetes infrastructure as part of our effort to serve large language models at scale. llm-d provides a Kubernetes-native toolkit that streamlines LLM serving with features such as KV-cache-aware routing, distributed scheduling, and integration with the Inference Gateway (IGW). In this blog we showcase an initial deployment on an AMD cluster, with distributed prefill and decode stages serving a Llama model.
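The core intuition behind KV-cache-aware routing can be sketched in a few lines of Python: requests that share a prompt prefix are sent to the same worker, so that worker's KV cache for the prefix can be reused rather than recomputed. This is a minimal illustration of the idea only; the worker names, prefix length, and hashing scheme below are assumptions for the sketch, not llm-d's actual implementation (which also accounts for load and real-time cache state).

```python
# Minimal sketch of prefix-affinity routing (illustrative, NOT the llm-d API).
# Requests sharing a prompt prefix hash to the same worker, maximizing the
# chance of a KV-cache hit for that prefix on that worker.
import hashlib

WORKERS = ["decode-pod-0", "decode-pod-1", "decode-pod-2"]  # hypothetical pod names
PREFIX_CHARS = 256  # route on the first N characters of the prompt (assumed)

def route(prompt: str, workers=WORKERS) -> str:
    """Deterministically pick a worker from the prompt's prefix."""
    prefix = prompt[:PREFIX_CHARS]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return workers[int.from_bytes(digest[:8], "big") % len(workers)]

# Two requests that share a long system prompt land on the same worker,
# so the second one can reuse the KV cache built for the shared prefix.
sys_prompt = "You are a helpful assistant. " * 20
a = route(sys_prompt + "Summarize this document.")
b = route(sys_prompt + "Translate this sentence.")
assert a == b  # shared prefix -> same worker -> likely KV-cache hit
```

A production router would combine this prefix affinity with live load and cache-occupancy signals, but the sketch shows why cache-aware placement matters for disaggregated prefill and decode stages.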