Posts by Joe Shajrawi
Accelerating IBM Granite 4.0 with FP8 using AMD Quark on MI300/MI355 GPUs
- 09 January 2026
In this post, we demonstrate how AMD Quark, a high-performance quantization library optimized for AMD Instinct™ MI300 and MI355 GPUs, enables FP8 quantization that preserves accuracy while delivering a substantial throughput uplift for the IBM Granite 4.0 model family. For instructions on deploying Granite 4.0 on AMD GPUs, please refer to the previous blog post.
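To give a feel for what per-tensor FP8 weight quantization does, here is a minimal sketch in plain PyTorch. It is not the AMD Quark API; it only illustrates the scale-and-cast arithmetic (e4m3 format) that such quantization performs, and the function names are placeholders of our own.

```python
import torch

def quantize_fp8_per_tensor(weight: torch.Tensor):
    """Quantize a weight tensor to FP8 (e4m3) with a single per-tensor scale.

    Illustrative only -- not AMD Quark's API.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn
    # Scale so the largest weight magnitude maps to the FP8 dynamic range.
    scale = weight.abs().max().clamp(min=1e-12) / fp8_max
    q = (weight / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original weights."""
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    w = torch.randn(4096, 4096)
    q, s = quantize_fp8_per_tensor(w)
    err = (dequantize_fp8(q, s) - w).abs().max()
    print(f"scale={s.item():.6f}, max abs error={err.item():.6f}")
```

In practice, Quark handles the choice of scales, calibration, and export for vLLM-style serving; the sketch above only shows why accuracy retention hinges on picking a good scale for the FP8 dynamic range.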
AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving
- 20 May 2025
AMD has successfully deployed the open-source llm-d framework on AMD Kubernetes infrastructure as part of our work on distributed large language model inference at scale. llm-d leverages a Kubernetes-native toolkit to streamline LLM serving with features such as KV-cache-aware routing, distributed scheduling, and integration with the Inference Gateway (IGW). In this blog, we showcase an initial deployment on an AMD cluster with distributed prefill and decode stages serving a Llama model.
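Once such a deployment is up, requests are typically sent through the gateway to an OpenAI-compatible endpoint exposed by the underlying model servers. The snippet below is a hedged sketch of that client side; the gateway URL and model id are illustrative placeholders, not values from the deployment described in the post.

```python
import requests

# Placeholder endpoint: llm-d routes traffic through the Inference Gateway,
# which fronts vLLM-style, OpenAI-compatible model servers.
GATEWAY_URL = "http://inference-gateway.example.svc.cluster.local/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",  # assumed model id
    "messages": [
        {"role": "user", "content": "Explain KV-cache-aware routing in one paragraph."}
    ],
    "max_tokens": 128,
}

resp = requests.post(GATEWAY_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```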