Posts by Joe Shajrawi

Accelerating IBM Granite 4.0 with FP8 using AMD Quark on MI300/MI355 GPUs

In this post, we demonstrate how AMD Quark, a high-performance quantization library optimized for AMD Instinct™ MI300 and MI355 GPUs, enables FP8 quantization to deliver excellent accuracy retention and substantial throughput uplift for the IBM Granite 4.0 model family. For instructions on deploying Granite 4.0 on AMD GPUs, please refer to the previous blog post.
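To give a feel for the numerics involved, here is a minimal, hypothetical sketch of per-tensor FP8 (E4M3) "fake quantization". This is not the AMD Quark API; the helper names `compute_scale` and `fake_quantize` and the 3-bit mantissa rounding are illustrative assumptions. The only hard fact used is that E4M3's largest representable magnitude is 448.

```python
# Hypothetical illustration (not the AMD Quark API) of per-tensor FP8
# E4M3 fake quantization: scale the tensor so its largest magnitude maps
# onto the FP8 range, round to the E4M3 mantissa grid, then rescale.
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3


def compute_scale(values):
    """Per-tensor scale mapping the largest magnitude onto the FP8 range."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def fake_quantize(values, mantissa_bits=3):
    """Quantize to the E4M3 grid and dequantize back to float."""
    scale = compute_scale(values)
    out = []
    for v in values:
        x = v / scale  # map into roughly [-448, 448]
        if x == 0.0:
            out.append(0.0)
            continue
        exp = math.floor(math.log2(abs(x)))
        step = 2.0 ** (exp - mantissa_bits)  # spacing of the mantissa grid
        q = round(x / step) * step           # round-to-nearest on that grid
        q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q))
        out.append(q * scale)                # dequantize
    return out


weights = [0.013, -0.9, 0.25, 0.0, 1.12]
print(fake_quantize(weights))
```

Because the scale is chosen from the tensor's own maximum, the largest weight survives round-trip exactly, while smaller weights absorb a rounding error proportional to their exponent; this is the accuracy/throughput trade the post evaluates on Granite 4.0.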

Read more ...


AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

AMD has successfully deployed the open-source llm-d framework on AMD Kubernetes infrastructure as part of our efforts toward distributed large language model inference at scale. llm-d leverages a Kubernetes-native toolkit to streamline LLM serving with features like KV-cache-aware routing, distributed scheduling, and integration with the Inference Gateway (IGW). In this blog we showcase an initial deployment on an AMD cluster with distributed prefill and decode stages serving a Llama model.
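The KV-cache-aware routing idea can be sketched in a few lines. This is a hypothetical toy scorer, not the llm-d scheduler: the replica structure, the `route` function, and the tie-breaking rule are all illustrative assumptions. The core notion is real, though: a replica that already holds the KV cache for a prompt's prefix can skip recomputing that prefill work.

```python
# Hypothetical sketch (not the llm-d implementation): route a request to
# the replica whose cached token sequence shares the longest prefix with
# the incoming prompt, breaking ties by choosing the least-loaded replica.

def prefix_overlap(prompt_tokens, cached_tokens):
    """Length of the shared prefix between a prompt and a cached sequence."""
    n = 0
    for a, b in zip(prompt_tokens, cached_tokens):
        if a != b:
            break
        n += 1
    return n


def route(prompt_tokens, replicas):
    """replicas: list of dicts with 'name', 'cached' (token list), 'load'.
    Prefer the longest cache hit; break ties by lowest load."""
    return max(
        replicas,
        key=lambda r: (prefix_overlap(prompt_tokens, r["cached"]), -r["load"]),
    )["name"]


replicas = [
    {"name": "decode-0", "cached": [1, 2, 3, 9], "load": 0.2},
    {"name": "decode-1", "cached": [1, 2, 3, 4, 5], "load": 0.7},
    {"name": "decode-2", "cached": [], "load": 0.1},
]
print(route([1, 2, 3, 4, 6], replicas))  # longest shared prefix wins
```

In a prefill/decode-disaggregated setup like the one described above, a scorer of this shape lets the router send repeat or multi-turn prompts to the instance that can reuse the most KV cache, while cold prompts fall through to whichever instance is idlest.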

Read more ...