Posts by Seungrok Jung

Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

Our previous blog post on this topic discussed how DeepSeek-R1 achieves competitive performance on AMD Instinct™ MI300X GPUs. It also included performance comparisons against NVIDIA H200 GPUs and a short demo application illustrating real-world usage. In this blog, we delve into how the SGLang framework, critical kernel optimizations such as AI Tensor Engine for ROCm™, and hyperparameter tuning help achieve performance boosts.

Large language model inference optimizations on AMD GPUs

15 Mar 2024
