Posts by Andy Luo

Best practices for competitive inference optimization on AMD Instinct™ MI300X GPUs

Optimizing LLM performance on GPUs is challenging due to diverse model requirements, memory constraints, and the need to balance latency and throughput. This post examines how hardware utilization, memory and communication bandwidth, and scaling contribute to inference performance, and details optimal configurations for AMD Instinct™ MI300X GPUs.
