Posts by Liz Li

Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers

AI based text-to-image generation is pushing the boundaries of creative and visual storytelling, enabling the critical mass to draw like an artist. Stability AI introduced stable diffusion models which was a breakthrough in text to image generation. However, FLUX - a new state-of-the-art open-source model released by Black Forest Labs, is gaining popularity for its flexibility and controllability.

Read more ...


Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X

Our previous blog post on this topic discussed how DeepSeek-R1 achieves competitive performance on AMD Instinct™ MI300X GPUs. We also included performance comparisons against Nvidia H200 GPUs and a short demo application illustrating real-world usage. In this blog we will delve into how using the SGLang framework, critical kernel optimizations like AI Tensor Engine for ROCm™, and hyperparameter tuning helps to achieve performance boosts.

Read more ...


AITER: AI Tensor Engine For ROCm

Performance optimization is critical when working with GPUs, especially for tasks involving artificial intelligence, which can be extremely demanding. To fully leverage the capabilities of advanced hardware, it’s essential to master optimization strategies and ensure every available resource is utilized efficiently. In this blog we will provide an overview of AMD’s AI Tensor Engine for ROCm (AITER) and show you how easy it is to integrate AITER kernels in basic LLM training and inference workload. AITER helps developers to focus on creating operators while allowing customers to seamlessly integrate this operator collection into their own private, public, or any custom framework.

Read more ...


AITER: AI Tensor Engine For ROCm

Performance optimization is critical when working with GPUs, especially for tasks involving artificial intelligence, which can be extremely demanding. To fully leverage the capabilities of advanced hardware, it’s essential to master optimization strategies and ensure every available resource is utilized efficiently. In this blog we will provide an overview of AMD’s AI Tensor Engine for ROCm (AITER) and show you how easy it is to integrate AITER kernels in basic LLM training and inference workload. AITER helps developers to focus on creating operators while allowing customers to seamlessly integrate this operator collection into their own private, public, or any custom framework.

Read more ...