Posts by Liz Li
Bring FLUX to Life on MI300X: Run and Optimize with Hugging Face Diffusers
- 28 March 2025
AI-based text-to-image generation is pushing the boundaries of creative and visual storytelling, enabling almost anyone to draw like an artist. Stability AI introduced the Stable Diffusion models, which were a breakthrough in text-to-image generation. However, FLUX, a new state-of-the-art open-source model released by Black Forest Labs, is gaining popularity for its flexibility and controllability.
Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
- 21 March 2025
Our previous blog post on this topic discussed how DeepSeek-R1 achieves competitive performance on AMD Instinct™ MI300X GPUs. We also included performance comparisons against NVIDIA H200 GPUs and a short demo application illustrating real-world usage. In this blog, we delve into how the SGLang framework, critical kernel optimizations such as the AI Tensor Engine for ROCm™, and hyperparameter tuning help achieve performance boosts.
AITER: AI Tensor Engine For ROCm
- 21 March 2025
Performance optimization is critical when working with GPUs, especially for artificial intelligence tasks, which can be extremely demanding. To fully leverage the capabilities of advanced hardware, it's essential to master optimization strategies and ensure every available resource is used efficiently. In this blog, we provide an overview of AMD's AI Tensor Engine for ROCm (AITER) and show how easy it is to integrate AITER kernels into basic LLM training and inference workloads. AITER lets developers focus on creating operators while allowing customers to integrate this operator collection seamlessly into their own private, public, or custom frameworks.