Posts by Chun Fang

Fast Image Generation and Editing with SGLang Diffusion on AMD GPUs

10 July 2026

Visual generative AI is advancing at an extraordinary pace. OpenAI’s GPT Image 2, released in April 2026, is a reasoning-enabled image model that performs an internal planning process before generating images. Its predecessor generated over 700 million images in its first week after launch. Google’s Nano Banana (Gemini 3.1 Flash Image) delivers real-time, knowledge-grounded image generation and editing, while open-source video models such as HunyuanVideo can now produce fluid, high-fidelity clips from a single sentence.


Accelerating Large-Scale LLM Inference on AMD Instinct MI350X/MI355X with Eagle3 and AMD Quark

03 July 2026

Large language model (LLM) inference is increasingly constrained by autoregressive decoding. Even when prefill is highly optimized, the decode phase still generates tokens one step at a time, and each step typically requires running the full target model. For large mixture-of-experts and attention-heavy models such as Kimi-K2.5 and MiniMax-M2.5, this sequential pattern limits serving throughput and increases latency for real-time applications.


Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs

08 December 2025

Today, we are excited to share large language model (LLM) inference performance results with vLLM on AMD Instinct™ MI355X GPUs. Whether you are a startup, an enterprise, or a hyperscaler, the AMD open software ecosystem with Instinct MI355X GPUs delivers consistent, high-performance inference at scale, outperforming NVIDIA Blackwell B200 GPUs as concurrency grows. For real-world users, these performance gains translate directly into better user experience and cost efficiency in production environments.
