Posts by Tong Shen
Nitro-E: A 304M Diffusion Transformer Model for High Quality Image Generation
- 24 October 2025
We present Nitro-E, an extremely lightweight diffusion transformer model for high-quality image generation. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference. Training takes only 1.5 days on a single node with 8 AMD Instinct™ MI300X GPUs. On the inference side, Nitro-E delivers a throughput of 18.8 samples per second (batch size 32, 512px images) on a single AMD Instinct MI300X GPU, and the distilled version further increases throughput to 39.3 samples per second. On a consumer iGPU device, Strix Halo, our model can generate a 512px image in only 0.16 seconds. A more detailed comparison can be seen in Figure 1, where our models achieve promising scores on GenEval while showing substantially higher throughput.
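The throughput figures above imply per-image latencies that are easy to relate with simple arithmetic. A minimal sketch (all numbers taken from the text; treating batch-amortized latency as the reciprocal of steady-state throughput is an assumption):

```python
def amortized_latency(throughput_samples_per_s: float) -> float:
    """Per-image latency implied by a steady-state throughput figure."""
    return 1.0 / throughput_samples_per_s

# Nitro-E on one MI300X: 18.8 samples/s at batch size 32 (512px images)
base = amortized_latency(18.8)       # amortized seconds per image
distilled = amortized_latency(39.3)  # distilled model, same setting
speedup = 39.3 / 18.8                # throughput gain from distillation

print(f"base: {base:.3f}s/img, distilled: {distilled:.3f}s/img, "
      f"speedup: {speedup:.2f}x")
```

Note that these amortized numbers differ from the 0.16 s single-image figure quoted for Strix Halo, which is an unbatched latency on much weaker hardware.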
Nitro-T: Training a Text-to-Image Diffusion Model from Scratch in 1 Day
- 09 July 2025
AMD is excited to release Nitro-T, a family of text-to-image diffusion models focused on highly efficient training. Our models achieve competitive scores on image generation benchmarks compared to prior efficiency-focused models, while requiring less than 1 day of training from scratch on 32 AMD Instinct MI300X GPUs.