Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs#

March 09, 2026 by Yayuan Wang, Daniel Huang, George Wang.

ComfyUI has become a widely adopted, versatile interface for Stable Diffusion and other generative AI models, gaining significant traction within the AI content creation community. Unlike traditional form-based web interfaces, ComfyUI provides a node-based workflow system that gives users complete control over their image and video generation pipelines. Its modular architecture allows for complex workflows involving multiple models, LoRAs, ControlNets, and custom processing steps.

The challenge: Many AMD GPU users have encountered stability issues when running ComfyUI, including HIP memory errors, slow first-time generation, and VAE decoding failures. These issues can be frustrating for users transitioning from NVIDIA hardware or setting up their first AI generation workflow.

With the release of AMD Radeon RX 9000 series GPUs based on the RDNA 4 architecture, along with continued improvements in the AMD ROCm™ software stack, AMD users can now leverage ComfyUI for AI generation workflows. In this blog, we will discuss how to set up ComfyUI on AMD Radeon hardware, address the most common issues encountered by the community, and share performance optimization strategies based on our hands-on testing.

Note

This blog focuses on native Linux setups with AMD Radeon consumer GPUs. If you are interested in running ComfyUI on AMD Instinct data center GPUs, see Running ComfyUI on AMD Instinct. For a Windows-based setup using WSL, see Running ComfyUI in Windows with ROCm on WSL.

Why ComfyUI?#

ComfyUI offers several advantages for AI content creators:

  • Visual Workflow Editor: Design complex generation pipelines through an intuitive node-based interface

  • Memory Efficiency: Advanced model management that enables running large models on consumer hardware

  • Extensibility: Rich ecosystem of custom nodes and community extensions

  • Reproducibility: Save and share workflows as JSON files for consistent results

  • Multi-Model Support: Native support for Stable Diffusion, SDXL, Flux, WAN video models, and more

As shown in Figure 1 below, ComfyUI’s template browser makes it easy to get started with a variety of generation tasks.

Figure 1: ComfyUI’s built-in template browser provides ready-to-use workflows for various generation types including Image, Video, Audio, 3D Model, and LLM applications.

System Requirements#

Hardware Requirements#

| Component  | Minimum                           | Recommended                       |
|------------|-----------------------------------|-----------------------------------|
| GPU        | AMD Radeon RX 7000 Series (RDNA3) | AMD Radeon RX 9000 Series (RDNA4) |
| VRAM       | 8GB                               | 16GB+                             |
| System RAM | 16GB                              | 32GB+                             |
| Storage    | 50GB free space                   | SSD with 100GB+ free space        |

Software Requirements#

| Component        | Version                              |
|------------------|--------------------------------------|
| Operating System | Ubuntu 22.04 LTS or Ubuntu 24.04 LTS |
| ROCm             | 7.1                                  |
| Python           | 3.10 - 3.12                          |
| PyTorch          | 2.6.0+ with ROCm support             |

Supported GPU Models#

The following AMD Radeon GPUs have been tested with this configuration:

  • AMD Radeon RX 9070 XT (RDNA4)

  • AMD Radeon RX 9070 (RDNA4)

  • AMD Radeon RX 7900 XTX (RDNA3)

  • AMD Radeon RX 7900 XT (RDNA3)

  • AMD Radeon RX 7900 GRE (RDNA3)

Installation Guide#

Option 2: Native Installation#

For users who prefer a native installation without Docker:

  1. Install ROCm by following the official ROCm installation guide for your distribution: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/

  2. Create a Python environment:

    python3 -m venv comfyui_env
    source comfyui_env/bin/activate
    
  3. Install PyTorch with ROCm support:

    pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.1
    
  4. Clone and install ComfyUI:

    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ComfyUI
    pip install -r requirements.txt
    python main.py --listen 0.0.0.0
    

Common Issues and Solutions on AMD Radeon#

This section addresses the most frequently encountered issues when running ComfyUI on AMD Radeon GPUs, based on community feedback and our extensive testing.

Issue 1: Out of Memory (OOM) Errors#

Symptoms:

  • “CUDA out of memory” or “HIP out of memory” errors during generation

  • Process crashes during model loading or VAE decoding

  • Errors appear more frequently with larger models (SDXL, Flux, WAN 2.2)

Root Cause: AMD GPUs require careful VRAM management. Memory allocation behavior under ROCm differs from NVIDIA’s, and some operations need more free VRAM headroom than their nominal footprint suggests.

Solutions:

  1. Reserve VRAM for system operations by launching ComfyUI with the --reserve-vram flag:

    python main.py --listen 0.0.0.0 --reserve-vram 3
    

    This reserves 3GB of VRAM for system operations and PyTorch overhead, preventing allocation failures.

  2. Enable low VRAM mode for GPUs with 8-12GB VRAM:

    python main.py --listen 0.0.0.0 --lowvram
    
  3. Disable smart memory management if you experience inconsistent memory behavior:

    python main.py --listen 0.0.0.0 --disable-smart-memory
    
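
Which of these flags to reach for depends mostly on how much VRAM the card has. The sketch below (a hypothetical helper; the thresholds encode the rough guidance above, not hard ComfyUI rules) shows one way to express that decision:

```python
# Sketch: choose ComfyUI launch flags from available VRAM (in GB).
# The 16GB threshold below is rough guidance, not a hard rule.
def comfyui_flags(vram_gb: int) -> list[str]:
    flags = ["--listen", "0.0.0.0"]
    if vram_gb >= 16:
        # Plenty of VRAM: just keep headroom for PyTorch overhead
        flags += ["--reserve-vram", "3"]
    else:
        # 8-12GB cards: let ComfyUI aggressively offload models
        flags.append("--lowvram")
    return flags

print(" ".join(["python", "main.py"] + comfyui_flags(24)))
# python main.py --listen 0.0.0.0 --reserve-vram 3
```

If memory behavior is still inconsistent, `--disable-smart-memory` can be appended regardless of VRAM size.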

Issue 2: Slow First Generation (MIOpen Compilation)#

Symptoms:

  • First image generation takes significantly longer than subsequent generations

  • Terminal shows compilation messages during initial run

  • Performance improves dramatically after the first generation

Root Cause: MIOpen (the AMD equivalent to cuDNN) compiles optimized kernels for your specific GPU on first use. This compilation is cached for future runs.

Expected Behavior:

| Generation      | Time (SDXL 1024x1024, 20 steps) |
|-----------------|---------------------------------|
| First run       | ~9 seconds                      |
| Subsequent runs | ~5 seconds                      |

Solutions:

  1. Be patient: The first run includes one-time optimization. Subsequent runs will be faster.

  2. Preserve MIOpen cache by ensuring the cache directory persists between sessions:

    export MIOPEN_USER_DB_PATH=$HOME/.cache/miopen
    
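
To confirm the cache actually persists between sessions, a small sketch like the following sets the same path programmatically and reports what is in it (the directory layout is an MIOpen implementation detail, so we only check that the directory exists and accumulates entries after a first run):

```python
import os
from pathlib import Path

# Sketch: point MIOpen at a persistent cache dir before launching ComfyUI.
# MIOPEN_USER_DB_PATH is the documented knob; the file contents inside are
# an MIOpen implementation detail, so we only count entries.
cache_dir = Path.home() / ".cache" / "miopen"
os.environ.setdefault("MIOPEN_USER_DB_PATH", str(cache_dir))
cache_dir.mkdir(parents=True, exist_ok=True)

print(f"MIOpen cache: {cache_dir} ({len(list(cache_dir.iterdir()))} entries)")
```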

Issue 3: VAE Decoding Failures#

Symptoms:

  • Errors during the final VAE decode step

  • “Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding” warning

  • Black or corrupted output images

Root Cause: VAE decoding requires significant contiguous VRAM. When VRAM is fragmented or nearly full, the standard decode may fail.

Solutions:

  1. Use tiled VAE nodes: ComfyUI automatically falls back to tiled VAE when memory is insufficient. This is normal behavior and produces visually equivalent results.

  2. Force CPU VAE as a last resort:

    python main.py --listen 0.0.0.0 --cpu-vae
    

    Note: This significantly slows VAE operations but ensures completion.

  3. Use FP32 VAE for precision issues:

    python main.py --listen 0.0.0.0 --fp32-vae
    
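
Conceptually, tiled decoding splits the latent into overlapping tiles, decodes each tile separately, and blends the overlaps so no seams appear. The toy sketch below illustrates only the tiling arithmetic; the tile and overlap sizes are illustrative, not ComfyUI’s internal values:

```python
# Sketch: compute overlapping tile ranges along one latent dimension.
# ComfyUI's actual tiled VAE uses 2D tiles with feathered blending;
# this only illustrates how overlap keeps seams out of the output.
def tile_ranges(length: int, tile: int, overlap: int) -> list[tuple[int, int]]:
    step = tile - overlap
    ranges = []
    start = 0
    while start < length:
        end = min(start + tile, length)
        ranges.append((start, end))
        if end == length:
            break
        start += step
    return ranges

# A 128-wide latent decoded in 64-wide tiles with 16 overlap:
print(tile_ranges(128, 64, 16))  # [(0, 64), (48, 112), (96, 128)]
```

Each tile needs far less contiguous VRAM than the full latent, which is why the fallback succeeds where the regular decode fails.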

Issue 4: HIP Illegal Memory Access Errors#

Symptoms:

  • “HIP error: an illegal memory access was encountered”

  • Random crashes after several successful generations

  • More frequent with continuous batch processing

Root Cause: This issue has been addressed in recent ComfyUI versions through automatic backend optimization. The latest ComfyUI automatically sets torch.backends.cudnn.enabled = False for AMD GPUs, which significantly improves stability.

Solutions:

  1. Update ComfyUI to the latest version:

    cd ComfyUI
    git pull
    pip install -r requirements.txt
    
  2. Verify automatic optimization by checking your terminal output for:

    Set: torch.backends.cudnn.enabled = False for better AMD performance.
    

    If you see this message, the optimization is active.

  3. For older versions, manually disable cuDNN yourself. The setting only takes effect in the process that loads the models, so add these lines near the top of ComfyUI’s main.py:

    import torch
    # Avoids the illegal-memory-access code path on AMD GPUs
    torch.backends.cudnn.enabled = False
    

Issue 5: Multiple GPU Configuration#

Symptoms:

  • ComfyUI uses the wrong GPU

  • Multi-GPU systems not utilizing all cards

Solutions:

  1. Specify GPU device:

    HIP_VISIBLE_DEVICES=0 python main.py --listen 0.0.0.0
    
  2. For multi-GPU workflows: ComfyUI primarily uses a single GPU. For parallel workflows, consider running multiple ComfyUI instances on different ports.
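
One way to set that up is to pin each ComfyUI process to a single GPU via HIP_VISIBLE_DEVICES and give each its own --port. The sketch below only builds the per-instance environments and commands; the port numbers are arbitrary choices:

```python
import os

# Sketch: build per-GPU launch commands for parallel ComfyUI instances.
# Each process sees exactly one GPU via HIP_VISIBLE_DEVICES.
def instance_commands(num_gpus: int, base_port: int = 8188):
    commands = []
    for gpu in range(num_gpus):
        env = dict(os.environ, HIP_VISIBLE_DEVICES=str(gpu))
        cmd = ["python", "main.py", "--listen", "0.0.0.0",
               "--port", str(base_port + gpu)]
        commands.append((env, cmd))
    return commands

for env, cmd in instance_commands(2):
    print(f"HIP_VISIBLE_DEVICES={env['HIP_VISIBLE_DEVICES']}", " ".join(cmd))
```

Each (env, cmd) pair can then be handed to subprocess.Popen(cmd, env=env) to start the instances.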

Performance Benchmarks#

The following benchmarks were conducted on an AMD Radeon AI PRO R9700 (32GB VRAM) with ROCm 7.1:

Note: Performance may vary based on VRAM capacity. Users with 16GB VRAM GPUs may experience slightly different results, particularly with larger models.

SDXL Performance#

| Configuration       | First Run | Subsequent Runs | Inference Speed |
|---------------------|-----------|-----------------|-----------------|
| 1024x1024, 20 steps | 9.02s     | ~5.2s           | 4.6 it/s        |
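
The steady-state numbers are self-consistent: at 4.6 it/s, the 20 sampling steps alone account for roughly 4.3 of the ~5.2 seconds, with the remainder spent on model handling and VAE decode. As a quick check:

```python
# Quick consistency check on the SDXL numbers above.
steps = 20
its_per_sec = 4.6
total_run = 5.2  # seconds, steady state

sampling_time = steps / its_per_sec
overhead = total_run - sampling_time
print(f"sampling ~{sampling_time:.1f}s, overhead ~{overhead:.1f}s")
# sampling ~4.3s, overhead ~0.9s
```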

WAN 2.2 Video Generation (5B Model)#

| Metric               | Value                  |
|----------------------|------------------------|
| Per-frame generation | ~25 seconds            |
| VAE behavior         | Automatic tiled decode |

Stability Test Results#

| Test                            | Result           |
|---------------------------------|------------------|
| 12 consecutive SDXL generations | 12/12 successful |
| HIP errors encountered          | 0                |

Example Workflow Results#

SDXL Image Generation#

The following demonstrates a typical SDXL workflow in ComfyUI. Figure 2 shows the node-based pipeline, and Figure 3 shows the generation output with performance metrics.

Figure 2: SDXL workflow showing the node-based pipeline with model loading, prompt encoding, sampling, and VAE decoding stages.

Figure 3: SDXL generation results showing high-quality 1024x1024 output. The logs panel displays real-time performance metrics: 20/20 steps completed at 4.52 it/s, with total execution time of 5.23 seconds.

WAN 2.2 Video Generation#

ComfyUI also supports video generation workflows using the WAN 2.2 model. Figure 4 shows the text-to-video pipeline, and Figure 5 shows the resulting output.

Figure 4: WAN 2.2 video generation workflow with text-to-video pipeline configuration.

Figure 5: WAN 2.2 video generation result showing a 1024x1024 video output. The workflow successfully generates smooth video content from text prompts.

Troubleshooting Checklist#

If you encounter issues, verify the following:

  1. ROCm installation:

    rocm-smi --showproductname
    

    Confirm your GPU is detected.

  2. PyTorch ROCm support:

    import torch
    print(torch.cuda.is_available())  # Should be True
    print(torch.version.hip)  # Should show HIP version
    
  3. ComfyUI version:

    cd ComfyUI
    git log -1 --oneline
    

    Ensure you’re on a recent commit.

  4. Memory status:

    rocm-smi --showmeminfo vram
    

    Monitor VRAM usage during generation.

Summary#

ComfyUI on AMD Radeon RX 9000 series GPUs delivers a robust and performant experience for AI image and video generation. With ROCm 7.1 and the latest ComfyUI optimizations, many historical stability issues have been resolved. By following the configurations outlined in this guide, users can achieve reliable, high-quality generation results.

The key takeaways are:

  1. Use the latest ComfyUI version - It includes automatic AMD optimizations

  2. Reserve VRAM appropriately - Use --reserve-vram 3 for 16GB+ cards

  3. First run will be slower - MIOpen compilation is a one-time process

  4. VAE tiled mode is normal - It ensures completion under memory pressure

Ready to get started? Download ComfyUI from the official GitHub repository, follow the installation steps above, and start creating. If you encounter any issues not covered in this guide, visit the AMD Community Forums or open an issue on the ComfyUI GitHub repository.

Disclaimers#

Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.