Software tools & optimizations - Page 4

Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

September 05, 2025

GEMM Tuning within hipBLASLt - Part 1

We introduce a hipBLASLt tuning tool that lets developers optimize GEMM problem sizes and integrate them into the library.

./software-tools-optimization/hipblaslt-offline-tuning-part1/README.html

August 28, 2025

Unleashing AMD Instinct™ MI300X GPUs for LLM Serving: Disaggregating Prefill & Decode with SGLang

Learn how prefill–decode disaggregation improves LLM inference by reducing latency, enhancing throughput, and optimizing resource usage.

./software-tools-optimization/disaggregation/README.html

August 25, 2025

AITER-Enabled MLA Layer Inference on AMD Instinct MI300X GPUs

AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

./software-tools-optimization/aiter-mla/README.html

August 22, 2025

Primus: A Lightweight, Unified Training Framework for Large Models on AMD GPUs

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

./software-tools-optimization/primus/README.html

August 19, 2025

Running ComfyUI on AMD Instinct

This blog explains what ComfyUI is, how it works, and how to deploy it on AMD Instinct GPUs.

./software-tools-optimization/comfyui-on-amd/README.html

August 13, 2025

Performance Profiling on AMD GPUs – Part 2: Basic Usage

Part 2 of our GPU profiling series guides beginners through practical steps to identify and optimize kernel bottlenecks using ROCm tools.

./software-tools-optimization/profiling-guide/novice/README.html

August 07, 2025

Running ComfyUI in Windows with ROCm on WSL

Run ComfyUI on Windows with ROCm on WSL to harness Radeon GPU power for local AI tasks like Stable Diffusion, with no dual-boot needed.

./software-tools-optimization/rocm-on-wsl/README.html

August 01, 2025

GEAK: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

AMD introduces GEAK, an AI agent for generating optimized Triton GPU kernels, achieving up to 63% accuracy and up to 2.59× speedups on MI300X GPUs.

./software-tools-optimization/triton-kernel-ai/README.html

July 25, 2025

Avoiding LDS Bank Conflicts on AMD GPUs Using CK-Tile Framework

This blog shows how CK-Tile’s XOR-based swizzle optimizes shared memory access in GEMM kernels on AMD GPUs by eliminating LDS bank conflicts.

./software-tools-optimization/lds-bank-conflict/README.html

July 21, 2025

Chain-of-Thought Guided Visual Reasoning Using Llama 3.2 on a Single AMD Instinct MI300X GPU

Fine-tune Llama 3.2 Vision models on an AMD MI300X GPU using Torchtune, achieving 2.3× better accuracy with the 11B model vs. the 90B model on chart-based tasks.

./software-tools-optimization/fine-tune-llama3.2/README.html

July 18, 2025

Announcing hipCIM: A Cutting-Edge Solution for Accelerated Multidimensional Image Processing

Fully utilize the power of AMD's Instinct GPUs to process and interpret detailed multidimensional images with lightning speed.

./software-tools-optimization/hipcim-intro/README.html

July 18, 2025

Introducing ROCm-LS: Accelerating Life Science Workloads with AMD Instinct™ GPUs

Accelerate life science and medical workloads with ROCm-LS, AMD's GPU-optimized toolkit for faster multidimensional image processing and vision.

./software-tools-optimization/rocm-ls-intro/README.html
