Software tools & optimizations - Page 2

Discover the latest blogs about ROCm software tools, libraries, and performance optimizations to help you get the most out of your AMD hardware.

Gumiho boosts LLM inference with early-token accuracy, blending serial + parallel decoding for speed, accuracy, and ROCm-optimized deployment.

Learn how to use hipblaslt-bench for offline GEMM tuning in hipBLASLt—benchmark, save, and apply custom-tuned kernels at runtime.

October 09, 2025 by Chia Hung, YangWen Huang, Carson Liao

ROCm Port of GSplat - GPU accelerated rasterization of Gaussian splatting

What’s New in AMD GPU Operator: Learn About GPU Partitioning and New Kubernetes Features

October 01, 2025 by Alireza Sariaslani

This blog post explains how to use Matrix Cores on the CDNA3 and CDNA4 architectures, with a focus on low-precision data types such as FP16, FP8, and FP4.

Primus streamlines training on AMD ROCm, from fine-tuning to massive pretraining on MI300X GPUs—faster, safer, and easier to debug

This blog shows how to speed up LLM inference with Multi-Token Prediction in DeepSeek V3 and SGLang on AMD Instinct GPUs.

We introduce a hipBLASLt tuning tool that lets developers optimize GEMM problem sizes and integrate them into the library.

September 05, 2025 by YangWen Huang, Carson Liao

Learn how prefill–decode disaggregation improves LLM inference by reducing latency, enhancing throughput, and optimizing resource usage.

August 28, 2025 by Bill He, Andy Luo

AITER boosts DeepSeek-V3’s MLA on AMD MI300X GPUs with low-rank projections, shared KV paths & matrix absorption for 2× faster inference.

August 25, 2025 by Daniel Huang, George Wang

Primus streamlines LLM training on AMD GPUs with unified configs, multi-backend support, preflight validation, and structured logging.

This blog explains what ComfyUI is, how it works, and how to deploy it on AMD Instinct GPUs.

August 19, 2025 by Albin Toft
