Recent Posts - Page 22

May 28, 2025

HIP 7.0 Is Coming: What You Need to Know to Stay Ahead

Get ready for HIP 7.0: explore key API changes that boost CUDA compatibility and streamline portable GPU development, and start preparing your code today.

./ecosystems-and-partners/transition-to-hip-7.0-blog/README.html

May 22, 2025

ROCm Runfile Installer Is Here!

An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which provides a complete single package for driver and ROCm installation without internet connectivity.

./software-tools-optimization/amd-rocm-runfile/README.html

May 21, 2025

From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile

Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs

./software-tools-optimization/ck-tile-flash/README.html

May 20, 2025

Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs

Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH

./software-tools-optimization/introducing-rocm-ds-revolutionizing-data-processing-with-amd-instinct-gpus/README.html

May 20, 2025

AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving

./artificial-intelligence/llm-d-distributed/README.html

May 16, 2025

Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang

Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs

./artificial-intelligence/aiter-intergration-s/README.html

May 15, 2025

Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs

Learn how to accelerate text-to-video generation using Step-Video-T2V, a 30B-parameter T2V model, on AMD MI300X GPUs with ROCm, enabling scalable, high-fidelity video generation from text.

./artificial-intelligence/step-video-t2v/README.html

May 12, 2025

Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG

Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.

./artificial-intelligence/rocjpeg-decoding-performance-blog/README.html

May 07, 2025

DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

This blog post demonstrates how hipDF significantly enhances and accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.

./artificial-intelligence/hipDF_pandas_accelerated/README.html

May 06, 2025

Unleash Full GPU Potential: Overlap Communication and Computation with Triton-Distributed

Unlock the full power of AMD GPUs—write portable, efficient kernels with Triton-Distributed, overlapping computation and communication with ease and flexibility

./software-tools-optimization/triton-distributed-c/README.html

May 06, 2025

CuPy and hipDF on AMD: The Basics and Beyond

Learn how to deploy CuPy and hipDF on AMD GPUs. See their high-performance computing advantages, and use CuPy and hipDF in a detailed example of an investment portfolio allocation optimization using the Markowitz model.

./artificial-intelligence/cupy_hipdf_portfolio_opt/README.html

May 01, 2025

Optimizing DeepseekV3 Inference on SGLang Using ROCm Profiling Tools

Dive into kernel-level profiling of DeepseekV3 on SGLang—identify GPU bottlenecks and boost large language model performance using ROCm

./software-tools-optimization/kernel-analysis-deep/README.html
