Recent Posts - Page 12
High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide
Learn how to optimize BERT-L training with mixed precision and Flash Attention v2 on AMD Instinct GPUs — follow our tested MLPerf-compliant step-by-step guide.
Scale LLM Inference with Multi-Node Infrastructure
Learn how to horizontally scale LLM inference using open-source tools on MI300X, with vLLM, nginx, Prometheus, and Grafana.
HIP 7.0 Is Coming: What You Need to Know to Stay Ahead
Get ready for HIP 7.0: explore key API changes that boost CUDA compatibility and streamline portable GPU development, and start preparing your code today.
ROCm Runfile Installer Is Here!
An overview of the ROCm Runfile Installer, introduced in ROCm 6.4, which bundles the driver and ROCm into a single package that can be installed without internet connectivity.
From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
Learn how to implement FlashAttention-v2 with CK-Tile: minimize memory overhead, maximize compute efficiency, and scale on AMD GPUs
AMD Integrates llm-d on AMD Instinct MI300X Cluster For Distributed LLM Serving
Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
Accelerate data science with ROCm-DS: AMD’s GPU-optimized toolkit for faster data frames and graph analytics using hipDF and hipGRAPH
Accelerate DeepSeek-R1 Inference: Integrate AITER into SGLang
Boost DeepSeek-R1 with AITER: Step-by-step SGLang integration for high-performance MoE, GEMM, and attention ops on AMD GPUs
Step-Video-T2V Inference with xDiT on AMD Instinct MI300X GPUs
Learn how to accelerate text-to-video generation using Step-Video-T2V, a 30B-parameter T2V model, on AMD MI300X GPUs with ROCm, enabling scalable, high-fidelity video generation from text.
Accelerated JPEG decoding on AMD Instinct™ GPUs with rocJPEG
Learn how to decompress JPEG files at breakneck speeds for your AI, vision, and content delivery workloads using rocJPEG and AMD Instinct GPUs.
DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs
This blog post demonstrates how hipDF significantly enhances and accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.
CuPy and hipDF on AMD: The Basics and Beyond
Learn how to deploy CuPy and hipDF on AMD GPUs. See their high-performance computing advantages, and use CuPy and hipDF in a detailed example of an investment portfolio allocation optimization using the Markowitz model.