Posts tagged Developers
ROCm Revisited: Evolution of the High-Performance GPU Computing Ecosystem
- 06 June 2025
This blog is part of our ROCm Revisited series[1]. The purpose of this series is to share the story of ROCm and our journey through the changes and successes we’ve achieved over the past few years. We’ll explore the key milestones in our development, the innovative technologies that have propelled us forward, and the challenges we’ve overcome to establish our leadership in the world of GPU computing.
HIP 7.0 Is Coming: What You Need to Know to Stay Ahead
- 28 May 2025
At AMD, we understand that code portability between AMD and NVIDIA GPU programming models is top of mind for our customers. We are committed to making GPU development more seamless and portable across vendors. With the upcoming HIP 7.0 release in the second half of 2025, we're taking a bold step toward simplifying cross-platform programming by aligning HIP C++ even more closely with CUDA.

AMD tightly integrates our automatic HIPIFY conversion tool with the HIP runtime and compiler, so users can quickly port CUDA code to HIP C++ and target AMD GPUs. However, small differences between our implementation of the HIP C++ programming model and CUDA C++ often require manual adjustments to your code base, creating additional work for developers targeting GPU families from both vendors. We understand this and, based on customer requests, are making changes to ROCm to reduce this friction.

We also know that adopting changes in our programming model requires early notification. We don't take API-breaking changes lightly, so for your benefit we are making an early prototype available to assist in porting to the new HIP 7.0 API. The preview release is based on the ROCm 6.4 release for functionality but contains HIP 7.0 API previews. It is intended as a drop-in replacement for 6.4 for non-production use, enabling users to write code against the new API and adopt HIP 7.0 more smoothly.

In this blog, you will learn how HIP 7.0 aligns more closely with CUDA, what API and behavior changes to expect, and how to prepare your codebase to ensure compatibility and portability across GPU platforms. Let's delve into the details of the API changes.
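The kind of source-to-source translation HIPIFY performs can be illustrated with a toy sketch in Python. This is not the real tool (hipify-perl and hipify-clang handle far more mappings plus kernel-launch syntax); the small identifier map below just shows the idea that most CUDA runtime names have direct `hip`-prefixed equivalents:

```python
# Toy illustration of HIPIFY-style renaming. The mappings listed here
# are real CUDA-to-HIP equivalents, but the real tools cover hundreds
# more identifiers and do proper source parsing.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(src: str) -> str:
    """Rename CUDA identifiers to their HIP equivalents (longest first,
    so cudaMemcpyHostToDevice is not clobbered by the cudaMemcpy rule)."""
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        src = src.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return src
```

Small residual differences between the two runtimes are exactly what the HIP 7.0 API alignment aims to shrink, so that mechanical translation like this covers more of a real code base.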
ROCm Runfile Installer Is Here!
- 22 May 2025
Starting with ROCm 6.4, and after much user demand, we are introducing the ROCm Runfile Installer. It is aimed primarily at network-secured environments, at users who wish to bypass the native Linux package management system, and at those who simply want to download and run a single file to install ROCm.
From Theory to Kernel: Implement FlashAttention-v2 with CK-Tile
- 21 May 2025
In our previous blog, Hands on with CK Tile, we walked through how to build a basic GEMM kernel using CK-Tile. In this blog, we will further explore the implementation of a fused kernel, specifically introducing the FlashAttention (FA)-v2 forward kernel. Figure 1 provides an overview of the FlashAttention kernel executions and data movements that occur when a single thread block computes its tile of the output matrix. Each of the subsequent sections explains in detail how to implement this using CK-Tile.
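The tiled online-softmax recurrence at the heart of the FlashAttention-v2 forward pass can be sketched in plain Python. This is a language-agnostic illustration of the algorithm each thread block performs, not CK-Tile code; matrices are lists of rows and the tile size is a hypothetical parameter:

```python
import math

def naive_attention(Q, K, V):
    """Reference: O = softmax(Q K^T) V, computed row by row."""
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        denom = sum(exps)
        out.append([sum(e * v[j] for e, v in zip(exps, V)) / denom
                    for j in range(len(V[0]))])
    return out

def flash_attention_fwd(Q, K, V, tile=2):
    """FlashAttention-style forward pass: visit K/V in tiles, keeping a
    running max, running denominator, and unnormalized accumulator, so
    the full score matrix is never materialized."""
    d = len(V[0])
    out = []
    for q in Q:
        m = -math.inf          # running max of scores seen so far
        l = 0.0                # running softmax denominator
        acc = [0.0] * d        # unnormalized output accumulator
        for start in range(0, len(K), tile):
            K_t, V_t = K[start:start + tile], V[start:start + tile]
            scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K_t]
            m_new = max(m, max(scores))
            scale = math.exp(m - m_new)   # rescale earlier partial sums
            l *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, V_t):
                p = math.exp(s - m_new)
                l += p
                for j in range(d):
                    acc[j] += p * v[j]
            m = m_new
        out.append([a / l for a in acc])   # normalize once at the end
    return out
```

The v2 formulation defers the division by the denominator to the very end of each row, which is what makes the per-tile update cheap on a GPU; the CK-Tile implementation maps these tiles onto thread-block and warp tiles.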
Introducing ROCm-DS: GPU-Accelerated Data Science for AMD Instinct™ GPUs
- 20 May 2025
AMD is excited to announce the early access release of ROCm-DS (ROCm Data Science), a new toolkit designed to accelerate data processing workloads on AMD Instinct™ GPUs. Built on the core ROCm toolkit, ROCm-DS promises to significantly enhance performance and scalability for data-intensive applications, catering to the pressing needs of today’s data-driven landscape. ROCm-DS is based on the open source libraries in the RAPIDS ecosystem. This collection of libraries enables a multitude of data processing operations, allowing new and existing workloads to tap into the computational advantages offered by AMD Instinct Datacenter GPUs. This early access release introduces two powerful new libraries: hipDF and hipGRAPH.
Programming AMD GPUs with Julia
- 16 April 2024
Julia is a high-level, general-purpose dynamic programming language that automatically compiles to efficient native code via LLVM and supports multiple platforms. With LLVM comes support for programming GPUs, including AMD GPUs.