Posts tagged Profiling

Performance Profiling on AMD GPUs – Part 1: Foundations

26 June 2025

Read more ...

ROCm 6.4: Breaking Barriers in AI, HPC, and Modular GPU Software

11 April 2025

In the rapidly evolving landscape of high-performance computing and artificial intelligence, innovation is the currency of progress. AMD’s ROCm 6.4 isn’t just another software update—it’s a leap forward that redefines the boundaries of what is possible for AI, developers, researchers, and enterprise innovators.

Read more ...

Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

25 March 2025

Profiling is the backbone of performance optimization in AI and HPC workloads, enabling developers to extract maximum efficiency from AMD Instinct™ GPUs. With ROCm’s rapid evolution, the need for a unified, scalable, and extensible profiling framework has never been more critical. The new ROCprofiler-SDK framework represents a significant step forward in profiling capabilities, offering enhanced features, streamlined integration, and a better user experience while also solving past limitations with former profiler interface versions. This guide aims to help users seamlessly transition from legacy profiling tools to the ROCprofiler-SDK infrastructure. We will explore new features, highlight key differences from previous tools, and provide actionable steps for a smooth migration.

Read more ...

Seismic stencil codes - part 3

29 August 2024

12 Aug, 2024

Read more ...

Seismic stencil codes - part 2

29 August 2024

12 Aug, 2024

Read more ...

Seismic stencil codes - part 1

29 August 2024

12 Aug, 2024

Read more ...

Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

22 July 2024

This blog provides a comprehensive guide on measuring and comparing the performance of various algorithms in a JAX-implemented generative AI model. Leveraging the JAX Profiler and statistical analysis, this blog demonstrates how to reliably evaluate key steps and compare algorithm performance on AMD GPUs.

Read more ...

TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

18 June 2024

TensorFlow Profiler consists of a set of tools designed to measure resource utilization and performance during the execution of TensorFlow models. It offers insights into how a model interacts with hardware resources, including execution time and memory usage. TensorFlow Profiler helps in pinpointing performance bottlenecks, allowing us to fine-tune the execution of models for improved efficiency and faster outcomes, which can be crucial in scenarios where near-real-time predictions are required.

Read more ...

AMD in Action: Unveiling the Power of Application Tracing and Profiling

07 May 2024

Rocprof is a robust tool designed to analyze and optimize the performance of HIP programs on AMD ROCm platforms, helping developers pinpoint and resolve performance bottlenecks. Rocprof provides a variety of profiling data, including performance counters, hardware traces, and runtime API/activity traces.

Read more ...

Jacobi Solver with HIP and OpenMP offloading

15 September 2023

Read more ...

Finite difference method - Laplacian part 4

18 July 2023

Read more ...

Finite difference method - Laplacian part 3

11 May 2023

Read more ...

Introduction to profiling tools for AMD hardware

12 April 2023

Getting code to be functionally correct is not always enough. In many industries, applications and their complex software stacks must also run as efficiently as possible to meet operational demands. This is particularly challenging because hardware continues to evolve, so codes may require further tuning. In practice, many application developers construct benchmarks, which are carefully designed to measure the performance, such as execution time, of a particular code in an operational-like setting. In other words, a good benchmark should be representative of the real work that needs to be done. These benchmarks are useful because they provide insight into the characteristics of the application and enable one to discover potential bottlenecks that could degrade performance in operational settings.

Read more ...

Finite difference method - Laplacian part 2

04 January 2023

Read more ...

Finite difference method - Laplacian part 1

14 November 2022

Read more ...