Posts tagged Profiling

Introducing ROCprofiler SDK - The Latest Toolkit for Performance Profiling

Profiling is the backbone of performance optimization in AI and HPC workloads, enabling developers to extract maximum efficiency from AMD Instinct™ GPUs. With ROCm’s rapid evolution, the need for a unified, scalable, and extensible profiling framework has never been more critical. The new ROCprofiler-SDK framework represents a significant step forward in profiling capabilities, offering enhanced features, streamlined integration, and a better user experience while also addressing limitations of earlier profiler interfaces. This guide aims to help users transition from legacy profiling tools to the ROCprofiler-SDK infrastructure. We will explore new features, highlight key differences from previous tools, and provide actionable steps for a smooth migration.

Read more ...


Seismic stencil codes - part 3

12 Aug, 2024

Read more ...


Seismic stencil codes - part 2

12 Aug, 2024

Read more ...


Seismic stencil codes - part 1

12 Aug, 2024

Read more ...


Using statistical methods to reliably compare algorithm performance in large generative AI models with JAX Profiler on AMD GPUs

This blog is a guide to measuring and comparing the performance of different algorithms in a generative AI model implemented in JAX. Using the JAX Profiler and statistical analysis, it demonstrates how to reliably evaluate key steps and compare algorithm performance on AMD GPUs.

Read more ...


TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

TensorFlow Profiler is a set of tools designed to measure resource utilization and performance during the execution of TensorFlow models. It offers insights into how a model interacts with hardware resources, including execution time and memory usage. TensorFlow Profiler helps pinpoint performance bottlenecks, allowing us to fine-tune model execution for improved efficiency and faster outcomes, which can be crucial in scenarios that require near-real-time predictions.

Read more ...


AMD in Action: Unveiling the Power of Application Tracing and Profiling

Rocprof is a robust tool designed to analyze and optimize the performance of HIP programs on AMD ROCm platforms, helping developers pinpoint and resolve performance bottlenecks. Rocprof provides a variety of profiling data, including performance counters, hardware traces, and runtime API/activity traces.

Read more ...


Jacobi Solver with HIP and OpenMP offloading

15 Sept, 2023

Read more ...


Finite difference method - Laplacian part 4

18 Jul, 2023

Read more ...


Finite difference method - Laplacian part 3

11 May, 2023

Read more ...


Introduction to profiling tools for AMD hardware

Getting a code to be functionally correct is not always enough. In many industries, applications and their complex software stacks must also run as efficiently as possible to meet operational demands. This is particularly challenging as hardware continues to evolve, and codes may require further tuning as a result. In practice, many application developers construct benchmarks, which are carefully designed to measure the performance, such as execution time, of a particular code in an operational-like setting. In other words, a good benchmark should be representative of the real work that needs to be done. These benchmarks are useful because they provide insight into the characteristics of the application and enable one to discover potential bottlenecks that could degrade performance in operational settings.

Read more ...


Finite difference method - Laplacian part 2

4 Jan, 2023

Read more ...


Finite difference method - Laplacian part 1

14 Nov, 2022

Read more ...