<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://rocm.blogs.amd.com/</id>
  <title>AMD ROCm Blogs</title>
  <updated>2026-03-09T15:59:09.713320+00:00</updated>
  <link href="https://rocm.blogs.amd.com/"/>
  <link href="https://rocm.blogs.amd.com/blog/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/comfyui-radeon-9000/README.html</id>
    <title>Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs</title>
    <updated>2026-03-09T00:00:00+00:00</updated>
    <author>
      <name>George Wang</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;ComfyUI has become a widely adopted and versatile node-based interface for Stable Diffusion and other generative AI models, gaining significant traction within the AI content creation community. Unlike traditional web-based interfaces, ComfyUI provides a node-based workflow system that gives users complete control over their image and video generation pipelines. Its modular architecture allows for complex workflows involving multiple models, LoRAs, ControlNets, and custom processing steps.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/comfyui-radeon-9000/README.html"/>
    <summary>ComfyUI has become a widely adopted and versatile node-based interface for Stable Diffusion and other generative AI models, gaining significant traction within the AI content creation community. Unlike traditional web-based interfaces, ComfyUI provides a node-based workflow system that gives users complete control over their image and video generation pipelines. Its modular architecture allows for complex workflows involving multiple models, LoRAs, ControlNets, and custom processing steps.</summary>
    <category term="DiffusionModel" label="Diffusion Model"/>
    <category term="GenAI" label="GenAI"/>
    <category term="Installation" label="Installation"/>
    <category term="Memory" label="Memory"/>
    <category term="Optimization" label="Optimization"/>
    <category term="Performance" label="Performance"/>
    <category term="PyTorch" label="PyTorch"/>
    <published>2026-03-09T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm-agentic-diagnosis/README.html</id>
    <title>Agentic Diagnosis for LLM Training at Scale</title>
    <updated>2026-03-09T00:00:00+00:00</updated>
    <author>
      <name>Zhenyu Gu</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In &lt;a class="reference external" href="https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm/README.html"&gt;MaxText-Slurm: Production-Grade LLM Training with Built-In Observability&lt;/a&gt;, we introduced &lt;a class="reference external" href="https://github.com/AMD-AGI/maxtext-slurm"&gt;MaxText-Slurm&lt;/a&gt; — an open-source launch system and observability stack for running MaxText LLM training on AMD Instinct GPU clusters. We showed how a unified Prometheus time-series database (TSDB) collects GPU, host, network, and training metrics into a single queryable store, persisted to disk so that no data is lost even if the job crashes.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm-agentic-diagnosis/README.html"/>
    <summary>In MaxText-Slurm: Production-Grade LLM Training with Built-In Observability, we introduced MaxText-Slurm — an open-source launch system and observability stack for running MaxText LLM training on AMD Instinct GPU clusters. We showed how a unified Prometheus time-series database (TSDB) collects GPU, host, network, and training metrics into a single queryable store, persisted to disk so that no data is lost even if the job crashes.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="AgenticWorkflows" label="Agentic Workflows"/>
    <category term="HPC" label="HPC"/>
    <category term="IncidentDiagnosis" label="Incident Diagnosis"/>
    <category term="JAX" label="JAX"/>
    <category term="LLM" label="LLM"/>
    <category term="Observability" label="Observability"/>
    <published>2026-03-09T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/hpc-agent-profile/README.html</id>
    <title>HPC Coding Agent - Part 3: MCP Tool for Profiling</title>
    <updated>2026-03-06T00:00:00+00:00</updated>
    <author>
      <name>Arttu Niemela</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In this blog, we build an AI agent specialized in profiling and optimizing GPU-accelerated applications within High-Performance Computing (HPC) environments. Using open-source tools, we create a state-of-the-art agent and enhance its profiling capabilities through a custom Model Context Protocol (MCP) server. This server provides the agent with tools to leverage AMD’s profiling utilities for analyzing application performance on AMD GPUs.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/hpc-agent-profile/README.html"/>
    <summary>In this blog, we build an AI agent specialized in profiling and optimizing GPU-accelerated applications within High-Performance Computing (HPC) environments. Using open-source tools, we create a state-of-the-art agent and enhance its profiling capabilities through a custom Model Context Protocol (MCP) server. This server provides the agent with tools to leverage AMD’s profiling utilities for analyzing application performance on AMD GPUs.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="HPC" label="HPC"/>
    <category term="Optimization" label="Optimization"/>
    <category term="Profiling" label="Profiling"/>
    <published>2026-03-06T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/walrus-finetuning/README.html</id>
    <title>Fine-Tuning AI Surrogate Models for Physics Simulations with Walrus on AMD Instinct GPU Accelerators</title>
    <updated>2026-03-06T00:00:00+00:00</updated>
    <author>
      <name>Luka Tsabadze</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Physics simulations are used for studying complex systems and are essential where experiments are difficult, expensive, or impossible. In our context, a simulation means numerically solving mathematical equations that are believed to describe a physical system and evolving them forward in time on a computer. They enable controlled exploration of physical behavior for science and engineering, but at a high computational cost, which in most cases increases rapidly with scale. Our focus is on continuum dynamics, where the system is represented by fields such as density, velocity, or temperature, defined on a grid and evolving over time. High-resolution physics simulations are slow to run, sensitive to numerical error and impractical for large parameter spaces. Surrogate models address these limitations by learning to approximate simulation dynamics directly from data. Once trained, they can produce fast predictions at a fraction of the cost, giving researchers the ability to rapidly explore parameter space and generate long rollouts.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/walrus-finetuning/README.html"/>
    <summary>Physics simulations are used for studying complex systems and are essential where experiments are difficult, expensive, or impossible. In our context, a simulation means numerically solving mathematical equations that are believed to describe a physical system and evolving them forward in time on a computer. They enable controlled exploration of physical behavior for science and engineering, but at a high computational cost, which in most cases increases rapidly with scale. Our focus is on continuum dynamics, where the system is represented by fields such as density, velocity, or temperature, defined on a grid and evolving over time. High-resolution physics simulations are slow to run, sensitive to numerical error, and impractical for large parameter spaces. Surrogate models address these limitations by learning to approximate simulation dynamics directly from data. Once trained, they can produce fast predictions at a fraction of the cost, giving researchers the ability to rapidly explore parameter space and generate long rollouts.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="Fine-Tuning" label="Fine-Tuning"/>
    <category term="PyTorch" label="PyTorch"/>
    <category term="ScientificComputing" label="Scientific Computing"/>
    <published>2026-03-06T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/stormcast-ensembles/README.html</id>
    <title>Ensemble High-Resolution Weather Forecasting on AMD Instinct GPU Accelerators</title>
    <updated>2026-03-06T00:00:00+00:00</updated>
    <author>
      <name>Pauli Pihajoki</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Weather prediction is fraught with uncertainty, as is the inference of any real-world
phenomenon dependent on physical observations. The consequence is that any
estimated current state of the atmosphere as well as any forecast both carry a
level of uncertainty. As such, any weather forecasting model, whether AI or
traditional, needs to produce reasonable outputs despite the inherent
uncertainty of inputs, and, if possible, quantify the uncertainty of the outputs
for the user in some practical fashion.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/stormcast-ensembles/README.html"/>
    <summary>Weather prediction is fraught with uncertainty, as is the inference of any real-world
phenomenon dependent on physical observations. The consequence is that any
estimated current state of the atmosphere as well as any forecast both carry a
level of uncertainty. As such, any weather forecasting model, whether AI or
traditional, needs to produce reasonable outputs despite the inherent
uncertainty of inputs, and, if possible, quantify the uncertainty of the outputs
for the user in some practical fashion.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="DiffusionModel" label="Diffusion Model"/>
    <category term="GenAI" label="GenAI"/>
    <category term="ScientificComputing" label="Scientific Computing"/>
    <published>2026-03-06T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/hpc-agent-openevolve/README.html</id>
    <title>HPC Coding Agent - Part 2: An MCP Tool for Code Optimization with OpenEvolve</title>
    <updated>2026-03-04T00:00:00+00:00</updated>
    <author>
      <name>Johanna Yang</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Large language models (LLMs) and LLM-driven agents (AI agents) are already trained on a massive amount of data where a considerable portion consists of code, and both models and agentic coding services are developed specifically for the purpose of coding. For users who want to optimize their code for certain purposes, for example runtime or memory efficiency, LLMs may produce plausible solutions, but these are often not optimal.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/hpc-agent-openevolve/README.html"/>
    <summary>Large language models (LLMs) and LLM-driven agents (AI agents) are already trained on massive amounts of data, a considerable portion of which consists of code, and both models and agentic coding services are developed specifically for coding. For users who want to optimize their code for specific goals, such as runtime or memory efficiency, LLMs may produce plausible solutions, but these are often not optimal.</summary>
    <category term="AgenticCoding" label="Agentic Coding"/>
    <category term="Agents" label="Agents"/>
    <category term="GenAI" label="GenAI"/>
    <category term="LLM" label="LLM"/>
    <category term="OpenEvolve" label="OpenEvolve"/>
    <category term="Serving" label="Serving"/>
    <published>2026-03-04T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/recsys-training-docker/README.html</id>
    <title>Streamlining Recommendation Model Training on AMD Instinct™ GPUs</title>
    <updated>2026-03-02T00:00:00+00:00</updated>
    <author>
      <name>Steve Reinhardt</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Recommendation model training and inference workloads represent a
significant portion of computational requirements across industries
including e-commerce, social media, and content streaming platforms.
Unlike LLMs, recommendation models result in complex and often imbalanced
communication across GPUs, along with a higher load on the CPU-GPU
interconnect. The &lt;a class="reference external" href="https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html?model=pyt_train_dlrm"&gt;ROCm training
docker&lt;/a&gt; &lt;a class="reference internal" href="#ref1"&gt;&lt;span class="xref myst"&gt;[1]&lt;/span&gt;&lt;/a&gt;
now includes essential libraries for recommendation model training. This
blog demonstrates the functionality and ease of training recommendation
models using ROCm, along with suggestions for improved configuration of
these workloads. We also highlight the inherent benefits of the large
HBM size on AMD Instinct™ GPUs for recommendation workloads.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/recsys-training-docker/README.html"/>
    <summary>Recommendation model training and inference workloads represent a
significant portion of computational requirements across industries
including e-commerce, social media, and content streaming platforms.
Unlike LLMs, recommendation models result in complex and often imbalanced
communication across GPUs, along with a higher load on the CPU-GPU
interconnect. The ROCm training
docker [1]
now includes essential libraries for recommendation model training. This
blog demonstrates the functionality and ease of training recommendation
models using ROCm, along with suggestions for improved configuration of
these workloads. We also highlight the inherent benefits of the large
HBM size on AMD Instinct™ GPUs for recommendation workloads.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="RecommendationSystems" label="Recommendation Systems"/>
    <published>2026-03-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm/README.html</id>
    <title>MaxText-Slurm: Production-Grade LLM Training with Built-In Observability</title>
    <updated>2026-03-02T00:00:00+00:00</updated>
    <author>
      <name>Zhenyu Gu</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Training large language models (LLMs) at scale on GPU clusters is not just a compute problem — it is an operations problem. Launching multi-node distributed training, keeping it running reliably, and diagnosing failures when they happen all require tooling that most training frameworks do not provide. &lt;a class="reference external" href="https://github.com/AMD-AGI/maxtext-slurm"&gt;MaxText-Slurm&lt;/a&gt; is an open-source launch system and observability stack that bridges this gap for &lt;a class="reference external" href="https://github.com/AI-Hypercomputer/maxtext"&gt;MaxText&lt;/a&gt; on AMD Instinct GPU clusters managed by Slurm.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/maxtext-slurm/README.html"/>
    <summary>Training large language models (LLMs) at scale on GPU clusters is not just a compute problem — it is an operations problem. Launching multi-node distributed training, keeping it running reliably, and diagnosing failures when they happen all require tooling that most training frameworks do not provide. MaxText-Slurm is an open-source launch system and observability stack that bridges this gap for MaxText on AMD Instinct GPU clusters managed by Slurm.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="HPC" label="HPC"/>
    <category term="JAX" label="JAX"/>
    <category term="LLM" label="LLM"/>
    <category term="Performance" label="Performance"/>
    <category term="Profiling" label="Profiling"/>
    <published>2026-03-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/rocm7-ray/README.html</id>
    <title>Exploring Use Cases for Scalable AI: Implementing Ray with ROCm 7 Support for Efficient ML Workflows</title>
    <updated>2026-02-27T00:00:00+00:00</updated>
    <author>
      <name>Vish Vadlamani</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;This blog builds on insights from our &lt;a class="reference external" href="https://rocm.blogs.amd.com/artificial-intelligence/rocm-ray/README.html"&gt;previous blog post&lt;/a&gt;, which introduced Ray 2.48.0.post0 running on ROCm 6.2 and demonstrated Reinforcement Learning from Human Feedback (RLHF) with verl 0.3.0.post0 and vLLM 0.6.4 on AMD GPUs. In this follow‑up, we introduce Ray 2.51.1 with ROCm 7.0.0, verl 0.6.0, and vLLM 0.11.0.dev, highlighting the new performance benefits and capabilities for large‑scale RLHF workloads.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/rocm7-ray/README.html"/>
    <summary>This blog builds on insights from our previous blog post, which introduced Ray 2.48.0.post0 running on ROCm 6.2 and demonstrated Reinforcement Learning from Human Feedback (RLHF) with verl 0.3.0.post0 and vLLM 0.6.4 on AMD GPUs. In this follow‑up, we introduce Ray 2.51.1 with ROCm 7.0.0, verl 0.6.0, and vLLM 0.11.0.dev, highlighting the new performance benefits and capabilities for large‑scale RLHF workloads.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="Fine-Tuning" label="Fine-Tuning"/>
    <category term="GenAI" label="GenAI"/>
    <category term="ReinforcementLearning" label="Reinforcement Learning"/>
    <category term="Serving" label="Serving"/>
    <published>2026-02-27T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/pytorch-tunableop-offline/README.html</id>
    <title>PyTorch Offline Tuning with TunableOp</title>
    <updated>2026-02-24T00:00:00+00:00</updated>
    <author>
      <name>Jin Zhou</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In an &lt;a class="reference external" href="https://rocm.blogs.amd.com/artificial-intelligence/pytorch-tunableop/README.html"&gt;earlier blog post,&lt;/a&gt; we explored how PyTorch TunableOp can &lt;em&gt;potentially&lt;/em&gt; accelerate models through &lt;strong&gt;online tuning&lt;/strong&gt; - where during model execution, PyTorch benchmarks and selects optimal BLAS kernels. While online tuning is effective, it introduces overhead due to the time needed to execute the ML model from end-to-end. If this is done once, the overhead may be acceptable, but for repeated tuning it may be cost-prohibitive to keep re-running the model.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/pytorch-tunableop-offline/README.html"/>
    <summary>In an earlier blog post, we explored how PyTorch TunableOp can potentially accelerate models through online tuning, where PyTorch benchmarks and selects optimal BLAS kernels during model execution. While online tuning is effective, it introduces overhead due to the time needed to execute the ML model end-to-end. If tuning is done once, the overhead may be acceptable, but for repeated tuning it may be cost-prohibitive to keep re-running the model.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="LinearAlgebra" label="Linear Algebra"/>
    <category term="Optimization" label="Optimization"/>
    <category term="Performance" label="Performance"/>
    <category term="PyTorch" label="PyTorch"/>
    <published>2026-02-24T00:00:00+00:00</published>
  </entry>
</feed>
