Recent Posts - Page 11

Seismic stencil codes - part 2

Seismic Stencil Codes - Part 2: Recall from the previous post that the kernel with stencil computation in the z-direction suffered from low effective bandwidth. This low performance comes from generating substantial amounts of data movement to global memory.

August 29, 2024 by Justin Chang, Ossian O'Reilly

Seismic stencil codes - part 3

Seismic Stencil Codes - Part 3: In the last two blog posts, we developed a HIP kernel capable of computing the high-order finite differences commonly needed in seismic wave propagation.

August 29, 2024 by Justin Chang, Ossian O'Reilly

Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

August 28, 2024 by Meena Arunachalam, Miro Hodak, Jeremy Arnold, Eliot Li

Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

August 21, 2024 by Eliot Li

Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

Time series forecasting (TSF) predicts future behavior using past data. This guide focuses on implementing Transformers for TSF on AMD hardware, covering every step from preprocessing to evaluation.

August 19, 2024 by Fabricio Flores

Inferencing with Grok-1 on AMD GPUs

We demonstrate that the massive Grok-1 Model from xAI can run seamlessly on the AMD MI300X GPU accelerator by leveraging the ROCm software platform.

August 09, 2024 by Eliot Li, Luise Chen, Lei Shao

Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

In this blog, we explore how to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa) large language model, with emphasis on PyTorch's mixed precision capabilities. Specifically, we explore using AMD GPUs for mixed precision fine-tuning to achieve faster model training without any major impact on accuracy.

July 29, 2024 by Fabricio Flores