Posts by Logan Grado

Accelerating models on ROCm using PyTorch TunableOp

03 July 2024

In this blog post, we show how to use PyTorch TunableOp to accelerate models with ROCm on AMD GPUs. We discuss the basics of General Matrix Multiplications (GEMMs), walk through tuning a single GEMM, and demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.

Read more ...

ResNet for image classification using AMD GPUs

09 April 2024

Read more ...

Scale AI applications with Ray

01 April 2024

By Logan Grado and Eliot Li.

Read more ...

Automatic mixed precision in PyTorch using AMD GPUs

29 March 2024

As models increase in size, the time and memory needed to train them, and consequently the cost, also increase. Any measure that reduces training time and memory usage can therefore be highly beneficial. This is where Automatic Mixed Precision (AMP) comes in.

Read more ...