Posts tagged PyTorch
DBRX Instruct on AMD GPUs
- 11 July 2024
In this blog, we showcase DBRX Instruct, a mixture-of-experts large language model developed by Databricks, on a ROCm-capable system with AMD GPUs.
Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm
- 11 July 2024
PyTorch 2.0 introduces torch.compile(), a tool that can substantially accelerate PyTorch code and models. By converting PyTorch code into optimized kernels, torch.compile delivers performance improvements with minimal changes to the existing codebase. It can be applied selectively to individual functions, entire modules, or complex training loops.
Accelerating models on ROCm using PyTorch TunableOp
- 03 July 2024
In this blog, we show how to use PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.
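For readers unfamiliar with the term, a GEMM in the usual BLAS convention computes C ← αAB + βC. The sketch below is a plain-Python reference of that definition, not how TunableOp or any GPU library implements it; the helper name `gemm` and the sample matrices are assumptions made for this example.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    # Check that a is m x k, b is k x n, and c is m x n.
    if any(len(row) != k for row in a):
        raise ValueError("a must have as many columns as b has rows")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)  # [[38.5, 44.5], [86.5, 100.5]]
```

Optimized libraries compute the same result with many different kernel strategies, which is why choosing among them for a given matrix shape can affect performance.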
A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs
- 02 July 2024
2 July 2024 by Douglas Jia.
Mamba on AMD GPUs with ROCm
- 28 June 2024
28 June 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.
Unveiling performance insights with PyTorch Profiler on an AMD GPU
- 29 May 2024
29 May 2024 by Phillip Dang.
Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs
- 23 May 2024
23 May 2024 by Vara Lakshmi Bayanagari.
Accelerating Large Language Models with Flash Attention on AMD GPUs
- 15 May 2024
15 May 2024 by Clint Greene.
Multimodal (Visual and Language) understanding with LLaVA-NeXT
- 26 April 2024
26 April 2024 by Phillip Dang.
Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model
- 24 April 2024
24 April 2024 by Sean Song.
Transforming Words into Motion: A Guide to Video Generation with AMD GPU
- 24 April 2024
24 April 2024 by Douglas Jia.
Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs
- 16 April 2024
16 April 2024 by Douglas Jia.
GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment
- 11 April 2024
11 April 2024 by Douglas Jia.
Using the ChatGLM-6B bilingual language model with AMD GPUs
- 04 April 2024
4 April 2024 by Phillip Dang.
Total body segmentation using MONAI Deploy on an AMD GPU
- 04 April 2024
4 April 2024 by Vara Lakshmi Bayanagari.
Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs
- 23 February 2024
23 February 2024 by Douglas Jia.
Simplifying deep learning: A guide to PyTorch Lightning
- 08 February 2024
8 February 2024 by Phillip Dang.
Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU
- 07 February 2024
7 February 2024 by Vara Lakshmi Bayanagari.
Using LoRA for efficient fine-tuning: Fundamental principles
- 05 February 2024
5 February 2024 by Sean Song.
Pre-training BERT using Hugging Face & PyTorch on an AMD GPU
- 26 January 2024
26 January 2024 by Vara Lakshmi Bayanagari.
Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs
- 24 January 2024
24 January 2024 by Douglas Jia.
Creating a PyTorch/TensorFlow code environment on AMD GPUs
- 11 September 2023
Note: This blog was previously part of the AMD lab notes blog series.