Posts tagged GenAI
Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm
- 11 July 2024
PyTorch 2.0 introduces torch.compile(), a tool to vastly accelerate PyTorch code and models. By converting PyTorch code into highly optimized kernels, torch.compile delivers substantial performance improvements with minimal changes to the existing codebase. This feature allows for precise optimization of individual functions, entire modules, and complex training loops, providing a versatile and powerful tool for enhancing computational efficiency.
Accelerating models on ROCm using PyTorch TunableOp
- 03 July 2024
In this blog, we will show how to leverage PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We will discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.
A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs
- 02 July 2024
2 July, 2024 by Douglas Jia.
Mamba on AMD GPUs with ROCm
- 28 June 2024
28 Jun, 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.
Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model
- 24 April 2024
24 Apr, 2024 by Sean Song.
Transforming Words into Motion: A Guide to Video Generation with AMD GPU
- 24 April 2024
24 Apr, 2024 by Douglas Jia.
Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU
- 16 April 2024
16 Apr, 2024 by Sean Song.
Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs
- 16 April 2024
16 Apr, 2024 by Douglas Jia.
Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU
- 15 April 2024
15 Apr, 2024 by Sean Song.
Building semantic search with SentenceTransformers on AMD
- 04 April 2024
4 Apr, 2024 by Fabricio Flores.
Scale AI applications with Ray
- 01 April 2024
1 Apr, 2024 by Vicky Tsang, Logan Grado, Eliot Li.
Large language model inference optimizations on AMD GPUs
- 15 March 2024
15 Mar, 2024 by Seungrok Jung.
Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs
- 23 February 2024
23 Feb, 2024 by Douglas Jia.
Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU
- 07 February 2024
7 Feb, 2024 by Vara Lakshmi Bayanagari.
Using LoRA for efficient fine-tuning: Fundamental principles
- 05 February 2024
5 Feb, 2024 by Sean Song.
Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering
- 01 February 2024
1 Feb, 2024 by Sean Song.
Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU
- 29 January 2024
29 Jan, 2024 by Vara Lakshmi Bayanagari.
Pre-training BERT using Hugging Face & PyTorch on an AMD GPU
- 26 January 2024
26 Jan, 2024 by Vara Lakshmi Bayanagari.
Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs
- 24 January 2024
24 Jan, 2024 by Douglas Jia.
Efficient deployment of large language models with Text Generation Inference on AMD GPUs
- 24 January 2024
24 Jan, 2024 by Douglas Jia.