Posts tagged GenAI

Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

PyTorch 2.0 introduced torch.compile(), which speeds up PyTorch code and models by compiling them into optimized kernels, with minimal changes to the existing codebase. It can be applied to individual functions, entire modules, or complex training loops.


Accelerating models on ROCm using PyTorch TunableOp

In this blog, we show how to use PyTorch TunableOp to accelerate models on AMD GPUs with ROCm. We cover the basics of General Matrix Multiplications (GEMMs), tune a single GEMM as an example, and demonstrate real-world performance gains on an LLM (Gemma) using TunableOp.


A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

2 July, 2024 by Douglas Jia.


Mamba on AMD GPUs with ROCm

28 Jun, 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.


Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 Apr, 2024 by Sean Song.


Transforming Words into Motion: A Guide to Video Generation with AMD GPU

24 Apr, 2024 by Douglas Jia.


Inferencing with AI2’s OLMo model on AMD GPU

17 Apr, 2024 by Douglas Jia.


Program Synthesis with CodeGen

16 Apr, 2024 by Phillip Dang.


Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 Apr, 2024 by Sean Song.


Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

16 Apr, 2024 by Douglas Jia.


Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15 Apr, 2024 by Sean Song.


Image classification using Vision Transformer with AMD GPUs

4 Apr, 2024 by Eliot Li.


Building semantic search with SentenceTransformers on AMD

4 Apr, 2024 by Fabricio Flores.


Scale AI applications with Ray

1 Apr, 2024 by Vicky Tsang, Logan Grado, Eliot Li.


Large language model inference optimizations on AMD GPUs

15 Mar, 2024 by Seungrok Jung.


Music Generation With MusicGen on an AMD GPU

8 Mar, 2024 by Phillip Dang.


Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

23 Feb, 2024 by Douglas Jia.


Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

7 Feb, 2024 by Vara Lakshmi Bayanagari.


Using LoRA for efficient fine-tuning: Fundamental principles

5 Feb, 2024 by Sean Song.


Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

1 Feb, 2024 by Sean Song.


Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

29 Jan, 2024 by Vara Lakshmi Bayanagari.


Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26 Jan, 2024 by Vara Lakshmi Bayanagari.


LLM distributed supervised fine-tuning with JAX

25 Jan, 2024 by Douglas Jia.


Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

24 Jan, 2024 by Douglas Jia.


Efficient deployment of large language models with Text Generation Inference on AMD GPUs

24 Jan, 2024 by Douglas Jia.
