Posts tagged PyTorch

Accelerating Large Language Models with Flash Attention on AMD GPUs

In this blog post, we will guide you through the process of installing Flash Attention on AMD GPUs and provide benchmarks comparing its performance to standard scaled dot-product attention (SDPA) in PyTorch. We will also measure end-to-end prefill latency for multiple large language models (LLMs) from Hugging Face.
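
As a quick taste of the comparison, the sketch below (not the blog's exact code) loads the same Hugging Face model with the SDPA and Flash Attention 2 backends so their prefill latency can be compared; the model ID is a placeholder, and flash-attention must already be installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute any causal LM with FA2 support

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model twice, once per attention backend, to compare prefill latency.
model_sdpa = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="sdpa"
).to("cuda")
model_fa = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, attn_implementation="flash_attention_2"
).to("cuda")

inputs = tokenizer("A long prompt used to measure prefill latency ...", return_tensors="pt").to("cuda")
with torch.no_grad():
    model_fa(**inputs)  # a single forward pass over the prompt corresponds to the prefill phase
```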

Read more ...


Table Question-Answering with TaPas

Conventionally, the table question-answering task is framed as a semantic parsing task in which the question is translated into a full logical form that can be executed against the table to retrieve the correct answer. However, this approach requires a large amount of annotated data, which can be expensive to acquire.

Read more ...


Multimodal (Visual and Language) understanding with LLaVA-NeXT

LLaVA (Large Language And Vision Assistant) was introduced in 2023 and became a milestone for multimodal models. It combines a pretrained vision encoder with a pretrained LLM for general-purpose visual and language understanding. In January 2024, LLaVA-NeXT was released, boasting significant enhancements, including higher input image resolution and improved logical reasoning and world knowledge.

Read more ...


Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

In this blog, we will build a vision-text dual-encoder model akin to CLIP and fine-tune it on the COCO dataset using an AMD GPU with ROCm. This work is inspired by the principles of CLIP and the Hugging Face example. The idea is to train a vision encoder and a text encoder jointly so that images and their descriptions are projected into the same embedding space, with each text embedding located near the embeddings of the images it describes. The training objective is to maximize the similarity between the embeddings of matching image-text pairs in a batch while minimizing the similarity for incorrect pairs. The model achieves this by learning a multimodal embedding space, and a symmetric cross-entropy loss is optimized over these similarity scores.
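
For intuition, here is a minimal PyTorch sketch of that symmetric cross-entropy (CLIP-style) objective; the function name and temperature value are illustrative, not the blog's exact implementation.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric cross-entropy over image-text similarity scores (CLIP-style)."""
    # Normalize so the dot product becomes cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity: logits[i, j] compares image i with caption j.
    logits = image_embeds @ text_embeds.t() / temperature

    # The matching image-text pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i + loss_t) / 2
```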

Read more ...


Transforming Words into Motion: A Guide to Video Generation with AMD GPU

This blog introduces the advancements in text-to-video generation through enhancements to the Stable Diffusion model and demonstrates the process of generating videos from text prompts on an AMD GPU using Alibaba's ModelScopeT2V model.

Read more ...


Inferencing with AI2’s OLMo model on AMD GPU

In this blog, we will show you how to generate text using AI2's OLMo model on an AMD GPU.

Read more ...


Text Summarization with FLAN-T5

In this blog, we showcase the FLAN-T5 language model and show how to fine-tune it on a summarization task with Hugging Face on an AMD GPU + ROCm system.
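
As a minimal sanity check before fine-tuning (the full recipe is in the post), FLAN-T5 can already summarize with a simple `summarize:` prompt; the checkpoint and example text below are placeholders.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-base"  # smaller variant, convenient for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to("cuda")  # ROCm GPUs appear as "cuda"

text = (
    "summarize: PyTorch on ROCm lets AMD GPUs run the same training and inference "
    "code that was written for other accelerators, with no changes to the model itself."
)
inputs = tokenizer(text, return_tensors="pt").to("cuda")
summary_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```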

Read more ...


PyTorch C++ Extension on AMD GPU

This blog demonstrates how to use the PyTorch C++ extension with an example and discusses its advantages over regular PyTorch modules. The experiments were carried out on AMD GPUs and ROCm 5.7.0 software. For more information about supported GPUs and operating systems, see System Requirements (Linux).
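
For orientation, a C++ extension is typically packaged with a small `setup.py` like the sketch below; `my_ops` and `my_ops.cpp` are hypothetical names, not files from the blog.

```python
# setup.py -- builds a hypothetical C++ source file (my_ops.cpp) into a Python-importable extension
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name="my_ops",
    ext_modules=[CppExtension(name="my_ops", sources=["my_ops.cpp"])],
    cmdclass={"build_ext": BuildExtension},  # adds the compiler and ABI flags PyTorch expects
)
```

After `python setup.py install`, the compiled operator can be used from Python with `import my_ops`.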

Read more ...


Program Synthesis with CodeGen

CodeGen is a family of standard transformer-based autoregressive language models for program synthesis, which the authors define as a method for generating computer programs that solve specified problems, using input-output examples or natural language descriptions.
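
For example, a small member of the family can complete a function from a natural-language description; the checkpoint and prompt below are illustrative.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "Salesforce/codegen-350M-mono"  # small, Python-focused CodeGen variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to("cuda")

# Describe the desired program in natural language and let the model write the code.
prompt = "# Return the n-th Fibonacci number\ndef fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```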

Read more ...


Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

In this blog, we will show you how to fine-tune the StarCoder base model on AMD GPUs with an instruction-answer pair dataset so that it can follow instructions to generate code and answer questions. We will also show you how to use parameter-efficient fine-tuning (PEFT) to minimize the computation cost for the fine-tuning process.

Read more ...


GPU Unleashed: Training Reinforcement Learning Agents with Stable Baselines3 on an AMD GPU in Gymnasium Environment

This blog will delve into the fundamentals of deep reinforcement learning, guiding you through a practical code example that utilizes an AMD GPU to train a Deep Q-Network (DQN) policy within the Gymnasium environment.
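
A minimal sketch of that setup, assuming Stable Baselines3 and Gymnasium are installed (the environment, timestep budget, and hyperparameters are placeholders):

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Train a DQN agent on a classic control task; "cuda" is also how ROCm GPUs appear to PyTorch.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=1, device="cuda")
model.learn(total_timesteps=50_000)

# Quick rollout with the trained policy.
obs, info = env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```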

Read more ...


ResNet for image classification using AMD GPUs

In this blog, we demonstrate training a simple ResNet model for image classification on the CIFAR10 dataset using AMD GPUs with ROCm. Training a ResNet model on AMD GPUs is straightforward, requiring no additional work beyond installing ROCm and the appropriate PyTorch libraries.
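
The core of that workflow fits in a few lines; the sketch below (batch size, learning rate, and single-pass loop are illustrative) assumes a ROCm build of PyTorch, which exposes AMD GPUs through the usual `cuda` device.

```python
import torch
import torchvision
from torchvision import transforms

device = torch.device("cuda")  # ROCm-enabled PyTorch exposes AMD GPUs as "cuda"

transform = transforms.Compose([transforms.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# ResNet-18 with a 10-class head for CIFAR10.
model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

model.train()
for images, labels in loader:  # one pass over the data as a smoke test
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```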

Read more ...


Small language models with Phi-2

Like many other LLMs, Phi-2 is a transformer-based model with a next-word prediction objective that is trained on billions of tokens. At 2.7 billion parameters, Phi-2 is a relatively small language model, but it achieves outstanding performance on a variety of tasks, including common sense reasoning, language understanding, math, and coding. For reference, GPT-3.5 has 175 billion parameters and the smallest version of LLaMA-2 has 7 billion parameters. According to Microsoft, Phi-2 is capable of matching or outperforming models up to 25 times larger due to more carefully curated training data and model scaling.

Read more ...


Using the ChatGLM-6B bilingual language model with AMD GPUs

ChatGLM-6B is an open bilingual (Chinese-English) language model with 6.2 billion parameters. It's optimized for Chinese conversation and is based on the General Language Model (GLM) architecture. GLM is a pretraining framework that seeks to combine the strengths of autoencoder models (like BERT) and autoregressive models (like GPT). The GLM framework randomly blanks out continuous spans of tokens from the input text (the autoencoding methodology) and trains the model to sequentially reconstruct those spans (the autoregressive pretraining methodology).
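
Following the model card's documented usage (the `chat()` helper is provided by the model's own remote code, so treat this as a sketch rather than the blog's exact script):

```python
from transformers import AutoModel, AutoTokenizer

# ChatGLM-6B ships its own modelling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# The remote code provides a chat() helper that keeps conversation history.
response, history = model.chat(tokenizer, "你好", history=[])  # "Hello" in Chinese
print(response)
response, history = model.chat(tokenizer, "What is the GLM pretraining objective?", history=history)
print(response)
```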

Read more ...


Total body segmentation using MONAI Deploy on an AMD GPU

Medical Open Network for Artificial Intelligence (MONAI) is an open-source organization that provides PyTorch implementations of state-of-the-art medical imaging models, ranging from classification and segmentation to image generation. Catering to the needs of researchers, clinicians, and fellow domain contributors, MONAI provides three end-to-end workflow tools covering different stages of its lifecycle: MONAI Core, MONAI Label, and MONAI Deploy.

Read more ...


Automatic mixed precision in PyTorch using AMD GPUs

As models increase in size, the time and memory needed to train them, and consequently the cost, also increase. Therefore, any measures we take to reduce training time and memory usage can be highly beneficial. This is where Automatic Mixed Precision (AMP) comes in.
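
In PyTorch, AMP boils down to wrapping the forward pass in `autocast` and scaling the loss with `GradScaler`; the toy model and data below are placeholders.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

data = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # Ops inside autocast run in lower precision where safe, FP32 where needed.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps the optimizer
    scaler.update()                 # adjusts the scale factor for the next iteration
```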

Read more ...


Building a decoder transformer model on AMD GPU(s)

In this blog, we demonstrate how to run Andrej Karpathy’s beautiful PyTorch re-implementation of GPT on single and multiple AMD GPUs on a single node using PyTorch 2.0 and ROCm. We use the works of Shakespeare to train our model, then run inference to see if our model can generate Shakespeare-like text.

Read more ...


Question-answering Chatbot with LangChain on an AMD GPU

LangChain is a framework designed to harness the power of language models for building cutting-edge applications. By connecting language models to various contextual sources and providing reasoning abilities based on the given context, LangChain creates context-aware applications that can intelligently reason and respond. In this blog, we demonstrate how to use LangChain and Hugging Face to create a simple question-answering chatbot. We also demonstrate how to augment the knowledge of our large language model (LLM) with additional information using the Retrieval-Augmented Generation (RAG) technique, then allow our bot to respond to queries based on the information contained within specified documents.

Read more ...


Music Generation With MusicGen on an AMD GPU

MusicGen is an autoregressive, transformer-based model that predicts the next segment of a piece of music based on previous segments. This is a similar approach to language models predicting the next token.
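
A short generation sketch along the lines of the Transformers documentation (the prompt, token budget, and output path are illustrative):

```python
from scipy.io import wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small").to("cuda")

inputs = processor(text=["lo-fi hip hop beat with mellow piano"], padding=True, return_tensors="pt").to("cuda")
# Each generated token corresponds to a short slice of audio, so more tokens means a longer clip.
audio_values = model.generate(**inputs, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].cpu().numpy())
```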

Read more ...


Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

In this blog, we show you how to use pre-trained Stable Diffusion models to generate images from text (text-to-image), transform existing visuals (image-to-image), and restore damaged pictures (inpainting) on AMD GPUs using ONNX Runtime.
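
As a rough sketch of the text-to-image path using Optimum's ONNX Runtime pipelines (the checkpoint, prompt, and execution-provider name are assumptions, not the blog's exact settings):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# export=True converts the PyTorch checkpoint to ONNX on the fly; the ROCm execution
# provider is assumed to be available through a ROCm-enabled ONNX Runtime build.
pipe = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True, provider="ROCMExecutionProvider"
)
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```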

Read more ...


Simplifying deep learning: A guide to PyTorch Lightning

PyTorch Lightning is a higher-level wrapper built on top of PyTorch. Its purpose is to simplify and abstract the process of training PyTorch models. It provides a structured and organized approach to machine learning (ML) tasks by abstracting away the repetitive boilerplate code, allowing you to focus more on model development and experimentation. PyTorch Lightning works out-of-the-box with AMD GPUs and ROCm.
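
A minimal LightningModule shows the division of labor: you define the model, `training_step`, and optimizer, and the `Trainer` handles the loop, devices, and logging (the toy regressor and random data below are illustrative).

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class LitRegressor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Lightning owns the training loop, device placement, and logging.
data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
trainer = pl.Trainer(max_epochs=2, accelerator="gpu", devices=1)
trainer.fit(LitRegressor(), DataLoader(data, batch_size=32))
```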

Read more ...


Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

This tutorial aims to explain the fundamentals of NeRF and its implementation in PyTorch. The code used in this tutorial is inspired by Mason McGough's Colab notebook and is implemented on an AMD GPU.

Read more ...


Using LoRA for efficient fine-tuning: Fundamental principles

Low-Rank Adaptation of Large Language Models (LoRA) is used to address the challenges of fine-tuning large language models (LLMs). Models like GPT and Llama, which boast billions of parameters, are typically cost-prohibitive to fine-tune for specific tasks or domains. LoRA preserves the pre-trained model weights and incorporates trainable layers within each model block. The key benefit is a substantial reduction in the number of trainable parameters, sometimes by a factor of up to 10,000, which considerably reduces GPU memory and overall resource demands.
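
With the Hugging Face PEFT library, that idea looks roughly like the sketch below; the base checkpoint, rank, and target modules are illustrative choices.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model

# Low-rank adapters are injected into the attention projections; the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                      # rank of the update matrices
    lora_alpha=16,            # scaling factor applied to the adapter output
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports trainable params as a tiny fraction of the total
```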

Read more ...


Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

This blog explains an end-to-end process for pre-training the Bidirectional Encoder Representations from Transformers (BERT) base model from scratch on an English corpus (WikiText-103-raw-v1) using Hugging Face libraries with a PyTorch backend.

Read more ...


Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

In this blog, we show you how to pre-train a GPT-3 model using the Megatron-DeepSpeed framework on multiple AMD GPUs. We also demonstrate how to perform inference on the text-generation task with your pre-trained model.

Read more ...


Creating a PyTorch/TensorFlow code environment on AMD GPUs

Note: This blog was previously part of the AMD lab notes blog series.

Read more ...