Posts tagged GenAI

Inferencing with AI2’s OLMo model on AMD GPU

In this blog, we will show you how to generate text using AI2’s OLMo model on AMD GPU.

Read more ...


Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

Contrastive Language-Image Pre-Training (CLIP) is a multimodal deep learning model that bridges vision and natural language. It was introduced in the paper “Learning Transferrable Visual Models from Natural Language Supervision” (2021) from OpenAI, and it was trained contrastively on a huge amount (400 million) of web scraped data of image-caption pairs (one of the first models to do this).

Read more ...


Instruction fine-tuning of StarCoder with PEFT on multiple AMD GPUs

In this blog, we will show you how to fine-tune the StarCoder base model on AMD GPUs with an instruction-answer pair dataset so that it can follow instructions to generate code and answer questions. We will also show you how to use parameter-efficient fine-tuning (PEFT) to minimize the computation cost for the fine-tuning process.

Read more ...


Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llma 2 on a single AMD GPU

Building on the previous blog Fine-tune Llama 2 with LoRA blog, we delve into another Parameter Efficient Fine-Tuning (PEFT) approach known as Quantized Low Rank Adaptation (QLoRA). The focus will be on leveraging QLoRA for the fine-tuning of Llama-2 7B model using a single AMD GPU with ROCm. This task, made possible through the use of QLoRA, addresses challenges related to memory and computing limitations. The exploration aims to showcase how QLoRA can be employed to enhance accessibility to open-source large language models.

Read more ...


Image classification using Vision Transformer with AMD GPUs

The Vision Transformer (ViT) model was first proposed in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ViT is an attractive alternative to conventional Convolutional Neural Network (CNN) models due to its excellent scalability and adaptability in the field of computer vision. On the other hand, ViT can be more expensive compared to CNN for large input images as it has quadratic computation complexity with respect to input size.

Read more ...


Building semantic search with SentenceTransformers on AMD

In this blog, we explain how to train a SentenceTransformers model on the Sentence Compression dataset to perform semantic search. We use the BERT base model (uncased) as the base transformer and apply Hugging Face PyTorch libraries.

Read more ...


Scale AI applications with Ray

Most machine-learning (ML) workloads today require multiple GPUs or nodes to achieve the performance or scale required by applications. However, scaling workloads beyond single node/single GPU workloads is difficult and require some expertise in distributed processing.

Read more ...


Large language model inference optimizations on AMD GPUs

Large language models (LLMs) have transformed natural language processing and comprehension, facilitating a multitude of AI applications in diverse fields. LLMs have various promising use cases, including AI assistants, chatbots, programming, gaming, learning, searching, and recommendation systems. These applications leverage the capabilities of LLMs to provide personalized and interactive experiences, which enhances user engagement.

Read more ...


Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

In this blog, we show you how to use pre-trained Stable Diffusion models to generate images from text (text-to-image), transform existing visuals (image-to-image), and restore damaged pictures (inpainting) on AMD GPUs using ONNX Runtime.

Read more ...


Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

This tutorial aims to explain the fundamentals of NeRF and its implementation in PyTorch. The code used in this tutorial is inspired by Mason McGough’s colab notebook and is implemented on an AMD GPU.

Read more ...


Using LoRA for efficient fine-tuning: Fundamental principles

Low-Rank Adaptation of Large Language Models (LoRA) is used to address the challenges of fine-tuning large language models (LLMs). Models like GPT and Llama, which boast billions of parameters, are typically cost-prohibitive to fine-tune for specific tasks or domains. LoRA preserves pre-trained model weights and incorporates trainable layers within each model block. This results in a significant reduction in the number of parameters that need to be fine-tuned and considerably reduces GPU memory requirements. The key benefit of LoRA is that it substantially decreases the number of trainable parameters–sometimes by a factor of up to 10,000–leading to a considerable decrease in GPU resource demands.

Read more ...


Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

In this blog, we show you how to fine-tune Llama 2 on an AMD GPU with ROCm. We use Low-Rank Adaptation of Large Language Models (LoRA) to overcome memory and computing limitations and make open-source large language models (LLMs) more accessible. We also show you how to fine-tune and upload models to Hugging Face.

Read more ...


Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

This blog explains an end-to-end process for pre-training the Bidirectional Encoder Representations from Transformers (BERT) base model from scratch using Hugging Face libraries with a TensorFlow backend for English corpus text (WikiText-103-raw-v1).

Read more ...


Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

This blog explains an end-to-end process for pre-training the Bidirectional Encoder Representations from Transformers (BERT) base model from scratch using Hugging Face libraries with a PyTorch backend for English corpus text (WikiText-103-raw-v1).

Read more ...


LLM distributed supervised fine-tuning with JAX

In this article, we review the process for fine-tuning a Bidirectional Encoder Representations from Transformers (BERT)-based large language model (LLM) using JAX for a text classification task. We explore techniques for parallelizing this fine-tuning procedure across multiple AMD GPUs, then evaluate our model’s performance on a holdout dataset. For this, we use a (BERT)-base-cased transformer model with a General Language Understanding Evaluation (GLUE) benchmark dataset on multiple AMD GPUs.

Read more ...


Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

Stable Diffusion has emerged as a groundbreaking advancement in the field of image generation, empowering users to translate text descriptions into captivating visual output.

Read more ...


Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs) with unparalleled efficiency. TGI is tailored for popular open-source LLMs, such as Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5. Optimizations include tensor parallelism, token streaming using Server-Sent Events (SSE), continuous batching, and optimized transformers code. It has a robust feature set that includes quantization, safetensors, watermarking (for determining if text is generated from language models), logits warper, and support for custom prompt generation and fine-tuning.

Read more ...