Posts by Vara Lakshmi Bayanagari

Transformer based Encoder-Decoder models for image-captioning on AMD GPUs

Image captioning, or the GenAI-based automatic generation of concise textual descriptions of images, has immensely important real-world applications. For example, image captioning can provide visually impaired users with textual descriptions of images for improved accessibility, image captioning can add textual descriptions to products in e-commerce applications and help children map images to their textual descriptions in early childhood educational apps. Image captioning can automatically describe objects and events in security camera footage in surveillance applications and can enable robots to auto-generate textual captions for objects and events they encountered in human-robot interaction (HRI) applications, and many more applications. Image captioning is a sequence-to-sequence (seq2seq) machine learning task: a model converting a sequence from one domain (in this case, the image), to another (its textual description). In image captioning the image is partitioned into a sequence of patches. This sequence of image patches is then converted by the model to a corresponding sequence of text tokens.

Read more ...


Quantized 8-bit LLM training and inference using bitsandbytes on AMD GPUs

In this blog post we will cover the bitsandbytes 8-bit representations. As you will see, the bitsandbytes 8-bit representations significantly help reduce the memory needed for fine-tuning and inferencing LLMs. There are many quantization techniques used in the field to decrease a model size, but bitsandbytes offers quantization to decrease the size of optimizer states as well. This post will help you understand the basic principles underlying the bitsandbytes 8-bit representations, explain the bitsandbytes 8-bit optimizer and LLM.int8 techniques, and show you how to implement these on AMD GPUs using ROCm.

Read more ...


Speed Up Text Generation with Speculative Sampling on AMD GPUs

As the size of transformer models grow, so does the cost of conducting inference, impacting latency and throughput. Compression methods such as quantization and distillation, as well as hardware-aware optimizations such as Flash Attention and Triton, have been proposed to cut down the computation cost at different levels. However, these models either compromise on accuracy or require major changes to the model implementation.

Read more ...


Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs

Image classification is a key task in computer vision aiming at “understanding” an entire image. The outcome of an image classifier is a label or a category for the image as a whole, unlike object recognition where the task is to detect and classify multiple objects within an image.

Read more ...


Panoptic segmentation and instance segmentation with Detectron2 on AMD GPUs

23, May 2024 by Vara Lakshmi Bayanagari.

Read more ...


Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU

30, Apr 2024 by Vara Lakshmi Bayanagari.

Read more ...


PyTorch C++ Extension on AMD GPU

16, Apr 2024 by Vara Lakshmi Bayanagari.

Read more ...


Total body segmentation using MONAI Deploy on an AMD GPU

4, Apr 2024 by Vara Lakshmi Bayanagari.

Read more ...


Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

7, Feb 2024 by Vara Lakshmi Bayanagari.

Read more ...


Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

29, Jan 2024 by Vara Lakshmi Bayanagari.

Read more ...


Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26, Jan 2024 by Vara Lakshmi Bayanagari.

Read more ...