Posts tagged Computer Vision

Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

Contrastive Language-Image Pre-Training (CLIP) is a multimodal deep learning model that bridges vision and natural language. It was introduced in the paper "Learning Transferable Visual Models From Natural Language Supervision" (2021) by OpenAI, and it was trained contrastively on 400 million image-caption pairs scraped from the web, making it one of the first models trained on image-text data at that scale.
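
As a quick illustration of the image-text matching that CLIP performs, here is a minimal zero-shot classification sketch. It assumes the Hugging Face transformers library and the openai/clip-vit-base-patch32 checkpoint, which may differ from the interface used in the full post; on a ROCm build of PyTorch, the AMD GPU is reached through the usual "cuda" device string.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP checkpoint and its matching processor
# (assumption: the openai/clip-vit-base-patch32 weights are used).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# On ROCm, AMD GPUs are exposed through the standard "cuda" device.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

image = Image.open("example.jpg")  # any local image (placeholder path)
captions = ["a photo of a dog", "a photo of a cat", "a diagram"]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into zero-shot classification probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```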

Read more ...


ResNet for image classification using AMD GPUs

In this blog, we demonstrate training a simple ResNet model for image classification on the CIFAR10 dataset using AMD GPUs with ROCm. Training a ResNet model on AMD GPUs is straightforward, requiring no additional work beyond installing ROCm and the appropriate PyTorch libraries.
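
The sketch below shows what such a training loop can look like. It uses torchvision's stock ResNet-18 and CIFAR10 loader with illustrative hyperparameters; these are assumptions and may differ from the model definition and settings in the full post.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# On a ROCm build of PyTorch, AMD GPUs are addressed via the "cuda" device string.
device = "cuda" if torch.cuda.is_available() else "cpu"

transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.4914, 0.4822, 0.4465),
                                   (0.247, 0.243, 0.261))])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

# Standard torchvision ResNet-18 with a 10-class head for CIFAR10.
model = torchvision.models.resnet18(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

for epoch in range(2):  # a couple of epochs, just to show the loop
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```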

Read more ...


Total body segmentation using MONAI Deploy on an AMD GPU

Medical Open Network for Artificial Intelligence (MONAI) is an open-source project that provides PyTorch implementations of state-of-the-art medical imaging models, ranging from classification and segmentation to image generation. Catering to the needs of researchers, clinicians, and other domain contributors, MONAI covers the full project lifecycle with three end-to-end workflow tools: MONAI Core, MONAI Label, and MONAI Deploy.

Read more ...


Image classification using Vision Transformer with AMD GPUs

The Vision Transformer (ViT) model was first proposed in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ViT is an attractive alternative to conventional Convolutional Neural Network (CNN) models because of its excellent scalability and adaptability in computer vision tasks. However, ViT can be more expensive than a CNN for large input images, since self-attention has quadratic computational complexity in the number of input patches, which grows with the input size.
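
The quadratic cost comes from self-attention: a ViT with 16x16 patches turns an image into one token per patch, and attention compares every token with every other token. The toy calculation below is an illustrative sketch (not code from the post) of how the token count, and hence the attention cost, grows with image size.

```python
# Rough self-attention cost for a ViT with 16x16 patches: the token count
# grows with image area, and attention work grows with the square of it.
def patch_tokens(image_size: int, patch_size: int = 16) -> int:
    # Number of patch tokens for a square image (the [CLS] token is ignored here).
    return (image_size // patch_size) ** 2

for size in (224, 448, 896):
    n = patch_tokens(size)
    print(f"{size}x{size} image -> {n} patch tokens -> ~{n**2:,} attention pairs")

# Doubling the image side length gives 4x the tokens and ~16x the attention cost.
```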

Read more ...