With growing dataset sizes, improvements in image-capture technology, the ability to extract ever more information from visual data, and the shift toward large language models that accept image input, efficient image processing and preparation has become a necessity for running these workloads in a timely manner. Although much attention is focused on the computational aspects of these workloads, the fundamental tasks of data loading and preparation have become significant bottlenecks that limit the throughput of the entire pipeline. Accelerated JPEG decoding is an essential step in optimizing workloads that rely on image data. Dive into this blog post to learn how to install and benchmark rocJPEG, and how the ROCm™ platform and AMD Instinct GPUs can help you achieve up to 50x faster decoding performance at 4K¹.
In the rapidly evolving landscape of high-performance computing and artificial intelligence, innovation is the currency of progress. AMD's ROCm 6.4 isn't just another software update; it's a leap forward that redefines what is possible for AI developers, researchers, and enterprise innovators.
State Space Models (SSMs), such as Mamba, have emerged as a potential alternative to Transformer models. Vision backbones using only SSMs have yielded promising results. For more information about SSMs and Mamba’s performance on AMD hardware, see Mamba on AMD GPUs with ROCm.
This blog explores Vision Mamba (Vim), an innovative and efficient backbone for vision tasks, and evaluates its performance on AMD GPUs with ROCm. We'll start with a brief introduction to Vision Mamba, followed by a step-by-step guide on training and running inference with Vision Mamba on AMD GPUs using ROCm.
Image classification is a key task in computer vision that aims at "understanding" an entire image. The output of an image classifier is a single label or category for the image as a whole, unlike object recognition, where the task is to detect and classify multiple objects within an image.
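The whole-image nature of classification can be made concrete with a toy model: the network maps one image tensor to one vector of class scores, and `argmax` picks a single label. The architecture and the three-class setup below are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

# Toy classifier: flattens a 3x32x32 RGB image into a vector and maps it
# to scores for 3 hypothetical classes. Real backbones are convolutional
# or transformer/SSM-based, but the input/output contract is the same.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 3),
)

image = torch.randn(1, 3, 32, 32)   # a batch with one image
logits = model(image)               # one score per class
label = logits.argmax(dim=1)        # a single category for the whole image
```

Contrast this with object detection, where the model would instead return a variable-length set of bounding boxes, each with its own class label.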
PyTorch 2.0 introduces torch.compile(), a tool to vastly accelerate PyTorch code and models. By converting PyTorch code into highly optimized kernels, torch.compile delivers substantial performance improvements with minimal changes to the existing codebase. This feature allows for precise optimization of individual functions, entire modules, and complex training loops, providing a versatile and powerful tool for enhancing computational efficiency.