Posts by Sean Song

Distributed Data Parallel training on AMD GPU with ROCm

As machine learning models grow in size and complexity, so does their demand for computational resources. A single GPU can become a bottleneck for deep learning workloads, especially when large models are trained on large datasets. Parallelized training addresses this challenge. Of the various forms of parallelized training, this blog focuses on Distributed Data Parallel (DDP), a key PyTorch feature that accelerates training across multiple GPUs and nodes.
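The core mechanism behind DDP can be illustrated without any framework. The sketch below is a conceptual simulation, not the PyTorch DDP API: it assumes a toy linear model y_hat = w * x trained with mean-squared error, and the helper names (`grad_mse`, `all_reduce_mean`, `ddp_step`, `shard`) are hypothetical. Each simulated replica computes a gradient on its own shard of the batch, the gradients are averaged (the role the all-reduce collective plays in DDP), and every replica applies the identical update, so the replicas stay in sync.

```python
# Conceptual simulation of synchronous data-parallel SGD (not the PyTorch DDP API).
# Toy model: y_hat = w * x with mean-squared-error loss.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every replica receives the average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)

def shard(xs, ys, world_size):
    """Split the batch into equal contiguous shards, one per replica (drops any remainder)."""
    size = len(xs) // world_size
    return [(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
            for i in range(world_size)]

def ddp_step(weights, shards, lr):
    """One synchronous step: local gradients, averaged, identical update on every replica."""
    local_grads = [grad_mse(w, xs, ys) for w, (xs, ys) in zip(weights, shards)]
    synced = all_reduce_mean(local_grads)
    return [w - lr * g for w, g in zip(weights, synced)]

# Usage: two simulated replicas fitting data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
world_size = 2
weights = [0.0] * world_size  # every replica starts from the same weights
for _ in range(50):
    weights = ddp_step(weights, shard(xs, ys, world_size), lr=0.01)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so each step here matches a single-process step on the whole batch. In real DDP, PyTorch performs this averaging with collective communication during the backward pass.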


Inference with Llama 3.2 Vision LLMs on AMD GPUs Using ROCm

Meta’s Llama models now support multimodal capabilities, expanding their functionality beyond traditional text-only applications. The Llama 3.2 models are available in a range of sizes, including medium-sized 11B and 90B multimodal models for vision-text reasoning tasks, and lightweight 1B and 3B text-only models designed for edge and mobile devices.


Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

PyTorch 2.0 introduced torch.compile(), a tool for accelerating PyTorch code and models. By converting PyTorch code into highly optimized kernels, torch.compile can deliver substantial performance improvements with minimal changes to the existing codebase. It can be applied to individual functions, entire modules, or complex training loops, making it a versatile tool for improving computational efficiency.


Mamba on AMD GPUs with ROCm

28 Jun, 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.


Segment Anything with AMD GPUs

4 Jun, 2024 by Sean Song.


Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 Apr, 2024 by Sean Song.


Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16 Apr, 2024 by Sean Song.


Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15 Apr, 2024 by Sean Song.


Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

1 Feb, 2024 by Sean Song.
