Posts by Sean Song

Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

PyTorch 2.0 introduces torch.compile(), a tool to vastly accelerate PyTorch code and models. By converting PyTorch code into highly optimized kernels, torch.compile delivers substantial performance improvements with minimal changes to the existing codebase. This feature allows for precise optimization of individual functions, entire modules, and complex training loops, providing a versatile and powerful tool for enhancing computational efficiency.

Read more ...


Mamba on AMD GPUs with ROCm

28, Jun 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.

Read more ...


Segment Anything with AMD GPUs

4 Jun, 2024 by Sean Song.

Read more ...


Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 Apr, 2024 by Sean Song.

Read more ...


Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU

16, Apr 2024 by Sean Song.

Read more ...


Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15, Apr 2024 by Sean Song.

Read more ...


Using LoRA for efficient fine-tuning: Fundamental principles

5, Feb 2024 by Sean Song.

Read more ...


Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

1, Feb 2024 by Sean Song.

Read more ...