Posts tagged LLM

Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm

PyTorch 2.0 introduces torch.compile(), a tool to vastly accelerate PyTorch code and models. By converting PyTorch code into highly optimized kernels, torch.compile delivers substantial performance improvements with minimal changes to the existing codebase. This feature allows for precise optimization of individual functions, entire modules, and complex training loops, providing a versatile and powerful tool for enhancing computational efficiency.

Read more ...


Accelerating models on ROCm using PyTorch TunableOp

In this blog, we will show how to leverage PyTorch TunableOp to accelerate models using ROCm on AMD GPUs. We will discuss the basics of General Matrix Multiplications (GEMMs), show an example of tuning a single GEMM, and finally, demonstrate real-world performance gains on an LLM (gemma) using TunableOp.

Read more ...


A Guide to Implementing and Training Generative Pre-trained Transformers (GPT) in JAX on AMD GPUs

2 July, 2024 by Douglas Jia.

Read more ...


Mamba on AMD GPUs with ROCm

28, Jun 2024 by Sean Song, Jassani Adeem, Moskvichev Arseny.

Read more ...


SmoothQuant model inference on AMD Instinct MI300X using Composable Kernel

The AMD ROCm™ Composable Kernel (CK) library provides a programming model for writing performance-critical kernels for machine learning workloads. It generates a general-purpose kernel during the compilation phase through a C++ template, enabling developers to achieve operation fusions on different data precisions.

Read more ...


Accelerating Large Language Models with Flash Attention on AMD GPUs

15, May 2024 by Clint Greene.

Read more ...


Step-by-Step Guide to Use OpenLLM on AMD GPUs

1, May 2024 by Fabricio Flores.

Read more ...


Inferencing with Mixtral 8x22B on AMD GPUs

1, May 2024 by Clint Greene.

Read more ...


Table Question-Answering with TaPas

26 Apr, 2024 by Phillip Dang.

Read more ...


Multimodal (Visual and Language) understanding with LLaVA-NeXT

26, Apr 2024 by Phillip Dang.

Read more ...


Unlocking Vision-Text Dual-Encoding: Multi-GPU Training of a CLIP-Like Model

24 Apr, 2024 by Sean Song.

Read more ...


Inferencing with AI2’s OLMo model on AMD GPU

17 Apr, 2024 by Douglas Jia.

Read more ...


Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llama 2 on a single AMD GPU

15, Apr 2024 by Sean Song.

Read more ...


Using the ChatGLM-6B bilingual language model with AMD GPUs

4, Apr 2024 by Phillip Dang.

Read more ...


Retrieval Augmented Generation (RAG) using LlamaIndex

4, Apr 2024 by Clint Greene.

Read more ...


Inferencing and serving with vLLM on AMD GPUs

4 Apr, 2024 by Clint Greene.

Read more ...


Building semantic search with SentenceTransformers on AMD

4 Apr, 2024 by Fabricio Flores.

Read more ...


Scale AI applications with Ray

1, Apr 2024 by Vicky Tsang<vicktsan>, {hoverxref}Logan Grado, {hoverxref}Eliot Li.

Read more ...


Large language model inference optimizations on AMD GPUs

15, Mar 2024 by Seungrok Jung.

Read more ...


Using LoRA for efficient fine-tuning: Fundamental principles

5, Feb 2024 by Sean Song.

Read more ...


Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

1, Feb 2024 by Sean Song.

Read more ...


Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

29, Jan 2024 by Vara Lakshmi Bayanagari.

Read more ...


Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

26, Jan 2024 by Vara Lakshmi Bayanagari.

Read more ...


Accelerating XGBoost with Dask using multiple AMD GPUs

26 Jan, 2024 by Clint Greene.

Read more ...


LLM distributed supervised fine-tuning with JAX

25 Jan, 2024 by Douglas Jia.

Read more ...


Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

24 Jan, 2024 by Douglas Jia.

Read more ...


Efficient deployment of large language models with Text Generation Inference on AMD GPUs

24 Jan, 2024 by Douglas Jia.

Read more ...