Recent Posts - Page 2

Image classification using Vision Transformer with AMD GPUs

Retrieval Augmented Generation (RAG) using LlamaIndex

Total body segmentation using MONAI Deploy on an AMD GPU

Automatic mixed precision in PyTorch using AMD GPUs
In this blog, we discuss the basics of Automatic Mixed Precision (AMP), how it works, and how it can improve training efficiency on AMD GPUs. As models increase in size, the time and memory needed to train them, and consequently the cost, also increase. Any measure that reduces training time and memory usage is therefore highly beneficial, and this is where AMP comes in.
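The AMP pattern described above can be sketched in a few lines of PyTorch. This is a minimal illustrative loop, not code from the post; the model, shapes, and hyperparameters are placeholders, but ROCm builds of PyTorch expose the same `torch.autocast` and `GradScaler` API used here.

```python
import torch

# Illustrative model and data; the AMP pattern is what matters.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# GradScaler rescales the loss so small fp16 gradients do not underflow;
# it is a no-op when disabled (e.g. on CPU).
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # autocast runs eligible ops in lower precision (fp16/bf16) automatically
    with torch.autocast(device_type=device):
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()                # adjusts the scale factor for next step
```

Because only the forward pass runs under `autocast`, weights stay in full precision while activations and most matmuls use the faster reduced-precision path.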

Large language model inference optimizations on AMD GPUs

Building a decoder transformer model on AMD GPU(s)

Question-answering Chatbot with LangChain on an AMD GPU

Music Generation With MusicGen on an AMD GPU

Efficient image generation with Stable Diffusion models and ONNX Runtime using AMD GPUs

Simplifying deep learning: A guide to PyTorch Lightning

Two-dimensional images to three-dimensional scene mapping using NeRF on an AMD GPU

Using LoRA for efficient fine-tuning: Fundamental principles
This blog demonstrates how to use LoRA to efficiently fine-tune a Llama model on AMD GPUs with ROCm.
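The fundamental principle behind LoRA can be shown in a short from-scratch sketch (this is illustrative code, not the post's implementation): the pretrained weight is frozen, and only a low-rank update `B @ A` is trained, so a `d_out x d_in` layer needs just `r * (d_in + d_out)` trainable parameters.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Wrap a frozen Linear layer with a trainable low-rank update (LoRA)."""

    def __init__(self, base: torch.nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # A gets a small random init; B starts at zero, so at initialization
        # the adapter contributes nothing and the model behaves like the base.
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x):
        # base(x) + scale * x A^T B^T  ==  (W + scale * B A) x + bias
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

In practice, libraries such as Hugging Face PEFT apply this same trick to the attention projections of a Llama model; the sketch above just makes the parameter-count saving concrete.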

Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

Fine-tune Llama model with LoRA: Customizing a large language model for question-answering
This blog demonstrates how to use LoRA to efficiently fine-tune a Llama model for question-answering on a single AMD GPU with ROCm.

Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU

Pre-training BERT using Hugging Face & PyTorch on an AMD GPU

Accelerating XGBoost with Dask using multiple AMD GPUs

LLM distributed supervised fine-tuning with JAX

Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs

Efficient deployment of large language models with Text Generation Inference on AMD GPUs

Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs

Sparse matrix vector multiplication - part 1

Creating a PyTorch/TensorFlow code environment on AMD GPUs

Finite difference method - Laplacian part 4

Register pressure in AMD CDNA™2 GPUs

Finite difference method - Laplacian part 3

Finite difference method - Laplacian part 2

AMD matrix cores

Finite difference method - Laplacian part 1