Posts tagged Llama

Enhancing LLM Accessibility: A Deep Dive into QLoRA Through Fine-tuning Llma 2 on a single AMD GPU

Building on the previous blog Fine-tune Llama 2 with LoRA blog, we delve into another Parameter Efficient Fine-Tuning (PEFT) approach known as Quantized Low Rank Adaptation (QLoRA). The focus will be on leveraging QLoRA for the fine-tuning of Llama-2 7B model using a single AMD GPU with ROCm. This task, made possible through the use of QLoRA, addresses challenges related to memory and computing limitations. The exploration aims to showcase how QLoRA can be employed to enhance accessibility to open-source large language models.

Read more ...

Large language model inference optimizations on AMD GPUs

Large language models (LLMs) have transformed natural language processing and comprehension, facilitating a multitude of AI applications in diverse fields. LLMs have various promising use cases, including AI assistants, chatbots, programming, gaming, learning, searching, and recommendation systems. These applications leverage the capabilities of LLMs to provide personalized and interactive experiences, which enhances user engagement.

Read more ...

Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering

In this blog, we show you how to fine-tune Llama 2 on an AMD GPU with ROCm. We use Low-Rank Adaptation of Large Language Models (LoRA) to overcome memory and computing limitations and make open-source large language models (LLMs) more accessible. We also show you how to fine-tune and upload models to Hugging Face.

Read more ...