Posts by Clint Greene

Enhancing vLLM Inference on AMD GPUs

11 October, 2024 by Clint Greene.

Read more ...


Supercharging JAX with Triton Kernels on AMD GPUs

Ready to supercharge your deep learning applications on AMD GPUs? In this blog, we’ll show you how to develop a custom fused dropout activation kernel for matrices in Triton, call it from JAX, and benchmark its performance with ROCm. Combining the two lets you write performance-critical kernels in Python and use them directly in your JAX models.

Read more ...


Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs

Large Language Models (LLMs) have revolutionized the field of natural language processing, enabling machines to understand and generate human-like language. However, these models are often trained on vast amounts of general-purpose data, which can make them less effective for specific tasks or domains. Fine-tuning trains a pre-trained LLM further on a specialized dataset to improve its performance on those tasks. As Andrej Karpathy analogized, this process is akin to allowing someone to practice a particular skill: just as a person needs to practice in a specific context to become proficient, an LLM needs to be fine-tuned on a specific dataset to become proficient at a particular task. For instance, an LLM can be fine-tuned for financial forecasting, technical support, legal advising, medical diagnosis, or instruction following. By fine-tuning an LLM, organizations can achieve better results and improve information security by limiting the exposure of sensitive data.

Read more ...


Inferencing and serving with vLLM on AMD GPUs

19 September, 2024 by Clint Greene.

Read more ...


Accelerating Large Language Models with Flash Attention on AMD GPUs

15 May, 2024 by Clint Greene.

Read more ...


Inferencing with Mixtral 8x22B on AMD GPUs

1 May, 2024 by Clint Greene.

Read more ...


Speech-to-Text on an AMD GPU with Whisper

16 Apr, 2024 by Clint Greene.

Read more ...


Developing Triton Kernels on AMD GPUs

15 Apr, 2024 by Clint Greene.

Read more ...


Retrieval Augmented Generation (RAG) using LlamaIndex

4 Apr, 2024 by Clint Greene.

Read more ...


Accelerating XGBoost with Dask using multiple AMD GPUs

26 Jan, 2024 by Clint Greene.

Read more ...