AI Blogs - Page 7

Supercharging JAX with Triton Kernels on AMD GPUs
In this blog post, we guide you through developing a fused dropout activation kernel for matrices in Triton, calling the kernel from JAX, and benchmarking its performance.

Leaner LLM Inference with INT8 Quantization on AMD GPUs using PyTorch
This blog demonstrates how to implement and evaluate INT8 quantization on AMD GPUs, and measures the resulting inference speed-up for Llama-family and Mistral LLMs.

Fine-tuning Llama 3 with Axolotl using ROCm on AMD GPUs
This blog demonstrates how to fine-tune Llama 3 with Axolotl using ROCm on AMD GPUs, and how to evaluate the performance of your LLM before and after fine-tuning.

Inferencing and serving with vLLM on AMD GPUs
Learn, step by step, how to use vLLM for high-performance inferencing and model serving on AMD GPUs.

Enhancing vLLM Inference on AMD GPUs
This blog showcases the latest performance enhancements in vLLM inference on AMD Instinct accelerators using ROCm 6.2, including FP8 KV-Cache, quantization, and GEMM tuning.

Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs
A guide to modifying our JAX-based nanoGPT model for mixed-precision training, optimizing speed and efficiency on AMD GPUs with ROCm.

Image Classification with BEiT, MobileNet, and EfficientNet using ROCm on AMD GPUs
Image classification with BEiT, MobileNet, and EfficientNet on AMD GPUs.

Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

Using AMD GPUs for Enhanced Time Series Forecasting with Transformers
Time series forecasting (TSF) predicts future behavior using past data. This guide focuses on implementing Transformers for TSF on AMD hardware, covering each step from preprocessing to evaluation.

Inferencing with Grok-1 on AMD GPUs
We demonstrate that the massive Grok-1 model from xAI can run seamlessly on the AMD MI300X GPU accelerator by leveraging the ROCm software platform.

Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD
In this blog, we explore how to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa) large language model, with emphasis on PyTorch's mixed precision capabilities. Specifically, we explore using AMD GPUs for mixed precision fine-tuning to achieve faster model training without a major impact on accuracy.