Posts by Eliot Li

Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

04 June 2025

In recent years, large language models (LLMs) have transformed the landscape of natural language processing, enabling breakthroughs in tasks ranging from code generation to answering complex questions. Among these, the Llama 2 model family developed by Meta has emerged as a powerful and versatile set of open weight transformer-based models, known for their competitive performance across diverse NLP benchmarks. With model sizes ranging from 7 billion to 70 billion parameters, Llama 2 has quickly become a popular choice for both research and industry after its release in 2023, striking a balance between scalability and efficiency.

Read more ...

AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

04 June 2025

MLPerf Training is one of the most influential benchmarks in the AI community, playing a critical role in measuring and advancing the performance of machine learning training across diverse hardware and software platforms. Established to provide a fair, standardized way to evaluate training speed and efficiency on real-world workloads, MLPerf Training has become the chosen standard for researchers, engineers, and organizations striving to test the boundaries of AI capability. By fostering transparency and innovation, it focuses on progression in both academic research and industry applications, helping the community identify the most effective technologies to power the next generation of intelligent systems.

Read more ...

High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

03 June 2025

This blog showcases an implementation of the BERT-L model on the AMD Instinct™ GPUs using ROCm with advanced optimization including but not limited to mixed precision training, packed datasets, Flash Attention and MLPerf-compliant techniques. BERT (Bidirectional Encoder Representations from Transformers) is a language representation model developed by researchers at Google in 2018. It is based on the Transformer architecture and processes text bidirectionally, which contrasts with traditional models that read text sequentially.

Read more ...

Scale LLM Inference with Multi-Node Infrastructure

30 May 2025

Horizontal scaling of compute resources has become a critical aspect of modern computing due to the ever-increasing growth in data and computational demands. Unlike vertical scaling, which focuses on enhancing an individual system’s resources, horizontal scaling enables the expansion of a system’s capabilities by adding more instances or nodes working in parallel. In this way, it ensures high availability and low latency of the service, making it essential to handle diverse workloads and ensure optimal user experience.

Read more ...

Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.0 Submission

02 April 2025

Building upon the success of our MLPerf Inference v4.1 submission, AMD has submitted results for two popular models – Llama 2 70B and Stable Diffusion XL (SDXL) – in the MLPerf Inference v5.0 round. This blog post provides a comprehensive, step-by-step guide on reproducing the results of AMD’s MLPerf submission using ROCm and the AMD Instinct™ MI325X GPUs. Please follow along to independently verify these results and gain hands-on experience with the benchmarking process. If you are interested in learning more about the advanced optimization strategies behind our Llama 2 70B and SDXL inference, from quantization and General Matrix Multiplication (GEMM) tuning to cutting-edge vLLM scheduling and platform enhancements, check out our blog on MLPerf Inference v5.0 optimization strategies.

Read more ...

AMD Instinct™ MI325X GPUs Produce Strong Performance in MLPerf Inference v5.0

02 April 2025

AI transformation and its ever-increasing demands of GenAI, LLMs, reasoning models and new advances in inference and training emphasize the need for innovative GPU architectures and products designed and delivered at an accelerated pace. Understanding the performance of AI models on these GPUs is critical for continuous advances in AI deployments and adoption. However, benchmarking AI models is challenging due to their inherent complexity and variety of possible deployments and tasks. Approaching this problem from a cross-industry perspective is preferable to have a benchmark that is comparable across different platforms and vendors. MLPerf is such a benchmark created by a cross-industry MLCommons consortium of which AMD is a founding member.

Read more ...

Triton Inference Server with vLLM on AMD GPUs

08 January 2025

Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained AI models from various machine learning and deep learning frameworks including Tensorflow, PyTorch, and vLLM, making it adaptable for diverse AI workloads. It is designed to work across multiple environments, including cloud, data centers and edge devices.

Read more ...

Benchmarking Machine Learning using ROCm and AMD GPUs: Reproducing Our MLPerf Inference Submission

28 August 2024

Measuring the performance of new technologies is as old as human history, and often as intriguing (consider for example that we still compare the performance of new electric vehicle motors using horsepower). In the rapidly advancing field of machine learning (ML) MLPerf was established by MLCommons on May 2nd 2018 and quickly became the golden standard of measuring the accuracy, speed, and efficiency of AI. MLPerf provides benchmarks on training, HPC and Inference performance. Companies across the industry use MLPerf submissions to evaluate the performance of various GPUs and software platforms, and make their technology adoption decisions based on these results.

Read more ...

Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

21 August 2024

In this blog you will learn how to use ROCm, running on AMD’s Instinct GPUs, for a range of popular and useful natural language processing (NLP) tasks, using different large language models (LLMs). The blog includes a simple to follow hands-on guide that shows you how to implement LLMs for core NLP applications ranging from text generation and sentiment analysis to extractive question answering (QA), and solving a math problem.

Read more ...

Inferencing with Grok-1 on AMD GPUs

09 August 2024

We demonstrate that the massive Grok-1 model from xAI can run seamlessly on the AMD MI300X GPU accelerator by leveraging the ROCm software platform.

Read more ...

Image classification using Vision Transformer with AMD GPUs

04 April 2024

4 Apr, 2024 by

.

Read more ...

Scale AI applications with Ray

01 April 2024

1, Apr 2024 by

Logan Grado, {hoverxref}Eliot Li.

Read more ...