Posts by Sathish Sanjeevi

Technical Dive into AMD MLPerf Training v5.1 Submission

MLPerf Training v5.1 was released on November 12, 2025. For this round, AMD showcased its newest GPUs and added a new benchmark to its submissions. The highlights of this round include:

Read more ...


Reproducing AMD MLPerf Training v5.1 Submission Result

Building on the success of its MLPerf Training v5.0 submission, AMD has not only submitted improved results for the Llama 2 70B LoRA fine-tuning benchmark on the MI300X and MI325X platforms in the v5.1 round, but has also extended that benchmark to the MI350X and MI355X platforms. In addition, AMD has submitted results for the newly added Llama 3.1 8B pretraining benchmark in this MLPerf Training round. The AMD submissions are summarized in the following table:

Read more ...


Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs

In recent years, large language models (LLMs) have transformed the landscape of natural language processing, enabling breakthroughs in tasks ranging from code generation to answering complex questions. Among these, the Llama 2 model family developed by Meta has emerged as a powerful and versatile set of open-weight, transformer-based models, known for competitive performance across diverse NLP benchmarks. With model sizes ranging from 7 billion to 70 billion parameters, Llama 2 has quickly become a popular choice for both research and industry since its release in 2023, striking a balance between scalability and efficiency.

Read more ...


AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs

MLPerf Training is one of the most influential benchmarks in the AI community, playing a critical role in measuring and advancing the performance of machine learning training across diverse hardware and software platforms. Established to provide a fair, standardized way to evaluate training speed and efficiency on real-world workloads, MLPerf Training has become the standard of choice for researchers, engineers, and organizations striving to push the boundaries of AI capability. By fostering transparency and innovation, it drives progress in both academic research and industry applications, helping the community identify the most effective technologies to power the next generation of intelligent systems.

Read more ...


High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide

This blog showcases an implementation of the BERT-L model on AMD Instinct™ GPUs using ROCm, with advanced optimizations including mixed-precision training, packed datasets, Flash Attention, and MLPerf-compliant techniques. BERT (Bidirectional Encoder Representations from Transformers) is a language representation model developed by researchers at Google in 2018. It is based on the Transformer architecture and processes text bidirectionally, which contrasts with traditional models that read text sequentially.
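For readers new to one of those techniques, below is a minimal, generic sketch of mixed-precision training with PyTorch's automatic mixed precision (AMP); the model, batch, and hyperparameters are placeholders for illustration, not the BERT-L setup described in the full post.

```python
# Minimal sketch of mixed-precision training with PyTorch AMP.
# The model, data, and hyperparameters below are placeholders, not the post's BERT-L setup.
import torch
import torch.nn as nn

device = "cuda"  # ROCm builds of PyTorch expose Instinct GPUs through the "cuda" device type
model = nn.Linear(1024, 1024).to(device)                    # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.amp.GradScaler(device)                       # loss scaling to avoid fp16 gradient underflow

for step in range(10):
    inputs = torch.randn(8, 1024, device=device)            # dummy batch
    targets = torch.randn(8, 1024, device=device)

    optimizer.zero_grad(set_to_none=True)
    # The forward pass runs in reduced precision; master weights stay in fp32.
    with torch.autocast(device_type=device, dtype=torch.float16):
        loss = nn.functional.mse_loss(model(inputs), targets)

    scaler.scale(loss).backward()   # backpropagate the scaled loss
    scaler.step(optimizer)          # unscale gradients, then take the optimizer step
    scaler.update()                 # adapt the scale factor for the next iteration
```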

Read more ...