Posts by Sarthak Arora
Reproduce AMD’s MLPerf Training v5.0 Submission Result with Instinct™ GPUs
- 04 June 2025
In recent years, large language models (LLMs) have transformed the landscape of natural language processing, enabling breakthroughs in tasks ranging from code generation to answering complex questions. Among these, the Llama 2 model family developed by Meta has emerged as a powerful and versatile set of open-weight, transformer-based models, known for competitive performance across diverse NLP benchmarks. With model sizes ranging from 7 billion to 70 billion parameters, Llama 2 quickly became a popular choice for both research and industry after its release in 2023, striking a balance between scalability and efficiency.
AMD’s MLPerf Training Debut: Optimizing LLM Fine-Tuning with Instinct™ GPUs
- 04 June 2025
MLPerf Training is one of the most influential benchmarks in the AI community, playing a critical role in measuring and advancing the performance of machine learning training across diverse hardware and software platforms. Established to provide a fair, standardized way to evaluate training speed and efficiency on real-world workloads, MLPerf Training has become the standard of choice for researchers, engineers, and organizations striving to test the boundaries of AI capability. By fostering transparency and innovation, it drives progress in both academic research and industry applications, helping the community identify the most effective technologies to power the next generation of intelligent systems.
High-Throughput BERT-L Pre-Training on AMD Instinct™ GPUs: A Practical Guide
- 03 June 2025
This blog showcases an implementation of the BERT-L model on AMD Instinct™ GPUs using ROCm, with advanced optimizations including mixed precision training, packed datasets, Flash Attention, and MLPerf-compliant techniques. BERT (Bidirectional Encoder Representations from Transformers) is a language representation model developed by researchers at Google in 2018. It is based on the Transformer architecture and processes text bidirectionally, in contrast to earlier language models that read text in a single direction.