Posts by Adeem Jassani
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs
- 18 December 2025
LLMs have shown a strong ability to generalize to new tasks and are now at the core of many AI applications. Their performance has scaled with model size, which has driven the training of ever larger models on massive datasets. The resulting growth in compute requirements means these LLMs are trained on a huge number of GPUs, which poses significant engineering and infrastructure challenges for standard backpropagation-based training. Because such training relies on a tightly interconnected network of devices, it is in practice limited to a single cluster or data center.