Posts by Adeem Jassani
TraceLens: Democratizing AI Performance Analysis
- 27 April 2026
Profiling modern AI workloads produces huge traces that are hard to interpret. Framework profilers record thousands of operations, kernels, and communication events, and engineers often end up staring at tools like Perfetto UI doing manual calculations. TraceLens speeds this up: it consumes existing framework traces and turns them into structured summaries and comparisons, allowing you to move on to the actual diagnosis and optimization.
A Step-by-Step Walkthrough of Decentralized LLM Training on AMD GPUs
- 18 December 2025
LLMs have shown great capability to generalize new tasks and are at the core of many AI applications recently. Performance of these models has scaled with model size, resulting in training of larger models on massive datasets. This increase in computation requirements has led to training of these LLMs on a huge number of GPUs and poses significant engineering and infrastructure challenges in ensuring standard backpropagation training. They are trained on a strongly interconnected network of devices and hence limit training of the models to a single cluster / data center.