Posts by Eugene Katsov

Efficient GPU Utilization With Workload Pre-Emption in AMD Resource Manager

GPU capacity is sought after and in high demand. Production inference services, fine-tuning jobs, and developer workspaces like VS Code or JupyterLab all compete for the same resources. The challenge is not just about provisioning enough GPUs, it is keeping them utilized and making sure prioritized work can access capacity when it needs it. Training jobs can drop to near-zero utilization between compute phases; inference services can go quiet between traffic bursts; R&D or experimentation models and development workspaces might be left running unutilized or after hours. This would mean that workloads hold on to GPUs they are no longer using, while other work sits queued.

Read more ...