Posts by Ted Themistokleous
From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs
- 22 May 2026
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained models from multiple frameworks, including ONNX Runtime, TensorFlow, PyTorch, and others. It runs across cloud, data center, and edge environments, making it adaptable for diverse AI workloads.
Triton Inference Server with vLLM on AMD GPUs
- 08 January 2025
Triton Inference Server is an open-source platform designed to streamline AI inferencing. It supports the deployment, scaling, and inference of trained AI models from various machine learning and deep learning frameworks, including TensorFlow, PyTorch, and vLLM, making it adaptable for diverse AI workloads. It is designed to work across multiple environments, including cloud, data centers, and edge devices.