Fabricio Flores

Fabricio is a Senior Machine Learning Engineer at AMD, known for his expertise in deploying deep learning models for computer vision and large language model (LLM) applications on AMD GPUs. He has a strong background in mathematics, with advanced studies in computational mechanics and machine learning (ML) with sensing devices. His main areas of expertise include computer vision, generative artificial intelligence (GenAI), large language models (LLMs), and numerical mathematics. He’s also deeply interested in high-performance computing (HPC), particularly its applications to quantitative finance and cybersecurity.

Posts by Fabricio Flores

May 22, 2026

From Build to Benchmark: ONNX Model Serving with Triton Inference Server on AMD GPUs

Step-by-step guide to building, deploying, and benchmarking ONNX models with Triton Inference Server and MIGraphX on AMD GPUs

https://rocm.blogs.amd.com/software-tools-optimization/triton-server-onnx/README.html

November 13, 2025

Accelerating Vector Search: hipVS and hipRAFT on AMD

Learn how hipVS accelerates vector search on AMD Instinct GPUs, with notebook demos for semantic search, RAG, and recommendation systems.

https://rocm.blogs.amd.com/software-tools-optimization/hipvs/README.html

October 02, 2025

From Ingestion to Inference: RAG Pipelines on AMD GPUs

Build a RAG-enhanced GenAI application that improves the quality of model responses by incorporating data that is missing from the model's training data.

https://rocm.blogs.amd.com/artificial-intelligence/rag-agent/README.html

June 20, 2025

Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs

Learn how to leverage Model Context Protocol (MCP) servers to provide real-time context information to LLMs through a chatbot example on AMD GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/mcp-model-context-protocol/README.html

May 07, 2025

DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

This blog post demonstrates how hipDF significantly enhances and accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.

https://rocm.blogs.amd.com/artificial-intelligence/hipDF_pandas_accelerated/README.html

May 06, 2025

CuPy and hipDF on AMD: The Basics and Beyond

Learn how to deploy CuPy and hipDF on AMD GPUs. See their high-performance computing advantages, and use CuPy and hipDF in a detailed example of an investment portfolio allocation optimization using the Markowitz model.

https://rocm.blogs.amd.com/artificial-intelligence/cupy_hipdf_portfolio_opt/README.html

April 09, 2025

Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel

Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use, shrinking model size, and enabling fast inference

https://rocm.blogs.amd.com/artificial-intelligence/gptq/README.html

March 23, 2025

Efficient MoE training on AMD ROCm: How-to use MegaBlocks on AMD GPUs

Learn how to use MegaBlocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.

https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html

February 19, 2025

Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training

Fine-tuning Phi-3.5-mini-instruct LLM using multinode distributed training with Hugging Face Accelerate, Slurm, and Docker for scalable efficiency.

https://rocm.blogs.amd.com/artificial-intelligence/multinode_accelerate_phi35/README.html

January 08, 2025

Triton Inference Server with vLLM on AMD GPUs

This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend powered by AMD GPUs, showcasing robust performance with several LLMs.

https://rocm.blogs.amd.com/artificial-intelligence/triton_server_vllm/README.html

October 24, 2024

Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power

Torchtune is a PyTorch library that enables efficient fine-tuning of LLMs. In this blog we use Torchtune to fine-tune the Llama-3.1-8B model for summarization tasks using LoRA and showcase scalable training across multiple GPUs.

https://rocm.blogs.amd.com/artificial-intelligence/torchtune/README.html

August 19, 2024

Using AMD GPUs for Enhanced Time Series Forecasting with Transformers

Time series forecasting (TSF) predicts future behavior using past data. This guide focuses on implementing Transformers for TSF, covering every step from preprocessing to evaluation on AMD hardware.

https://rocm.blogs.amd.com/artificial-intelligence/timeseries_transformers/README.html

July 29, 2024

Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD

In this blog we explore how to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa) large language model, with emphasis on PyTorch's mixed precision capabilities. Specifically, we explore using AMD GPUs for mixed precision fine-tuning to achieve faster model training without any major impact on accuracy.

https://rocm.blogs.amd.com/artificial-intelligence/roberta_amp/README.html

June 27, 2024

Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs

This blog post demonstrates how to fine-tune and test three state-of-the-art machine learning Automatic Speech Recognition (ASR) models, running on AMD GPUs using ROCm.

https://rocm.blogs.amd.com/artificial-intelligence/speech_models/README.html

June 18, 2024

TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs

TensorFlow Profiler measures resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of the TensorFlow Profiler tool on AMD hardware.

https://rocm.blogs.amd.com/software-tools-optimization/tf_profiler/README.html

May 07, 2024

AMD in Action: Unveiling the Power of Application Tracing and Profiling

https://rocm.blogs.amd.com/software-tools-optimization/roc-profiling/README.html

May 01, 2024

Step-by-Step Guide to Use OpenLLM on AMD GPUs

OpenLLM is an open-source platform for deploying large language models, enabling cloud or on-premises use. In this blog we focus on using OpenLLM to start an LLM server leveraging the capabilities of AMD GPUs

https://rocm.blogs.amd.com/artificial-intelligence/openllm/README.html

April 04, 2024

Building semantic search with SentenceTransformers on AMD

https://rocm.blogs.amd.com/artificial-intelligence/sentence_transformers_amd/README.html