Fabricio Flores
Fabricio is a Senior Machine Learning Engineer at AMD, known for his expertise in deploying deep learning models for computer vision and large language model (LLM) applications on AMD GPUs. He has a strong background in mathematics, with advanced studies in computational mechanics and machine learning (ML) with sensing devices. His main areas of expertise include computer vision, generative artificial intelligence (GenAI), LLMs, and numerical mathematics. He’s also deeply interested in high-performance computing (HPC), particularly its applications to quantitative finance and cybersecurity.
Posts by Fabricio Flores
Accelerating Vector Search: hipVS and hipRAFT on AMD
Learn how hipVS accelerates vector search on AMD Instinct GPUs, with notebook demos for semantic search, RAG, and recommendation systems.
From Ingestion to Inference: RAG Pipelines on AMD GPUs
Build a RAG-enhanced GenAI application that improves the quality of model responses by incorporating data missing from the model's training data.
Enabling Real-Time Context for LLMs: Model Context Protocol (MCP) on AMD GPUs
Learn how to use Model Context Protocol (MCP) servers to provide real-time context to LLMs through a chatbot example on AMD GPUs.
DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs
This blog post demonstrates how hipDF accelerates data manipulation, aggregation, and transformation tasks on AMD hardware using ROCm.
CuPy and hipDF on AMD: The Basics and Beyond
Learn how to deploy CuPy and hipDF on AMD GPUs, explore their high-performance computing advantages, and apply both libraries in a detailed example of investment portfolio allocation optimization using the Markowitz model.
Shrink LLMs, Boost Inference: INT4 Quantization on AMD GPUs with GPTQModel
Learn how to compress LLMs with GPTQModel and run them efficiently on AMD GPUs using INT4 quantization, reducing memory use and model size while enabling fast inference.
Efficient MoE training on AMD ROCm: How to use MegaBlocks on AMD GPUs
Learn how to use MegaBlocks to pre-train a GPT-2 Mixture of Experts (MoE) model, helping you scale your deep learning models effectively on AMD GPUs using ROCm.
Fine-tuning Phi-3.5-mini LLM at scale: Harnessing Accelerate and Slurm for multinode training
Fine-tune the Phi-3.5-mini-instruct LLM with multinode distributed training using Hugging Face Accelerate, Slurm, and Docker for scalable efficiency.
Triton Inference Server with vLLM on AMD GPUs
This blog provides a how-to guide on setting up a Triton Inference Server with a vLLM backend on AMD GPUs, showcasing its performance with several LLMs.
Torchtune on AMD GPUs How-To Guide: Fine-tuning and Scaling LLMs with Multi-GPU Power
Torchtune is a PyTorch library that enables efficient fine-tuning of LLMs. In this blog we use Torchtune to fine-tune the Llama-3.1-8B model for summarization tasks using LoRA, and showcase scalable training across multiple GPUs.
Using AMD GPUs for Enhanced Time Series Forecasting with Transformers
Time series forecasting (TSF) predicts future behavior from past data. This guide focuses on implementing Transformers for TSF on AMD hardware, covering every step from preprocessing to evaluation.
Optimizing RoBERTa: Fine-Tuning with Mixed Precision on AMD
In this blog we explore how to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa) large language model, with an emphasis on PyTorch's mixed precision capabilities. Specifically, we explore using AMD GPUs for mixed precision fine-tuning to achieve faster model training without major impacts on accuracy.
Fine-tuning and Testing Cutting-Edge Speech Models using ROCm on AMD GPUs
This blog post demonstrates how to fine-tune and test three state-of-the-art Automatic Speech Recognition (ASR) models on AMD GPUs using ROCm.
TensorFlow Profiler in practice: Optimizing TensorFlow models on AMD GPUs
TensorFlow Profiler measures the resource use and performance of models, helping identify bottlenecks for optimization. This blog demonstrates the use of the TensorFlow Profiler tool on AMD hardware.
AMD in Action: Unveiling the Power of Application Tracing and Profiling
Step-by-Step Guide to Use OpenLLM on AMD GPUs
OpenLLM is an open-source platform for deploying large language models in the cloud or on premises. In this blog we focus on using OpenLLM to start an LLM server that leverages the capabilities of AMD GPUs.
Building semantic search with SentenceTransformers on AMD