<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://rocm.blogs.amd.com/</id>
  <title>AMD ROCm Blogs</title>
  <updated>2026-05-15T17:22:39.595288+00:00</updated>
  <link href="https://rocm.blogs.amd.com/"/>
  <link href="https://rocm.blogs.amd.com/blog/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.io/" version="0.11.12">ABlog</generator>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/semantic-fencing/README.html</id>
    <title>Semantic Fencing of Video Streams Using Embedding Splits from Vision Foundation Models</title>
    <updated>2026-05-15T00:00:00+00:00</updated>
    <author>
      <name>Max Kiehn</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In this blog, we present a novel approach for &lt;strong&gt;semantically splitting vision datasets&lt;/strong&gt; into training, validation, and test sets. Instead of relying on ad hoc metadata rules or random shuffles, we use embeddings to reason directly about similarity in latent space and construct splits that better reflect true generalization.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/semantic-fencing/README.html"/>
    <summary>In this blog, we present a novel approach for semantically splitting vision datasets into training, validation, and test sets. Instead of relying on ad hoc metadata rules or random shuffles, we use embeddings to reason directly about similarity in latent space and construct splits that better reflect true generalization.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="ComputerVision" label="Computer Vision"/>
    <published>2026-05-15T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/kimi-k2.5-w4a8/README.html</id>
    <title>Further Accelerating Kimi-K2.5 on AMD Instinct™ MI325X: W4A8 &amp; W8A8 Quantization with AMD Quark</title>
    <updated>2026-05-14T00:00:00+00:00</updated>
    <author>
      <name>Eveline Chen</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In our &lt;a class="reference internal" href="artificial-intelligence/kimi-k2.5-optimize/README.html"&gt;&lt;span class="std std-doc"&gt;previous blog&lt;/span&gt;&lt;/a&gt; &lt;a class="reference internal" href="#references"&gt;[7]&lt;/a&gt;, we demonstrated how to accelerate Kimi-K2.5 &lt;a class="reference internal" href="#references"&gt;[1]&lt;/a&gt; inference on AMD Instinct™ GPUs by profiling the model, identifying &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fused_moe&lt;/span&gt;&lt;/code&gt; as the dominant bottleneck (consuming 88–90% of GPU time), and replacing the default Triton-based kernel with a &lt;a class="reference external" href="https://github.com/ROCm/FlyDSL"&gt;FlyDSL&lt;/a&gt; &lt;a class="reference internal" href="#references"&gt;[2]&lt;/a&gt;-powered mixed-precision (BF16 + W4A16) fused MoE implementation.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/kimi-k2.5-w4a8/README.html"/>
    <summary>In our previous blog [7], we demonstrated how to accelerate Kimi-K2.5 [1] inference on AMD Instinct™ GPUs by profiling the model, identifying fused_moe as the dominant bottleneck (consuming 88–90% of GPU time), and replacing the default Triton-based kernel with a FlyDSL [2]-powered mixed-precision (BF16 + W4A16) fused MoE implementation.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <published>2026-05-14T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/comfyui/README.html</id>
    <title>Accelerating ComfyUI Workflows on AMD Instinct™ MI355X GPUs with ROCm</title>
    <updated>2026-05-11T00:00:00+00:00</updated>
    <author>
      <name>Vish Vadlamani</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;&lt;a class="reference external" href="https://www.comfy.org/"&gt;ComfyUI&lt;/a&gt; is an open-source, graphical node-based interface for building generative AI workflows using diffusion models. With over 100,000 stars on GitHub, it has become one of the most widely adopted tools for text-to-image, text-to-video, and image-to-3D generation. You can build workflows by connecting nodes in a drag-and-drop visual interface (no coding required). A large community contributes custom nodes and workflow templates, making ComfyUI a versatile front-end for models ranging from 12B to 27B parameters. For background and setup across AMD platforms, see the earlier ROCm blogs &lt;a class="reference external" href="https://rocm.blogs.amd.com/software-tools-optimization/comfyui-on-amd/README.html"&gt;Running ComfyUI on AMD Instinct&lt;/a&gt;, &lt;a class="reference external" href="https://rocm.blogs.amd.com/artificial-intelligence/comfyui-radeon-9000/README.html"&gt;Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs&lt;/a&gt;, and &lt;a class="reference external" href="https://rocm.blogs.amd.com/software-tools-optimization/rocm-on-wsl/README.html"&gt;Running ComfyUI in Windows with ROCm on WSL&lt;/a&gt;.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/comfyui/README.html"/>
    <summary>ComfyUI is an open-source, graphical node-based interface for building generative AI workflows using diffusion models. With over 100,000 stars on GitHub, it has become one of the most widely adopted tools for text-to-image, text-to-video, and image-to-3D generation. You can build workflows by connecting nodes in a drag-and-drop visual interface (no coding required). A large community contributes custom nodes and workflow templates, making ComfyUI a versatile front-end for models ranging from 12B to 27B parameters. For background and setup across AMD platforms, see the earlier ROCm blogs Running ComfyUI on AMD Instinct, Getting Started with ComfyUI on AMD Radeon™ RX 9000 Series GPUs, and Running ComfyUI in Windows with ROCm on WSL.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="DiffusionModel" label="Diffusion Model"/>
    <category term="GenAI" label="GenAI"/>
    <published>2026-05-11T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/vllm-atom/README.html</id>
    <title>vLLM-ATOM: Unlocking Native AMD Performance in the vLLM Ecosystem</title>
    <updated>2026-05-07T00:00:00+00:00</updated>
    <author>
      <name>Emad Barsoum</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;This blog walks you through vLLM-ATOM, the AMD-optimized plugin that supercharges vLLM on Instinct GPUs.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/vllm-atom/README.html"/>
    <summary>This blog walks you through vLLM-ATOM, the AMD-optimized plugin that supercharges vLLM on Instinct GPUs.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="Performance" label="Performance"/>
    <published>2026-05-07T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/street-gaussians/README.html</id>
    <title>AMD-Powered 3D Gaussian Splatting for Autonomous Driving Scenes</title>
    <updated>2026-05-07T00:00:00+00:00</updated>
    <author>
      <name>Niko Vuokko</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;&lt;strong&gt;3D Gaussian Splatting (3DGS)&lt;/strong&gt; is an innovative, explicit scene representation and rendering technique. It reconstructs photorealistic 3D environments from a set of images of a scene taken from a variety of angles. It represents the scene as a vast, learnable collection of &lt;strong&gt;3D Gaussians&lt;/strong&gt;, which are optimized with backpropagation using a &lt;strong&gt;differentiable rasterizer&lt;/strong&gt;. This pipeline enables &lt;strong&gt;real-time novel view synthesis&lt;/strong&gt;: generating images of the scene from previously unseen angles.
It also permits &lt;strong&gt;easy scene editing&lt;/strong&gt;, such as moving, copying, or recolouring parts of the reconstructed 3D structure.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/street-gaussians/README.html"/>
    <summary>3D Gaussian Splatting (3DGS) is an innovative, explicit scene representation and rendering technique. It reconstructs photorealistic 3D environments from a set of images of a scene taken from a variety of angles. It represents the scene as a vast, learnable collection of 3D Gaussians, which are optimized with backpropagation using a differentiable rasterizer. This pipeline enables real-time novel view synthesis: generating images of the scene from previously unseen angles.
It also permits easy scene editing, such as moving, copying, or recolouring parts of the reconstructed 3D structure.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <published>2026-05-07T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/artificial-intelligence/farskip-collective-moe/README.html</id>
    <title>Accelerating Mixture-of-Experts Execution with FarSkip-Collective Models</title>
    <updated>2026-05-05T00:00:00+00:00</updated>
    <author>
      <name>Emad Barsoum</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Whether you are training or running inference, the largest Mixture-of-Experts (MoE) LLMs cannot fit on a single GPU; instead, you must run collective-communication operations so that multiple GPUs can work together on a single model.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/artificial-intelligence/farskip-collective-moe/README.html"/>
    <summary>Whether you are training or running inference, the largest Mixture-of-Experts (MoE) LLMs cannot fit on a single GPU; instead, you must run collective-communication operations so that multiple GPUs can work together on a single model.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="LLM" label="LLM"/>
    <published>2026-05-05T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/tracelens/README.html</id>
    <title>TraceLens: Democratizing AI Performance Analysis</title>
    <updated>2026-04-27T00:00:00+00:00</updated>
    <author>
      <name>Steven K. Reinhardt</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Profiling modern AI workloads produces huge traces that are hard to interpret. Framework profilers record thousands of operations, kernels, and communication events, and engineers often end up staring at tools like Perfetto UI doing manual calculations. TraceLens speeds this up: it consumes existing framework traces and turns them into structured summaries and comparisons, allowing you to move on to the actual diagnosis and optimization.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/tracelens/README.html"/>
    <summary>Profiling modern AI workloads produces huge traces that are hard to interpret. Framework profilers record thousands of operations, kernels, and communication events, and engineers often end up staring at tools like Perfetto UI doing manual calculations. TraceLens speeds this up: it consumes existing framework traces and turns them into structured summaries and comparisons, allowing you to move on to the actual diagnosis and optimization.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="Optimization" label="Optimization"/>
    <category term="Performance" label="Performance"/>
    <published>2026-04-27T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/ecosystems-and-partners/eruku-genai/README.html</id>
    <title>Styled Text Image Generation with Eruku on AMD</title>
    <updated>2026-04-24T00:00:00+00:00</updated>
    <author>
      <name>Niko Vuokko</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Producing text images where text is both readable and controllable while faithfully matching a target visual style is a challenging problem. It has broad applications ranging from synthetic handwritten text generation to graphic design. In these settings, you need more than plausible images; you need precise control over both text content and visual fidelity. This is where &lt;a class="reference external" href="https://aimagelab.github.io/Eruku/"&gt;Eruku&lt;/a&gt;&lt;a class="footnote-reference brackets" href="#eruku" id="id1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; stands out.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/ecosystems-and-partners/eruku-genai/README.html"/>
    <summary>Producing text images where text is both readable and controllable while faithfully matching a target visual style is a challenging problem. It has broad applications ranging from synthetic handwritten text generation to graphic design. In these settings, you need more than plausible images; you need precise control over both text content and visual fidelity. This is where Eruku [1] stands out.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="GenAI" label="GenAI"/>
    <category term="HPC" label="HPC"/>
    <category term="Multimodal" label="Multimodal"/>
    <published>2026-04-24T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/primus-projection/README.html</id>
    <title>Primus Projection: Estimate Memory and Performance Before You Train</title>
    <updated>2026-04-24T00:00:00+00:00</updated>
    <author>
      <name>Zhenyu Gu</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;Primus Projection estimates memory and performance before you train.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/primus-projection/README.html"/>
    <summary>Primus Projection estimates memory and performance before you train.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="LLM" label="LLM"/>
    <category term="Optimization" label="Optimization"/>
    <published>2026-04-24T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>https://rocm.blogs.amd.com/software-tools-optimization/flydsl-nightly-wheel/README.html</id>
    <title>Getting Started with FlyDSL Nightly Wheels on ROCm</title>
    <updated>2026-04-20T00:00:00+00:00</updated>
    <author>
      <name>Emad Barsoum</name>
    </author>
    <content type="html">&lt;p class="ablog-post-excerpt"&gt;&lt;p&gt;In the previous post on &lt;a class="reference external" href="https://rocm.blogs.amd.com/software-tools-optimization/flydsl-python-native/README.html"&gt;FlyDSL&lt;/a&gt;, we introduced the motivation behind FlyDSL and how it enables &lt;strong&gt;Python-native GPU kernel development&lt;/strong&gt; using the AMD ROCm™ software stack. FlyDSL combines the flexibility of Python with the performance of MLIR and LLVM-based compilation, allowing developers to write GPU kernels in Python while targeting modern AMD hardware.&lt;/p&gt;
&lt;/p&gt;
</content>
    <link href="https://rocm.blogs.amd.com/software-tools-optimization/flydsl-nightly-wheel/README.html"/>
    <summary>In the previous post on FlyDSL, we introduced the motivation behind FlyDSL and how it enables Python-native GPU kernel development using the AMD ROCm™ software stack. FlyDSL combines the flexibility of Python with the performance of MLIR and LLVM-based compilation, allowing developers to write GPU kernels in Python while targeting modern AMD hardware.</summary>
    <category term="AI/ML" label="AI/ML"/>
    <category term="Compiler" label="Compiler"/>
    <category term="HPC" label="HPC"/>
    <category term="Optimization" label="Optimization"/>
    <category term="Performance" label="Performance"/>
    <published>2026-04-20T00:00:00+00:00</published>
  </entry>
</feed>
