Posts tagged C++
Matrix Core Programming on AMD CDNA™3 and CDNA™4 architecture
- 30 September 2025
In this blog post, we walk through how to use Matrix Cores in HIP kernels, with a focus on low-precision data types such as FP16, FP8, and FP4, as well as the new family of Matrix Core instructions with exponent block scaling introduced in the AMD CDNA™4 architecture. Through code examples and illustrations, we provide the necessary knowledge to start programming Matrix Cores, covering modern low-precision floating-point types, the Matrix Core compiler intrinsics, and the data layouts required by the Matrix Core instructions.
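To give a flavor of what the post covers, the sketch below issues a single 16x16x16 FP16 Matrix Core operation from a HIP kernel through the `__builtin_amdgcn_mfma_f32_16x16x16f16` compiler intrinsic (CDNA hardware required). The kernel name and the per-lane indexing are our own illustrative assumptions; the post explains the exact fragment layouts and the CDNA™4 block-scaled variants.

```cpp
#include <hip/hip_runtime.h>

// Vector types expected by the MFMA compiler intrinsic.
using half4   = __fp16 __attribute__((ext_vector_type(4)));
using floatx4 = float  __attribute__((ext_vector_type(4)));

// One 64-lane wavefront computes D = A * B + C for a single 16x16x16 tile.
// A and B are FP16 (row-major 16x16); accumulation happens in FP32.
__global__ void mfma_fp16_16x16x16(const __fp16* A, const __fp16* B, float* D)
{
    const int lane = threadIdx.x;       // 0..63 within the wavefront
    const int idx  = lane % 16;         // row of A / column of B and D this lane touches
    const int kb   = 4 * (lane / 16);   // first of the four k (resp. m) indices it owns

    half4   a, b;
    floatx4 c = {0.f, 0.f, 0.f, 0.f};

    // Illustrative per-lane loads of the A and B fragments.
    for (int i = 0; i < 4; ++i) {
        a[i] = A[idx * 16 + (kb + i)];  // A[idx][kb + i]
        b[i] = B[(kb + i) * 16 + idx];  // B[kb + i][idx]
    }

    // FP16 inputs, FP32 accumulation; the trailing zeros are the
    // cbsz/abid/blgp modifiers (unused here).
    c = __builtin_amdgcn_mfma_f32_16x16x16f16(a, b, c, 0, 0, 0);

    // Each lane writes back the four output elements it owns.
    for (int i = 0; i < 4; ++i)
        D[(kb + i) * 16 + idx] = c[i];  // D[kb + i][idx]
}
```

Launched with a single 64-thread block (one wavefront), this produces one 16x16 FP32 output tile; the post generalizes the same pattern to FP8 and FP4 inputs and to the block-scaled instructions new in CDNA™4.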
Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration
- 09 September 2025
Llama.cpp is an open-source implementation of a Large Language Model (LLM) inference framework designed to run efficiently on diverse hardware configurations, both locally and in the cloud. Its plain C/C++ implementation ensures a dependency-free setup and lets it support a wide range of CPU and GPU architectures. The framework offers quantization options from 1.5-bit to 8-bit integer formats for faster inference and reduced memory usage. Llama.cpp is backed by an active open-source community within the AI ecosystem, with over 1200 contributors and almost 4000 releases on its official GitHub repository as of early August 2025. Designed as a CPU-first C++ library, llama.cpp offers simplicity and easy integration with other programming environments, which has made it widely compatible and rapidly adopted across diverse platforms, especially consumer devices.
ROCm Revisited: Getting Started with HIP
- 06 June 2025
This blog is part of our ROCm Revisited series. The purpose of this series is to share the story of ROCm and our journey through the changes and successes we’ve achieved over the past few years.
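For readers who want to jump straight in, here is a minimal, self-contained HIP example of our own (not taken from the post) showing the basic pattern a first HIP program follows: allocate device buffers, launch a kernel with the triple-chevron syntax, and copy the result back.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Element-wise vector addition: each thread handles one index.
__global__ void vector_add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // Device allocations and host-to-device copies (error checking omitted for brevity).
    float *da, *db, *dc;
    hipMalloc(&da, n * sizeof(float));
    hipMalloc(&db, n * sizeof(float));
    hipMalloc(&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int block = 256;
    vector_add<<<(n + block - 1) / block, block>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

Compiled with hipcc, the same source runs on AMD GPUs and, through HIP’s portability layer, on CUDA devices as well.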
C++17 parallel algorithms and HIPSTDPAR
- 18 April 2024
The C++17 standard added the concept of parallel algorithms to the pre-existing C++ Standard Library. The parallel versions of algorithms like std::transform maintain the same signature as the regular serial versions, except for the addition of an extra parameter specifying the execution policy to use. This flexibility allows users who are already using the C++ Standard Library algorithms to take advantage of multi-core architectures by introducing only minimal changes to their code.
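As a concrete illustration, the snippet below (standard C++17, our own example) shows that the parallel call to std::transform differs from the serial one only in the execution-policy argument; HIPSTDPAR, which the post introduces, allows such calls to be offloaded to AMD GPUs.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main()
{
    std::vector<float> x(1'000'000, 1.0f), y(x.size());

    // Serial algorithm from the pre-C++17 Standard Library.
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return 2.0f * v; });

    // Parallel version: identical call, plus an execution policy as the first argument.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(),
                   [](float v) { return 2.0f * v; });
    return 0;
}
```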