Posts by Sudhanshu Ranjan

Introducing Instella-MoE: A State-of-the-Art Fully Open Mixture-of-Experts Language Model

24 July 2026

AMD is excited to introduce Instella-MoE, a state-of-the-art fully open Mixture-of-Experts (MoE) language model with 16 billion total parameters and 2.8 billion active parameters. Trained from scratch on AMD Instinct™ MI300X and MI325X GPUs with the AMD ROCm™ software stack, Instella-MoE combines a sparsely activated MoE design with architectural innovations such as Gated Multi-head Latent Attention (Gated MLA) and FarSkip-Collective connectivity. Instella-MoE delivers competitive performance across a broad suite of benchmarks against both dense and MoE baselines (as shown in Figure 1), including models with comparable or larger active parameter counts, establishing it as one of the strongest fully open language models at its scale.

Read more ...

LuminaSFT: Generating Synthetic Fine-Tuning Data for Small Language Models

24 February 2026

Small language models (SLMs) are emerging as a lightweight and cost-efficient alternative to large language models (LLMs). They significantly reduce inference costs and latency, and when carefully optimized for specific tasks, can approach—or even match—the performance of larger models. However, due to their limited parameter capacity, SLMs typically require stronger supervision to reach their full potential. Supervised fine-tuning (SFT) therefore plays a crucial role in enhancing their performance.

Read more ...

Introducing Instella-Math: Fully Open Language Model with Reasoning Capability

09 August 2025

AMD is thrilled to introduce Instella-Math, a reasoning-focused language model that marks a major milestone for AMD: as far as we know, it’s the first language model trained with long chain-of-thought reinforcement learning entirely on AMD GPUs. Starting from Instella-3B-Instruct, we extended the model’s capabilities through a multi-stage training pipeline (two stages of supervised fine-tuning and three stages of reinforcement learning using the VERL framework) executed entirely on AMD Instinct™ MI300X GPUs. This blog offers an inside look at the training process and highlights Instella-Math’s performance on challenging reasoning benchmarks, demonstrating the strength of both the model and the hardware behind it.

Read more ...

Introducing Instella-Long: A Fully Open Language Model with Long-Context Capability

11 June 2025

AMD is excited to announce Instella-Long, a long-context language model continually trained from Instella-3B-Instruct on AMD Instinct™ MI300X GPUs. To our knowledge, Instella-Long makes the Instella series the first fully open language model family trained from scratch that supports long context. Instella-Long supports a 128K context length and achieves competitive performance, outperforming open-weights models such as Phi-3.5-mini [1], Gemma-3-4B [2], and Qwen2.5-3B [3] on the long-context benchmark.

Read more ...

Instella-VL-1B: First AMD Vision Language Model

07 March 2025

As part of AMD’s newly released Instella family, we are thrilled to introduce Instella-VL-1B, the first AMD vision language model for image understanding, trained on AMD Instinct™ MI300X GPUs. Our journey with Instella-VL builds upon our previous 1-billion-parameter language models, AMD OLMo SFT. We further extend the language model’s visual understanding abilities by connecting it with a vision encoder (which is initialized from CLIP ViT-L/14-336). During training, we jointly fine-tune the vision encoder and the language model with vision-language data in three stages: Alignment, Pretraining, and Supervised Fine-Tuning (SFT).

Read more ...

Introducing Instella: New State-of-the-art Fully Open 3B Language Models

05 March 2025

AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs) trained from scratch on AMD Instinct™ MI300X GPUs. Instella models outperform existing fully open models of similar sizes and achieve competitive performance compared to state-of-the-art open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen2.5-3B, including their instruction-tuned counterparts.

Read more ...