Posts tagged Multimodal
Instella-VL-1B: First AMD Vision Language Model
- 07 March 2025
As part of AMD’s newly released Instella family we are thrilled to introduce Instella-VL-1B, the first AMD vision language model for image understanding trained on AMD Instinct™ MI300X GPUs. Our journey with Instella-VL builds upon our previous 1-billion-parameter language models, AMD OLMo SFT. We further extend the language model’s visual understanding abilities by connecting it with a vision encoder (which is initialized from CLIP ViT-L/14-336). During training, we jointly finetune vision encoder and language model with vision-language data in three stages: Alignment, Pretraining and Supervised-Finetuning (SFT).
Multimodal (Visual and Language) understanding with LLaVA-NeXT
- 26 April 2024
26, Apr 2024 by Phillip Dang.
Interacting with Contrastive Language-Image Pre-Training (CLIP) model on AMD GPU
- 16 April 2024
16, Apr 2024 by Sean Song.