Posts by Zhenhua Liu

Athena-PRM: Enhancing Multimodal Reasoning with Data-Efficient Process Reward Models

12 January 2026

This blog introduces Athena-PRM, a multimodal Process Reward Model (PRM) that assigns a reward score to each step of a solution to a complex reasoning problem. To generate high-quality process-labeled data efficiently, we use prediction consistency between weak and strong completers as the criterion for identifying reliable process labels. We also develop two strategies that improve PRM performance: ORM initialization and up-sampling of negative data.
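The consistency criterion can be pictured with a small sketch. It is not the Athena-PRM implementation: the `Completer` interface, the rule that a step counts as correct when any sampled completion reaches the reference answer, and the toy completers are all assumptions made for illustration. The only idea taken from the summary is that a step label is kept when the weak and strong completers agree on it.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a completer receives the question and the steps so far,
# and returns the final answers of several sampled completions.
Completer = Callable[[str, List[str]], List[str]]


def step_label(completer: Completer, question: str, prefix: List[str], gold: str) -> int:
    """Assumed labeling rule: 1 if any completion from this prefix reaches the gold answer."""
    return int(any(answer == gold for answer in completer(question, prefix)))


def consistent_labels(
    question: str, steps: List[str], gold: str, weak: Completer, strong: Completer
) -> List[Optional[int]]:
    """Label each step with both completers; keep the label only when they agree."""
    labels: List[Optional[int]] = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        weak_label = step_label(weak, question, prefix, gold)
        strong_label = step_label(strong, question, prefix, gold)
        # Disagreement marks the label as unreliable, so the step is dropped (None).
        labels.append(weak_label if weak_label == strong_label else None)
    return labels


# Toy stand-ins for real models (purely illustrative).
def toy_weak(question: str, prefix: List[str]) -> List[str]:
    return ["4"] if len(prefix) >= 2 else ["3"]


def toy_strong(question: str, prefix: List[str]) -> List[str]:
    return ["4", "4"]


labels = consistent_labels(
    "What is 2 + 2?", ["restate", "add", "conclude"], "4", toy_weak, toy_strong
)
```

Here the first step is discarded because the two toy completers disagree on it, while the remaining steps keep the label both completers assign.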


Introducing AMD EVLM: Efficient Vision-Language Models with Parameter-Space Visual Conditioning

22 August 2025

This blog introduces a computationally efficient paradigm for Vision-Language Models (VLMs) that departs from the conventional method of prepending visual tokens to the textual input. Instead of lengthening the input sequence, this approach injects visual information directly into the Large Language Model’s (LLM) parameters. A vision encoder extracts image features, and a perceptual weight generator transforms those features into dynamic, low-rank adapter weights. These weights are temporarily integrated with the LLM’s parameters, conditioning the model on the image without increasing the input length. As a result, the model achieves performance comparable to traditional VLMs on standard benchmarks while significantly reducing computational cost during inference.
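A minimal sketch of the idea for a single linear layer follows. It is an illustration under stated assumptions, not the EVLM implementation: the weight generator is reduced to two random linear maps (`gen_a`, `gen_b`), the image features are random numbers standing in for a vision encoder's output, and every name and dimension is hypothetical. The low-rank update is applied as `W x + B (A x)`, which equals `(W + B A) x` and leaves the base weight `W` untouched.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def reshape(flat: Vector, rows: int, cols: int) -> Matrix:
    """Split a flat vector into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def generate_adapter(
    features: Vector, gen_a: Matrix, gen_b: Matrix, d_out: int, d_in: int, rank: int
) -> Tuple[Matrix, Matrix]:
    """Hypothetical weight generator: map image features to low-rank factors B and A."""
    a = reshape(matvec(gen_a, features), rank, d_in)   # A has shape rank x d_in
    b = reshape(matvec(gen_b, features), d_out, rank)  # B has shape d_out x rank
    return b, a


def conditioned_forward(w: Matrix, b: Matrix, a: Matrix, x: Vector) -> Vector:
    """Compute W x + B (A x); the input x keeps its original length."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, update)]


rng = random.Random(0)
d_in, d_out, rank, feat_dim = 4, 3, 2, 5
w = random_matrix(d_out, d_in, rng)                  # frozen base weight of one layer
gen_a = random_matrix(rank * d_in, feat_dim, rng)    # generator parameters (assumed)
gen_b = random_matrix(d_out * rank, feat_dim, rng)
image_features = [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]  # encoder stand-in
x = [1.0, 0.5, -0.5, 2.0]                            # one text-token hidden state

b, a = generate_adapter(image_features, gen_a, gen_b, d_out, d_in, rank)
y_image = conditioned_forward(w, b, a, x)

# With all-zero features the adapter vanishes and the layer behaves like the base layer.
b0, a0 = generate_adapter([0.0] * feat_dim, gen_a, gen_b, d_out, d_in, rank)
y_blank = conditioned_forward(w, b0, a0, x)
```

The sequence length never changes in this sketch: the image influences the output only through the generated factors `B` and `A`, which is the property the summary attributes to parameter-space conditioning.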
