USA Business Watch – Insightful News on Economy, Finance, Politics & Industry

New metric tracks where multimodal reasoning models go wrong

By Bussiness Insights | June 15, 2025 | 5 min read


A new metric and a diagnostic benchmark to study the hallucinations of multimodal reasoning models
(a) Example of outputs from a reasoning model and a non-reasoning model on a perception task. Red highlights indicate visual hallucination. Multimodal reasoning models are generally more prone to amplifying hallucinations during the reasoning process than their non-reasoning counterparts. (b) Performance of different models on reasoning and perception tasks in the RH-Bench dataset. Better-performing models are positioned in the upper right corner. Baseline non-reasoning models of varying scales typically exhibit weaker reasoning capabilities and fewer hallucinations, whereas reasoning models display the opposite trend. Credit: Liu et al.

Over the past decades, computer scientists have introduced increasingly sophisticated machine learning-based models, which can perform remarkably well on various tasks. These include multimodal large language models (MLLMs), systems that can process and generate different types of data, predominantly text, images and video.

Some of these models, such as OpenAI’s GPT-4 with Vision (GPT-4V), DeepSeek-R1 and Google Gemini, are now used worldwide to create multimodal content, including images for social media posts or articles, as well as text tailored for specific uses.

While the reasoning abilities of these models have improved considerably in recent years, allowing them to solve mathematical and reasoning problems, studies have shown that they sometimes produce responses that are not grounded in the input data, for instance by describing details that do not actually exist in an input image.

These hallucinations have been linked to language priors and internal biases that a model may have acquired while training on large text datasets. These biases can override the visual information fed to the model (i.e., input images), causing the model to complete its assigned tasks incorrectly.

Researchers at UC Santa Cruz, Stanford University and UC Santa Barbara have recently developed a metric and a diagnostic benchmark that could help to study these hallucinations, specifically focusing on the relationship between the reasoning of MLLMs and their tendency to hallucinate when asked to describe what is portrayed in an input image. These new research tools, presented in a paper on the arXiv preprint server, could contribute to the assessment and advancement of MLLMs.

“Test-time compute has empowered multimodal large language models to generate extended reasoning chains, yielding strong performance on tasks such as multimodal math reasoning,” wrote Chengzhi Liu, Zhongxing Xu and their colleagues in their paper.

“However, this improved reasoning ability often comes with increased hallucination: as generations become longer, models tend to drift away from image-grounded content and rely more heavily on language priors.”

Comparison of reasoning and non-reasoning models on five perception benchmarks. Results are shown for 3B models (left) and 7B models (right). Higher scores indicate lower hallucination. Credit: arXiv (2025). DOI: 10.48550/arxiv.2505.21523

The researchers first assessed the performance of MLLMs on complex reasoning tasks and found that as reasoning chains (i.e., sequences of logical steps required to solve a problem) grew in length, the models’ tendency to hallucinate also increased. They suggested that these hallucinations emerged due to reduced attention to visual stimuli and a greater reliance on language priors.

“Attention analysis shows that longer reasoning chains lead to reduced focus on visual inputs, which contributes to hallucination,” wrote Liu, Xu and their colleagues.
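The idea of "reduced focus on visual inputs" can be made concrete with a small sketch. The article does not describe how the authors ran their attention analysis, so everything here is an assumption for illustration: the helper names `visual_attention_share` and `share_by_step`, the toy attention rows, and the choice to measure the fraction of attention mass that lands on image tokens at each generation step.

```python
def visual_attention_share(attention_row, image_token_positions):
    """Fraction of one generation step's attention mass that lands on image tokens.

    attention_row: attention weights over all context tokens (hypothetical input).
    image_token_positions: indices of the tokens that encode the image (assumed known).
    """
    total = sum(attention_row)
    if total == 0:
        return 0.0
    image_mass = sum(attention_row[i] for i in image_token_positions)
    return image_mass / total


def share_by_step(attention_rows, image_token_positions):
    """Visual attention share for each successive generation step."""
    return [visual_attention_share(row, image_token_positions) for row in attention_rows]


# Toy data (made up): three image tokens, then a text context that grows
# by one token per generated step.
image_positions = [0, 1, 2]
rows = [
    [0.20, 0.20, 0.20, 0.40],
    [0.10, 0.10, 0.10, 0.30, 0.40],
    [0.05, 0.05, 0.05, 0.25, 0.30, 0.30],
]
shares = share_by_step(rows, image_positions)
print(shares)
```

In this toy data the share of attention on the image falls as the generated sequence grows, which is the pattern the quote describes. With a real model, the rows would have to be extracted from its attention layers and aggregated across heads and layers in some way, a choice this sketch leaves open.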

“To systematically study this phenomenon, we introduce RH-AUC, a metric that quantifies how a model’s perception accuracy changes with reasoning length, allowing us to evaluate whether the model preserves visual grounding during reasoning. We also release RH-Bench, a diagnostic benchmark that spans a variety of multimodal tasks, designed to assess the trade-off between reasoning ability and hallucination.”
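As a rough illustration of an area-under-the-curve summary of this kind, the sketch below integrates perception accuracy over reasoning length with the trapezoidal rule and normalizes by the length range. This is not the paper's definition of RH-AUC, which the article does not spell out; the function name `area_under_curve`, the normalization and the sample numbers are all assumptions made for the example.

```python
def area_under_curve(lengths, accuracies):
    """Trapezoidal area under an accuracy-vs-length curve, divided by the length range.

    The result stays on the same scale as the accuracies: a model whose
    perception accuracy does not degrade with longer reasoning scores higher.
    """
    if len(lengths) != len(accuracies) or len(lengths) < 2:
        raise ValueError("need at least two (length, accuracy) points")
    points = sorted(zip(lengths, accuracies))
    span = points[-1][0] - points[0][0]
    if span == 0:
        raise ValueError("reasoning lengths must not all be equal")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / span


# Made-up measurements: perception accuracy at four reasoning budgets (in tokens).
reasoning_lengths = [100, 200, 400, 800]
perception_accuracy = [0.80, 0.75, 0.65, 0.50]
score = area_under_curve(reasoning_lengths, perception_accuracy)
print(round(score, 4))
```

A single number like this makes it possible to compare models whose accuracy drops at different rates, rather than comparing them at one fixed reasoning length.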

RH-AUC and RH-Bench, the metric and benchmark developed by Liu, Xu and their colleagues, could soon be used by other researchers to evaluate the interplay between the reasoning abilities of specific MLLMs and their risk of hallucinating. Moreover, the observations presented in the team’s paper could guide future efforts aimed at developing models that can reliably tackle complex reasoning tasks without becoming prone to hallucinations.

“Our analysis reveals that larger models typically achieve a better balance between reasoning and perception and that this balance is influenced more by the types and domains of training data than by its overall volume,” wrote Liu, Xu and their colleagues. “These findings underscore the importance of evaluation frameworks that jointly consider both reasoning quality and perceptual fidelity.”

Written by Ingrid Fadelli, edited by Gaby Clark, and fact-checked and reviewed by Robert Egan.

More information:
Chengzhi Liu et al, More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models, arXiv (2025). DOI: 10.48550/arxiv.2505.21523

Journal information:
arXiv

© 2025 Science X Network

Citation:
Benchmarking hallucinations: New metric tracks where multimodal reasoning models go wrong (2025, June 14)
retrieved 14 June 2025
from https://techxplore.com/news/2025-06-benchmarking-hallucinations-metric-tracks-multimodal.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


