AI generates data to help embodied agents ground language to 3D world

By Thefuturedatainsights | June 18, 2025


A new 3D-text dataset, 3D-GRAND, leverages generative AI to create synthetic rooms that are automatically annotated with 3D structures. The dataset’s 40,087 household scenes can help train embodied AI, like household robots, connect language to 3D spaces. Credit: Joyce Chai

A new, densely annotated 3D-text dataset called 3D-GRAND can help train embodied AI, like household robots, to connect language to 3D spaces. The study, led by University of Michigan researchers, was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville, Tennessee, on June 15 and published on the arXiv preprint server.

When put to the test against previous 3D datasets, the model trained on 3D-GRAND reached 38% grounding accuracy, surpassing the previous best model by 7.7%. 3D-GRAND also drastically reduced hallucinations to only 6.67% from the previous state-of-the-art rate of 48%.

The dataset contributes to the next generation of household robots that will far exceed the robotic vacuums that currently populate homes. Before we can command a robot to “pick up the book next to the lamp on the nightstand and bring it to me,” the robot must be trained to understand what language refers to in space.

“Large multimodal language models are mostly trained on text with 2D images, but we live in a 3D world. If we want a robot to interact with us, it must understand spatial terms and perspectives, interpret object orientations in space, and ground language in the rich 3D environment,” said Joyce Chai, a professor of computer science and engineering at U-M and senior author of the study.

While text or image-based AI models can pull an enormous amount of information from the internet, 3D data is scarce. It’s even harder to find 3D data with grounded text data—meaning specific words like “sofa” are linked to 3D coordinates bounding the actual sofa.
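To make "grounded text" concrete, here is a hypothetical sketch of what a densely grounded annotation could look like, with each noun phrase tied to an object ID and a 3D bounding box. The field names are illustrative assumptions, not the actual 3D-GRAND schema.

```python
# Hypothetical sketch of a densely grounded annotation: each noun phrase
# in the description is tied to an object ID and a 3D bounding box
# (center x/y/z plus size, in meters). Field names are illustrative
# assumptions, not the actual 3D-GRAND schema.
annotation = {
    "scene_id": "room_00042",
    "description": "A gray sofa sits against the wall, next to a floor lamp.",
    "groundings": [
        {
            "phrase": "a gray sofa",
            "object_id": 7,
            "bbox_3d": {"center": [1.2, 0.4, 3.0], "size": [2.0, 0.9, 0.8]},
        },
        {
            "phrase": "a floor lamp",
            "object_id": 12,
            "bbox_3d": {"center": [2.5, 0.8, 3.1], "size": [0.3, 1.6, 0.3]},
        },
    ],
}
```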

Like all LLMs, 3D-LLMs perform best when trained on large datasets. However, building a large dataset by imaging rooms with cameras would be time-intensive and expensive, as annotators must manually specify objects and their spatial relationships and link words to their corresponding objects.

The research team took a new approach, leveraging generative AI to create synthetic rooms that are automatically annotated with 3D structures. The resulting 3D-GRAND dataset includes 40,087 household scenes paired with 6.2 million densely-grounded descriptions of the room.

“A big advantage of synthetic data is that labels come for free because you already know where the sofa is, which makes the curation process easier,” said Jianing Jed Yang, a doctoral student of computer science and engineering at U-M and lead author of the study.
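A minimal sketch of that idea, assuming a simple scene record: because the generator places every object itself, grounding labels can be read directly from the scene specification rather than drawn by human annotators. The structure below is hypothetical, not the 3D-GRAND format.

```python
# Minimal sketch of why synthetic labels are "free": the generator already
# knows every object's category and placement, so grounding labels come
# straight out of the scene specification instead of from human annotators.
# The scene structure is illustrative, not the 3D-GRAND format.
scene = {
    "objects": [
        {"id": 7, "category": "sofa", "center": [1.2, 0.4, 3.0], "size": [2.0, 0.9, 0.8]},
        {"id": 12, "category": "lamp", "center": [2.5, 0.8, 3.1], "size": [0.3, 1.6, 0.3]},
    ]
}

def labels_for_free(scene):
    """Emit a grounding label per object directly from the generator's records."""
    return {o["id"]: {"category": o["category"], "bbox": (o["center"], o["size"])}
            for o in scene["objects"]}

print(labels_for_free(scene))  # {7: {'category': 'sofa', ...}, 12: {...}}
```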

Once the synthetic 3D data was generated, an AI pipeline first used vision models to describe each object's color, shape and material. From there, a text-only model generated descriptions of entire scenes, using scene graphs (structured maps of how objects relate to each other) to ensure each noun phrase is grounded to a specific 3D object.
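One plausible way to derive such a scene graph from known object placements is sketched below; the function, threshold, and ID-tagging convention are illustrative assumptions, not the paper's pipeline.

```python
# Sketch of the scene-graph step: spatial relations are inferred from the
# objects' known placements, and the structured graph is what the text
# model is conditioned on, so each noun phrase it emits can be tagged with
# an object ID. All names here are illustrative, not the paper's code.
objects = [
    {"id": 7, "category": "sofa", "center": (1.2, 0.4, 3.0)},
    {"id": 12, "category": "lamp", "center": (2.5, 0.8, 3.1)},
]

def build_scene_graph(objects, near_threshold=1.5):
    """Nodes are objects; 'near' edges come from center-to-center distance."""
    edges = []
    for a in objects:
        for b in objects:
            if a["id"] >= b["id"]:
                continue
            dist = sum((x - y) ** 2 for x, y in zip(a["center"], b["center"])) ** 0.5
            if dist < near_threshold:
                edges.append((a["id"], "near", b["id"]))
    return {"nodes": {o["id"]: o["category"] for o in objects}, "edges": edges}

print(build_scene_graph(objects))
# {'nodes': {7: 'sofa', 12: 'lamp'}, 'edges': [(7, 'near', 12)]}
# A text model prompted with this graph could be asked to tag each noun
# phrase with its node ID, e.g. "the sofa <obj_7> next to the lamp <obj_12>".
```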

A final quality control step used a hallucination filter to ensure each object generated in the text actually has an associated object in the 3D scene.
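A hedged sketch of what such a filter might boil down to: a generated description passes only if every grounded phrase points at an object that actually exists in the scene. The names here are hypothetical.

```python
# Hedged sketch of a hallucination filter: a generated description passes
# only if every grounded phrase refers to an object ID that exists in the
# scene. This mirrors the idea described above, not the paper's actual code.

def passes_hallucination_filter(groundings, scene_object_ids):
    """Reject text whose noun phrases point at objects not in the scene."""
    return all(g["object_id"] in scene_object_ids for g in groundings)

scene_ids = {7, 12}
good = [{"phrase": "a gray sofa", "object_id": 7}]
bad = [{"phrase": "a wooden desk", "object_id": 31}]  # no such object in scene
assert passes_hallucination_filter(good, scene_ids)
assert not passes_hallucination_filter(bad, scene_ids)
```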

Human evaluators spot-checked 10,200 room-annotation pairs, assessing the AI-generated sentences and object references for inaccuracies. The synthetic annotations had a low error rate of about 5% to 8%, comparable to professional human annotation.

“Given the size of the dataset, the LLM-based annotation reduces both the cost and time by an order of magnitude compared to human annotation, creating 6.2 million annotations in just two days. It is widely recognized that collecting high-quality data at scale is essential for building effective AI models,” said Yang.

To put the new dataset to the test, the research team trained a model on 3D-GRAND and compared it with three baseline models (3D-LLM, LEO and 3D-VISTA). The ScanRefer benchmark evaluated grounding accuracy (how much the predicted bounding box overlaps with the true object boundary), while a newly introduced benchmark called 3D-POPE evaluated object hallucinations.
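That overlap is typically measured as 3D intersection-over-union (IoU); below is a minimal sketch for axis-aligned boxes. ScanRefer's exact box conventions and thresholds are not reproduced here.

```python
# Minimal sketch of 3D intersection-over-union (IoU) for axis-aligned boxes
# given as (center, size). Grounding benchmarks typically count a prediction
# correct when IoU exceeds a threshold (commonly 0.25 or 0.5); the exact
# evaluation details of ScanRefer may differ.

def iou_3d(center_a, size_a, center_b, size_b):
    inter = vol_a = vol_b = 1.0
    for ca, sa, cb, sb in zip(center_a, size_a, center_b, size_b):
        lo = max(ca - sa / 2, cb - sb / 2)   # overlap interval start on this axis
        hi = min(ca + sa / 2, cb + sb / 2)   # overlap interval end on this axis
        inter *= max(0.0, hi - lo)           # zero if boxes are disjoint on any axis
        vol_a *= sa
        vol_b *= sb
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes -> IoU 1.0; disjoint boxes -> 0.0
print(iou_3d([0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]))  # 1.0
print(iou_3d([0, 0, 0], [1, 1, 1], [5, 5, 5], [1, 1, 1]))  # 0.0
```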

The model trained on 3D-GRAND reached a 38% grounding accuracy with only a 6.67% hallucination rate, far exceeding the competing generative models. While 3D-GRAND contributes to the 3D-LLM modeling community, testing on robots will be the next step.

“It will be exciting to see how 3D-GRAND helps robots better understand space and take on different spatial perspectives, potentially improving how they communicate and collaborate with humans,” said Chai.

More information:
Jianing Yang et al., "3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination," arXiv (2024). DOI: 10.48550/arxiv.2406.05132

Journal information:
arXiv

Provided by
University of Michigan College of Engineering

Citation:
AI generates data to help embodied agents ground language to 3D world (2025, June 16)
retrieved 18 June 2025
from https://techxplore.com/news/2025-06-ai-generates-embodied-agents-ground.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


