USA Business Watch – Insightful News on Economy, Finance, Politics & Industry
Electronics & Semiconductor

Vision language models gain clearer vision through synthetic training data

By Thefuturedatainsights | July 22, 2025 | 5 min read


AI vision, reinvented: the power of synthetic data

CoSyn works by leveraging the language skills of open-source AI models to create training data that teaches other AI models how to read complex, text-rich images. Credit: Yue Yang

In the race to develop AI that understands complex images such as financial forecasts, medical diagrams, and nutrition labels, a capability AI needs in order to work independently in everyday settings, closed-source systems such as ChatGPT and Claude are currently setting the pace. But no one outside their makers knows how those models were trained or what data they used.

Now, researchers at Penn Engineering and the Allen Institute for AI (Ai2) have developed a new approach to training open-source models: using AI to create scientific figures, charts, and tables that teach other AI systems how to interpret complex visual information.

Their tool, CoSyn (short for Code-Guided Synthesis), taps the coding skills of open-source AI models to render text-rich images and generate relevant questions and answers, giving other AI systems the data they need to learn how to “see” and understand scientific figures.
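
As a rough illustration of the idea described above, the following Python sketch shows why generating an image from code makes question-and-answer pairs cheap to produce: the data behind the picture is already known as text. This is not CoSyn's actual code. Every function name, the hard-coded chart data, and the choice of SVG as the rendering target are assumptions made for the example, and the "language model" is a stub.

```python
# Illustrative sketch only: all names and data here are hypothetical,
# not part of CoSyn's real pipeline or API.

def fake_llm_generate_chart_spec(topic):
    """Stand-in for a text-only language model that invents chart data."""
    return {
        "title": f"{topic} by quarter",
        "labels": ["Q1", "Q2", "Q3", "Q4"],
        "values": [12, 18, 9, 24],
    }

def render_bar_chart_svg(spec, bar_width=40, scale=5):
    """Render the spec as an SVG string, which plays the role of the image."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="240" height="180">',
        f'<text x="10" y="15">{spec["title"]}</text>',
    ]
    for i, (label, value) in enumerate(zip(spec["labels"], spec["values"])):
        x = 20 + i * (bar_width + 10)
        height = value * scale
        parts.append(
            f'<rect x="{x}" y="{150 - height}" '
            f'width="{bar_width}" height="{height}"/>'
        )
        parts.append(f'<text x="{x}" y="165">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

def make_qa_pairs(spec):
    """Derive questions and answers from the spec, never from pixels."""
    pairs = list(zip(spec["labels"], spec["values"]))
    top_label, _ = max(pairs, key=lambda pair: pair[1])
    first_label, first_value = pairs[0]
    return [
        {"question": "Which quarter has the highest value?",
         "answer": top_label},
        {"question": f"What is the value for {first_label}?",
         "answer": str(first_value)},
        {"question": "What is the total across all quarters?",
         "answer": str(sum(spec["values"]))},
    ]

spec = fake_llm_generate_chart_spec("Revenue")
example = {
    "image_svg": render_bar_chart_svg(spec),
    "qa": make_qa_pairs(spec),
}
```

Because the answers come from the same data the rendering code consumed, they stay consistent with the image without any model ever having to look at it.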

As the researchers detail in a paper for ACL 2025, one of the world’s leading AI conferences, CoSyn-trained models match or outperform their proprietary peers.

“It’s like taking a student who’s good at writing and asking them to teach someone else how to draw,” says Yue Yang (GrEng’25), co-first author and a research scientist in Ai2’s PRIOR (Perceptual Reasoning and Interaction Research) group. “Essentially, it shifts the strengths of open-source AI from text to vision.”

Synthetic images, real results

The resulting dataset, called CoSyn-400K, contains more than 400,000 synthetic images and 2.7 million sets of corresponding instructions in a variety of categories, including scientific charts, chemical structures, and user interface screenshots. CoSyn-trained models outperformed top proprietary systems such as GPT-4V and Gemini 1.5 Flash on a suite of seven benchmark tests.

In a particularly striking case, the researchers synthetically generated just 7,000 nutrition labels to train a model for NutritionQA, a new benchmark they created. That small, targeted dataset allowed their model to beat others trained on millions of real images.

“Training AI with CoSyn is extremely efficient,” says Mark Yatskar, assistant professor in CIS and Yang’s doctoral co-advisor. “We show that synthetic data helps models generalize to real-world scenarios that may be unique to a person’s needs, such as reading nutrition labels for people with low vision.”

Yue Yang demonstrates CoSyn’s capabilities, using models trained on synthetic data created with CoSyn to read nutrition labels and solve math problems. Credit: Sylvia Zhang

Scaling and diversifying datasets

Creating hundreds of thousands of useful and diverse training examples posed challenges of its own.

To reach the scale they needed, co-first author Ajay Patel, a doctoral student in Computer and Information Science (CIS), developed a software library called DataDreamer, which automates the entire data generation process. It allowed the team to prompt language models in parallel, enabling large-scale production of synthetic images and instructions.
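
To make "prompting language models in parallel" concrete, here is a minimal sketch using only Python's standard library. It does not use or imitate DataDreamer's real interface. The `fake_model_call` function is a hypothetical stand-in for a request to a language model, and the prompt wording and worker count are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_model_call(prompt):
    """Hypothetical stand-in for a (slow) request to a language model."""
    return f"[generated code for: {prompt}]"

def generate_in_parallel(prompts, max_workers=8):
    """Send many prompts at once instead of waiting on each in turn."""
    # Executor.map returns results in input order, so each output
    # lines up with the prompt that produced it.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_model_call, prompts))

prompts = [f"Write plotting code for chart #{i}" for i in range(5)]
outputs = generate_in_parallel(prompts)
```

Threads suit this kind of workload because each call mostly waits on a remote model, so many requests can be in flight at the same time.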

To avoid repetition, the team used “personas,” short character profiles such as “a sci-fi novelist” or “a chemistry teacher,” to guide the AI’s responses and shape the content and tone of each example. By embedding these personas in prompts, CoSyn generated richer and more diverse training data across a wide range of domains.

“AI models tend to repeat themselves unless they’re nudged toward different perspectives,” explains Patel. “Personas give us a scalable way to do that, and the results speak for themselves.”
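
The persona technique described in this section can be sketched in a few lines of Python. The two personas are the ones quoted in the article. The task list, the prompt wording, and the function names are assumptions made for illustration and are not taken from CoSyn.

```python
import itertools
import random

# The two personas named in the article.
PERSONAS = ["a sci-fi novelist", "a chemistry teacher"]
# Hypothetical image types, chosen only for this example.
TASKS = ["a nutrition label", "a bar chart", "a table"]

def build_prompt(persona, task):
    """Embed the persona so the model writes from that point of view."""
    return (
        f"You are {persona}. Invent realistic content for {task}, "
        "then write code that renders it as an image."
    )

def build_prompts(personas, tasks, seed=0):
    """Pair every persona with every task, in a reproducible shuffled order."""
    combos = list(itertools.product(personas, tasks))
    random.Random(seed).shuffle(combos)
    return [build_prompt(persona, task) for persona, task in combos]

prompts = build_prompts(PERSONAS, TASKS)
```

With two personas and three tasks this yields six distinct prompts; lengthening either list multiplies the variety without anyone writing prompts by hand.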

Leveling the playing field for open-source AI

By building CoSyn entirely with open-source tools, the researchers hope to democratize access to powerful vision-language training methods without the ethical and legal challenges surrounding web scraping and copyrighted content.

“This is a step towards AI that helps us make new scientific discoveries,” adds Chris Callison-Burch, a CIS professor who co-advised Yang and now advises Patel. “It opens the door to AI systems that can reason about scientific documents and can help a wide range of people, from university students to researchers.”

From understanding to action

The team has released the complete CoSyn code and dataset to the public, inviting the global research community to build on their work.

Yang is already looking ahead to synthetic data that helps AI not only understand images but also interact with them, acting as an intelligent digital agent that can click buttons, fill out forms, and assist users with daily tasks.

“In the long run, we want AI that can not only describe the world, but also act in it,” Yang says. “This is one way of teaching that.”

More information: Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation, Yueyang1996.github.io/papers/cosyn.pdf

Provided by the University of Pennsylvania

Citation: AI vision, reinvented: Vision language models gain clearer vision through synthetic training data (2025, July 21), retrieved July 22, 2025, from https://techxplore.com/news/2025-07-ai-vision-vision-reinvented-language-gain.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


