Vision language models gain clearer vision through synthetic training data

By Thefuturedatainsights | July 22, 2025


AI vision, reinvented: the power of synthetic data

CoSyn leverages the language skills of open-source AI models to create training data that teaches other AI models how to read complex, text-rich images. Credit: Yue Yang

In the race to develop AI that understands complex images such as financial forecasts, medical diagrams, and nutrition labels, a capability essential for AI to operate independently in everyday settings, closed-source systems such as ChatGPT and Claude currently set the pace. But no one outside their makers knows how those models were trained or what data they used.

Now, researchers at Penn Engineering and the Allen Institute for AI (AI2) have developed a new approach to training open-source models: using AI to create scientific figures, charts, and tables that teach other AI systems how to interpret complex visual information.

Their tool, CoSyn (short for Code-Guided Synthesis), taps the coding skills of open-source AI models to render text-rich images and generate relevant questions and answers, giving other AI systems the data they need to learn how to "see" scientific figures.
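
To make this concrete, here is a minimal sketch of the code-guided synthesis idea in Python. It is an illustration of the general approach described above, not the authors' implementation: the llm() helper is a hypothetical placeholder for an open-source chat model, and here it simply returns a fixed plotting script so the example runs end to end.

```python
# Sketch of code-guided synthesis: an LLM writes plotting code, the code is
# executed to render a text-rich image, and Q&A pairs are attached because
# the underlying data is known. llm() is a stand-in for a real model call.
import json
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def llm(prompt: str) -> str:
    # Placeholder: a real pipeline would query an open-source model here.
    # A fixed script is returned so this sketch is runnable as-is.
    return (
        "fig, ax = plt.subplots()\n"
        "ax.bar(['Q1', 'Q2', 'Q3'], [120, 95, 140])\n"
        "ax.set_title('Quarterly revenue (USD millions)')\n"
    )

# 1. Ask the model for code that draws a chart (code, not pixels, is generated).
code = llm("Write matplotlib code for a small revenue bar chart.")

# 2. Execute the generated code to render the synthetic training image.
#    (In a real system the generated code would be sandboxed before execution.)
namespace = {"plt": plt}
exec(code, namespace)
namespace["fig"].savefig("synthetic_chart.png")

# 3. Because the data behind the image is known, question-answer pairs can be
#    produced programmatically (or by the same model) with no human labeling.
record = {
    "image": "synthetic_chart.png",
    "qa": [{"question": "Which quarter had the highest revenue?", "answer": "Q3"}],
}
print(json.dumps(record, indent=2))
```

The key design point is that generating code rather than pixels gives full knowledge of the image's contents, which is what makes automatic question-answer annotation possible.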

As the researchers detail in their paper at ACL 2025, one of the world's leading AI conferences, CoSyn-trained models match or exceed their proprietary peers.

“It’s like taking a student who’s good at writing and asking them to teach someone else how to draw,” says Yue Yang (Greng’25), a co-first author of the paper and now a research scientist in AI2’s Perceptual Reasoning and Interaction Research group. “Essentially, it shifts the strengths of open-source AI from text to vision.”

Synthetic images, real results

The resulting dataset, called CoSyn-400K, contains over 400,000 synthetic images and 2.7 million sets of corresponding instructions across a variety of categories, including scientific charts, chemical structures, and user-interface screenshots. Across a suite of seven benchmark tests, CoSyn-trained models surpassed top proprietary systems such as GPT-4V and Gemini 1.5 Flash.

In one particularly striking case, the researchers synthetically generated just 7,000 nutrition labels and used them to train a model for NutritionQA, a new benchmark they created. That small, targeted dataset allowed the model to beat others trained on millions of real images.

“Training AI with CoSyn is extremely efficient,” said Mark Yatskar, assistant professor in Computer and Information Science (CIS) and co-advisor of Yang’s doctoral work. “We show that synthetic data helps models generalize to real-world scenarios that may be unique to a person’s needs, such as reading a nutrition label for someone with low vision.”

Yue Yang demonstrates CoSyn’s capabilities, using models trained on CoSyn’s synthetic data to read nutrition labels and solve math problems. Credit: Sylvia Zhang

Scaling and diversifying datasets

Creating hundreds of thousands of useful and diverse training examples posed unique challenges.

To reach the necessary scale, co-author Ajay Patel, a doctoral student in CIS, developed DataDreamer, a software library that automates the entire data-generation process. This allowed the team to prompt language models in parallel, producing synthetic images and instructions at scale.
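
DataDreamer's actual API is not shown here, but the underlying idea of fanning many generation prompts out in parallel can be sketched with standard Python concurrency; query_model() below is a hypothetical stand-in for a call to whatever open-source model is being used.

```python
# Generic sketch of parallel prompt generation (an illustration of the idea,
# not DataDreamer's API). query_model() is a hypothetical model wrapper.
from concurrent.futures import ThreadPoolExecutor

def query_model(prompt: str) -> str:
    # Placeholder for a request to an open-source LLM serving endpoint.
    return f"[generated rendering code for: {prompt}]"

topics = ["bar chart of crop yields", "line chart of stock prices",
          "table of nutrition facts", "molecule diagram"]
prompts = [f"Write rendering code for a {t}." for t in topics]

# Fan the prompts out across worker threads so many examples are produced
# concurrently instead of one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    generated_code = list(pool.map(query_model, prompts))

for topic, code in zip(topics, generated_code):
    print(topic, "->", code)
```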

To avoid repetition, the team used “personas”, short character profiles such as “a sci-fi novelist” or “a chemistry teacher”, to guide the AI’s responses and shape the content and tone of each example. By embedding these personas in its prompts, CoSyn generates richer and more diverse training data across a wide range of domains.
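
A minimal sketch of persona-conditioned prompting, under the assumption that the persona text is simply prepended to the generation request (the persona strings below are illustrative, not drawn from the paper):

```python
# Each persona steers the same base request toward different content and
# tone, which diversifies the generated training examples.
import random

personas = ["a sci-fi novelist", "a chemistry teacher",
            "a nutritionist", "a financial analyst"]

def build_prompt(task: str) -> str:
    persona = random.choice(personas)
    # The persona is embedded directly in the prompt text.
    return f"You are {persona}. {task}"

print(build_prompt("Invent a small table of data and write code to plot it."))
```

The resulting prompts can then be fed to the same parallel generation loop shown above.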

“AI models tend to repeat themselves unless they’re guided toward different perspectives,” explains Patel. “Personas give us a scalable way to do that, and the results speak for themselves.”

Leveling the open-source AI playing field

By building their tools entirely from open-source models, the researchers hope to democratize access to powerful vision-language training methods, without the ethical and legal challenges surrounding web scraping and copyrighted content.

“This is a step towards AI that helps us make new scientific discoveries,” adds Chris Callison-Burch, a CIS professor who co-advised Yang and now advises Patel. “It opens the door to AI systems that can reason about scientific documents and can help a wide range of people, from university students to researchers.”

From understanding to action

The team has released the complete CoSyn code and dataset to the public, inviting the global research community to build on their work.

Yang has already pioneered synthetic data that helps AI not only understand images but also interact with them, acting as intelligent digital agents that can click buttons, fill out forms, and assist users with everyday tasks.

“In the long run, we want AI that can not only explain what it sees, but also act in the world,” Yang says. “This is one way of teaching it to do that.”

Details: Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation, Yueyang1996.github.io/papers/cosyn.pdf

Provided by the University of Pennsylvania

Citation: AI vision, reinvented: Vision language models gain clearer vision through synthetic training data (2025, July 21), retrieved July 22, 2025 from https://techxplore.com/news/2025-07-ai-vision-vision-reinvented-language-gain.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


