AI models learn to split tasks and reduce latency for complex prompts

By Thefuturedatainsights | July 22, 2025


Credit: Pixabay/CC0 Public Domain

As large language models (LLMs) like ChatGPT continue to advance, user expectations keep growing, including the expectation that they answer increasingly complex prompts quickly.

Traditional LLMs rely on "autoregressive decoding," in which each item in a sequence (a "token") is predicted from the previously generated output. This sequential process makes complex prompts inherently slow to answer, and researchers have sought to mitigate the delay with techniques that make better use of the parallelism in multi-core computer chips. Speculative decoding, for example, uses a fast draft model to propose tokens, which are then verified in parallel by a slower, higher-quality model.
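
To make that control flow concrete, here is a minimal, self-contained sketch of the draft-and-verify loop behind speculative decoding. Both model functions are stand-ins (no real LLM is called), so treat this as an illustration of the idea rather than any system's actual implementation.

```python
import random

def draft_model(prefix, k):
    """Stand-in for a small, fast model: propose k candidate tokens."""
    return [random.randint(0, 99) for _ in range(k)]

def target_accepts(prefix, token):
    """Stand-in for the large model's check of one proposed token.
    In a real system all k proposals are scored in a single batched
    forward pass -- that batch is where the parallel speedup comes from."""
    return random.random() < 0.8

def speculative_step(prefix, k=4):
    """One round: draft k tokens cheaply, keep the longest verified run.
    A real implementation resamples the first rejected position from the
    target model instead of simply stopping."""
    accepted = []
    for token in draft_model(prefix, k):
        if target_accepts(prefix + accepted, token):
            accepted.append(token)
        else:
            break
    return prefix + accepted

sequence = []
for _ in range(5):  # five rounds of draft-and-verify
    sequence = speculative_step(sequence)
print(f"decoded {len(sequence)} tokens in 5 target-model rounds")
```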

A newer class of methods instead exploits "semantic independence": they spot bullet-point-like syntactic patterns in the output and expand each item in parallel. These methods rely on hand-made syntactic heuristics, however, which are fragile and often fail when the answer deviates from the expected form.
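
That fragility is easy to demonstrate. The sketch below mimics the kind of hand-written rule such systems lean on (it is illustrative, not the actual code of any published method): a regex finds bullet items and treats each as an independently decodable chunk, but the same content written as prose yields no parallelism at all.

```python
import re

# Hand-made heuristic: treat each bullet item as an independent chunk.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+", re.MULTILINE)

def split_chunks(text):
    """Return bullet items if any are found, else the whole text as one chunk."""
    parts = [p.strip() for p in BULLET.split(text) if p.strip()]
    return parts if len(parts) > 1 else [text]

bulleted = "- install the dependencies\n- run the test suite"
prose = "First install the dependencies, then run the test suite."

print(split_chunks(bulleted))  # 2 chunks -> decodable in parallel
print(split_chunks(prose))     # 1 chunk  -> the heuristic finds nothing
```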

These drawbacks inspired researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google to pursue a learning-based approach to parallel decoding. Instead of relying on fixed rules, their method trains the LLM to recognize semantic independence on its own: the model learns to identify semantically independent chunks of text and decode them in parallel.

The result: PASTA.

Specifically, the CSAIL team's PArallel STructure Annotation (PASTA) framework lets an LLM generate text in parallel, dramatically accelerating response times. Unlike previous attempts that relied on rigid, hand-coded rules to identify independent text segments, PASTA teaches the LLM to recognize and express these parallelization opportunities within its own answers.

The approach, called learned asynchronous decoding, illustrates a shift toward training models to orchestrate their own parallel decoding strategies. The findings are published on the arXiv preprint server.

"Traditional LLMs are like a single cook making lasagna one step at a time," explained Tian Jin, lead author of a new paper on the project, presented at the International Conference on Machine Learning (ICML 2025) in Vancouver. "PASTA teaches the chef to recognize when different parts of the lasagna can be prepared simultaneously, such as mixing a subset of the ingredients while the oven preheats, leading to a much faster process overall."

The innovation addresses a fundamental bottleneck in LLM inference: the sequential nature of decoding leaves hardware underutilized and users waiting. Current LLMs can take seconds or even minutes to satisfy a single request, and it is this latency problem that PASTA aims to solve.

At the heart of PASTA are two components: PASTA-LANG, an annotation language that lets the LLM tag semantically independent parts of its own response, and an interpreter that acts on these tags to orchestrate parallel decoding at inference time. As Jin explains, PASTA-LANG can be thought of as a set of instructions the LLM writes for itself, marking sections of output that can be worked on simultaneously; the interpreter then reads those instructions and manages the parallel generation of those sections.
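
The article does not reproduce PASTA-LANG's concrete syntax, so the sketch below invents a tiny stand-in to show the division of labor: the model emits a plan whose tagged spans are semantically independent, and an interpreter fans those spans out to concurrent decoding calls (simulated here) while leaving already-decoded scaffold text in place. The tag names and helper functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical annotated output: "sync" text is decoded scaffold; "async"
# spans are marked as semantically independent and safe to expand in parallel.
plan = [
    ("sync",  "Three things to see in Boston:"),
    ("async", "Describe the Freedom Trail."),
    ("async", "Describe the MIT Museum."),
    ("async", "Describe Fenway Park."),
]

def decode(prompt):
    """Stand-in for an autoregressive decoding call on one chunk."""
    return f"[expansion of: {prompt}]"

def interpret(plan):
    """Fan async spans out to a thread pool; stitch output in plan order."""
    with ThreadPoolExecutor() as pool:
        parts = [pool.submit(decode, text) if tag == "async" else text
                 for tag, text in plan]
        return "\n".join(p.result() if hasattr(p, "result") else p
                         for p in parts)

print(interpret(plan))
```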

The team trained LLMs to generate these PASTA-LANG annotations through a two-stage fine-tuning process. The training optimizes not only for decoding speed but also for the quality of the generated responses, maintaining or even improving it. This dual objective positions the method to keep making progress as more training compute becomes available, since both speed and quality can be improved continuously.

In experiments on the AlpacaEval benchmark, the team's self-parallelizing model achieved a geometric mean speedup approaching 2x, with only slight changes in response quality (ranging from a 2% gain to a 7% drop). In practice, users can expect responses nearly twice as fast without a significant loss of accuracy or coherence.
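
For readers unfamiliar with the metric: a geometric mean speedup averages per-prompt ratios multiplicatively, so a single outlier prompt cannot dominate the way it would in an arithmetic mean. A quick worked example (the per-prompt ratios below are invented for illustration, not the team's data):

```python
from math import prod

speedups = [1.6, 2.3, 1.9, 2.1]  # invented per-prompt speedup ratios
geomean = prod(speedups) ** (1 / len(speedups))
print(f"geometric mean speedup: {geomean:.2f}x")  # -> 1.96x, "nearly 2x"
```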

"It was amazing to see the LLM organize its own inference-time behavior," says Jin. "And it was almost magical to watch how throwing more compute at these algorithms produced increasingly sophisticated self-organizing behavior."

The study highlights a central challenge in the field: balancing speed against quality. Previous methods such as Skeleton-of-Thought (SoT) and APAR attempted parallel decoding by looking for manually specified syntactic structures such as bullet points and paragraphs. But those methods were rigid and imprecise, unable to spot parallelization opportunities whenever an answer deviated even slightly from the expected pattern. PASTA's learning-based approach, in contrast, offers a more robust and scalable solution.

"It's about making LLMs smarter about how they generate content," says Jin, a Ph.D. student at CSAIL. "Instead of trying to guess where parallel work is possible, we're teaching the LLM to identify those opportunities on the spot."

Going forward, the team is optimistic about the broader implications of PASTA. The ability to substantially reduce LLM decoding latency also reduces computational resource requirements, making these powerful AI models more accessible and affordable for a wider range of users and applications.

"We essentially designed a protocol that the LLM learns to optimize for itself," Jin says. "By improving the efficiency of LLM inference, PASTA can significantly reduce computational resource requirements and improve LLM accessibility."

Jin led the project with his two faculty advisors, MIT professors Michael Carbin and Jonathan Ragan-Kelley. The paper's other co-authors include CSAIL's Ellie Y. Cheng and Zack Ankner and Google researchers Suvinay Subramanian, Nikunj Saunshi, Blake M. Elias, and Amir Yazdanbakhsh.

More information: Tian Jin et al., Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding, arXiv (2025). DOI: 10.48550/arxiv.2502.11517

Journal information: arXiv

Provided by Massachusetts Institute of Technology

Citation: AI models learn to split tasks and reduce latency for complex prompts (July 21, 2025), retrieved 22 July 2025 from https://techxplore.com/news/2025-07-ai-tasks-slashing-complex-plompts.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


