How LLM architecture and training data shape AI position bias

By Thefuturedatainsights · June 25, 2025 · 7 Mins Read


Unpacking large language model bias

Three types of attention masks and the corresponding interaction graphs G used in the analysis (self-loops are omitted for clarity). A directed edge from token j to token i indicates that i attends to j. The center node highlighted in yellow (Definition 3.1) represents a token that can be directly or indirectly attended to by all other tokens in the sequence. As depicted in the top row, the graph-theoretic formulation captures both the direct and indirect contributions of tokens to the overall context, providing a comprehensive view of token interactions under multi-layer attention. Credit: arXiv (2025). DOI: 10.48550/arxiv.2502.01951

Research shows that large language models (LLMs) tend to overemphasize information at the beginning and end of a document or conversation while neglecting the middle.

This “position bias” means that if a lawyer uses an LLM-powered virtual assistant to retrieve a specific phrase in a 30-page affidavit, the LLM is more likely to find the right text when it is on the first or last pages.

Researchers at MIT have discovered the mechanism behind this phenomenon.

They created a theoretical framework to study how information flows through the machine learning architecture that forms the backbone of LLMs. They found that certain design choices controlling how the model processes input data can cause position bias.

Their experiments revealed that certain choices in the model architecture, particularly those governing how information spreads across the input words within the model, can create or intensify position bias, and that training data also contribute to the problem.

The work is published on the arXiv preprint server.

In addition to pinpointing the origins of position bias, their framework can be used to diagnose and correct it in future model designs.

This could lead to more reliable chatbots that stay on topic during long conversations, medical AI systems that reason more fairly when handling troves of patient data, and code assistants that pay close attention to every part of a program.

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can make your model behave inconsistently. You just feed it your documents in whatever order you want and expect it to work. But by better understanding the underlying mechanism of these black-box models, we can improve them by addressing these limitations,” says Xinyi Wu, a graduate student in the MIT Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS), and first author of the paper.

Her co-authors include Yifei Wang, a postdoc at MIT; senior author Stefanie Jegelka, an associate professor of electrical engineering and computer science (EECS) and a member of IDSS and the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ali Jadbabaie, a professor in the Department of Civil and Environmental Engineering, a core faculty member of IDSS, and a principal investigator at LIDS. The research will be presented at the International Conference on Machine Learning.

Analyzing attention

LLMs such as Claude, Llama, and GPT-4 are powered by a type of neural network architecture known as a transformer. Transformers are designed to process sequential data, encoding a sentence into chunks called tokens and learning the relationships between tokens to predict what word comes next.

These models are very good at this thanks to the attention mechanism, which uses interconnected layers of data-processing nodes to make sense of context by allowing tokens to selectively focus on, or attend to, related tokens.

But if every token can attend to every other token in a 30-page document, that quickly becomes computationally intractable. So when engineers build transformer models, they often use attention masking techniques that limit which words a token can attend to. A causal mask, for instance, only allows each word to attend to the words that came before it.
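To make this concrete, here is a minimal NumPy sketch of causal masking inside scaled dot-product attention. It is an illustration of the general technique, not code from the paper; the function name and toy dimensions are invented for this example.

```python
import numpy as np

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask:
    each token may only attend to itself and earlier tokens."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len) similarity scores
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf                           # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the allowed positions
    return weights @ v

# Toy usage: 5 tokens with 8-dimensional embeddings, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(causal_attention(x, x, x).shape)  # (5, 8)
```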

Engineers also use positional encodings to help the model understand where each word sits in a sentence, which improves performance.
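As an illustration, the sketch below implements the classic fixed sinusoidal positional encoding from the original transformer paper. This is only one common scheme, assumed here for concreteness; many modern LLMs use other encodings, and the study looks at the effect of positional encodings in general.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encoding: each position gets a distinct pattern of
    sine and cosine values that the model can use to infer word order."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even embedding dimensions
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encoding is simply added to the token embeddings before the first attention layer:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(sinusoidal_positional_encoding(6, 8).shape)  # (6, 8)
```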

The MIT researchers built a graph-based theoretical framework to explore how these modeling choices (attention masks and positional encodings) affect position bias.

“It’s extremely difficult to study because everything is coupled and intertwined within the attention mechanism. Graphs are a flexible language to describe the dependencies between words within the attention mechanism and to trace them across multiple layers,” says Wu.
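A toy version of that graph view is sketched below, using the networkx library and helper names invented for this example. It is not the authors’ formal framework, but it shows how an attention mask defines a directed graph and how, across enough layers, information can flow along any directed path.

```python
import networkx as nx
import numpy as np

def attention_graph(mask):
    """Build a directed graph from an attention mask: an edge j -> i means
    token i is allowed to attend to token j (self-loops omitted)."""
    n = mask.shape[0]
    g = nx.DiGraph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(n):
            if mask[i, j] and i != j:
                g.add_edge(j, i)
    return g

n = 6
causal_mask = np.tril(np.ones((n, n), dtype=bool))  # token i may attend to tokens j <= i
g = attention_graph(causal_mask)

# With enough attention layers, information can travel along any directed path,
# so the tokens that can directly or indirectly influence token i are its ancestors.
for i in range(n):
    print(i, sorted(nx.ancestors(g, i)))
# Under causal masking, token 0 can influence every later token while the last
# token influences none -- one intuition for the bias toward the beginning.
```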

Their theoretical analysis suggested that causal masking gives the model an inherent bias toward the beginning of an input, even when that bias does not exist in the data.

Even if earlier words are relatively insignificant to a sentence’s meaning, causal masking can cause the transformer to pay more attention to the beginning of the sequence.

“While it is often true that earlier and later words in a sentence are more important, if an LLM is used on a task that is not natural language generation, such as ranking or information retrieval, these biases can be extremely harmful,” says Wu.

As a model grows, with additional layers of the attention mechanism, this bias is amplified because earlier portions of the input are used more frequently in the model’s reasoning process.

They also found that using positional encodings to link words more strongly to nearby words can mitigate position bias. This technique refocuses the model’s attention in the right place, but its effect can be diluted in models with more attention layers. And these design choices are only one cause of position bias: some can come from the training data the model uses to learn how to prioritize words in a sequence.
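One well-known way to tie tokens more strongly to their neighbors is an ALiBi-style linear distance penalty added to the attention scores. The sketch below illustrates that general idea under the assumption that such a penalty is used; it is not necessarily the encoding analyzed in the paper.

```python
import numpy as np

def distance_biased_scores(scores, slope=0.5):
    """Subtract a penalty that grows with the distance between query and key
    positions, so each token leans more heavily on its nearby tokens."""
    n = scores.shape[0]
    positions = np.arange(n)
    distance = np.abs(positions[:, None] - positions[None, :])  # |i - j|
    return scores - slope * distance

# Usage: compute the raw q @ k.T / sqrt(d) scores as usual, apply this bias,
# then mask and softmax. Nearby tokens receive relatively more weight, the
# kind of locality the researchers found can counteract the bias toward the
# beginning of the sequence introduced by causal masking.
print(distance_biased_scores(np.zeros((5, 5))))
```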

“If you know your data are biased in a certain way, then you should also fine-tune your model on top of adjusting your modeling choices,” says Wu.

Lost in the middle

After establishing the theoretical framework, the researchers conducted experiments in which they systematically varied the position of the correct answer within text sequences for an information retrieval task.

The experiments showed a “lost in the middle” phenomenon, in which retrieval accuracy followed a U-shaped pattern. Models performed best when the correct answer was placed at the beginning of the sequence; performance declined as the answer moved toward the middle, then rebounded slightly when it was near the end.
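An experiment of this general shape can be sketched as a position sweep over a synthetic retrieval prompt. The harness below is hypothetical: the filler text and the ask_model callable are stand-ins for whatever model API is being tested, not the researchers’ actual protocol.

```python
import random

def build_prompt(fact, position, n_fillers=20):
    """Place one key fact among filler sentences at a chosen position."""
    fillers = [f"Filler sentence number {i} about nothing in particular." for i in range(n_fillers)]
    fillers.insert(position, fact)
    return " ".join(fillers) + " Question: what is the secret code?"

def run_position_sweep(ask_model, n_fillers=20, trials=50):
    """Measure retrieval accuracy as a function of where the fact appears.
    `ask_model` is a hypothetical callable: prompt in, answer string out."""
    accuracy = []
    for pos in range(n_fillers + 1):
        correct = 0
        for _ in range(trials):
            code = str(random.randint(1000, 9999))
            prompt = build_prompt(f"The secret code is {code}.", pos, n_fillers)
            if code in ask_model(prompt):
                correct += 1
        accuracy.append(correct / trials)
    return accuracy  # typically U-shaped: highest near the ends, lowest in the middle
```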

Ultimately, their research suggests that using a different masking technique, removing extra layers from the attention mechanism, or strategically employing positional encodings could reduce position bias and improve a model’s accuracy.

“By combining theory and experiments, we were able to look at the consequences of model design choices that were not clear at the time. If you want to use a model in high-stakes applications, you have to know when it will work, when it won’t, and why,” says Jadbabaie.

In the future, the researchers want to further explore the effects of positional encodings and to study how position bias could be strategically exploited in certain applications.

“These researchers offer a rare theoretical lens into the attention mechanism at the heart of transformer models. They provide a compelling analysis that clarifies longstanding quirks in transformer behavior, showing that attention mechanisms, especially with causal masks, inherently bias models toward the beginning of a sequence,” says Amin Saberi, professor and director of the Stanford University Center for Computational Market Design, who was not involved with this work.

Details: Xinyi Wu et al, On the Emergence of Position Bias in Transformers, arXiv (2025). DOI: 10.48550/arxiv.2502.01951

Journal information: arXiv

Provided by Massachusetts Institute of Technology

This story has been republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and education.

Citation: Lost in the middle: How LLM architecture and training data shape AI position bias (2025, June 17), retrieved from https://techxplore.com/news/2025-06

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


