
Researchers have developed a technique that significantly improves the performance of large language models without increasing the computational power required to fine-tune them. The researchers demonstrated that their technique outperforms previous approaches on tasks such as commonsense reasoning, arithmetic reasoning, instruction following, code generation, and visual recognition.
Large language models are artificial intelligence systems that are pretrained on huge data sets. After pretraining, these models predict which words should follow each other in order to respond to user queries. However, the general-purpose nature of pretraining means there is ample room for improvement when user queries focus on a specific topic, such as when users ask the model to answer mathematical questions or write computer code.
“To improve a model’s ability to perform more specific tasks, you need to fine-tune the model,” said Tianfu Wu, co-author of a paper on the work and an associate professor of computer engineering at North Carolina State University.
“However, these models are so large that it is not feasible to retrain the entire model. Instead, we want to determine the minimal set of changes needed to improve the model’s performance. We have developed a technique called WeGeFT (pronounced ‘wee-gift’), which represents a key advance in fine-tuning these large models.”
A major breakthrough in fine-tuning these large models was a technique called LoRA (low-rank adaptation), introduced in 2022. LoRA works by using mathematical tools to identify a small subset of parameters that are most likely to improve the model’s performance on a particular task.
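The low-rank idea behind LoRA can be sketched in a few lines. The dimensions, variable names, and code below are purely illustrative (they are not taken from the paper): a frozen weight matrix is augmented with the product of two small trainable factors, shrinking the number of trainable parameters dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch of the LoRA idea (not the WeGeFT method itself).
# The pretrained weight matrix W stays frozen; fine-tuning trains only the
# small factors A (r x d_in) and B (d_out x r), so the trainable parameter
# count drops from d_out * d_in to r * (d_out + d_in).
d_out, d_in, r = 512, 512, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable; zero-initialized so the
                                            # adapted model starts out identical
                                            # to the pretrained one

def adapted_forward(x):
    """Forward pass with the low-rank update: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

full_params = d_out * d_in          # 262,144 parameters for full fine-tuning
lora_params = r * (d_out + d_in)    # 8,192 trainable parameters: a 32x reduction
print(lora_params, full_params)
```

At rank 8 on a 512-by-512 layer, the trainable parameter count falls from 262,144 to 8,192, which is the kind of saving that makes fine-tuning very large models tractable.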
There have been many attempts to improve on LoRA, but Wu and his collaborators found that these previous efforts either require significantly more computing power to improve performance, or use the same amount of computing power without improving performance.
“WeGeFT builds on LoRA, but incorporates additional mathematical tools that allow the model to distinguish the parameters it is already familiar with from the parameters it needs to learn,” says Wu. “By placing more weight on the truly novel parameters, we can improve model performance compared to LoRA without imposing significant new computational demands.”
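The article does not spell out WeGeFT’s mathematics, so the following is only a hypothetical illustration of the general idea Wu describes: splitting a candidate update into a part the frozen weights can already express (“familiar”) and a part they cannot (“novel”), then weighting the novel part more heavily. Every name and the decomposition itself are assumptions made for illustration, not the paper’s actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration only -- NOT the formulation from the WeGeFT paper.
# Idea: split a candidate weight update into a "familiar" part lying in a
# subspace the pretrained weights already span, and a "novel" part orthogonal
# to it, then weight the novel part more heavily.
d = 16
W = rng.standard_normal((d, d))        # frozen pretrained weight matrix
delta = rng.standard_normal((d, d))    # candidate fine-tuning update

# Treat the top-k left singular vectors of W as what the model "already knows"
# (an arbitrary stand-in chosen for this sketch).
k = 4
U, _, _ = np.linalg.svd(W)
Qk = U[:, :k]                          # orthonormal basis of the familiar subspace

familiar = Qk @ (Qk.T @ delta)         # component W can already express
novel = delta - familiar               # genuinely new directions

alpha, beta = 0.1, 1.0                 # illustrative weights: damp familiar, keep novel
weighted_update = alpha * familiar + beta * novel

# The split is exact, and the novel part is orthogonal to the familiar basis.
print(np.allclose(familiar + novel, delta))   # True
print(np.allclose(Qk.T @ novel, 0.0))         # True
```

The point of the sketch is only that emphasizing the orthogonal (“novel”) component adds a projection and a rescaling, not a second round of full-model training, which is consistent with the article’s claim of no significant new computational demands.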
In proof-of-concept testing, the researchers found that WeGeFT matched or outperformed LoRA and its many variants across a variety of downstream tasks.
“I think this is a valuable step forward,” Wu says. “WeGeFT can also be used to identify the elements of a model that cause harmful outputs, with the aim of improving AI alignment and performing model ‘surgery’ to improve safety and performance.”
The paper, “WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models,” will be presented at the International Conference on Machine Learning (ICML) in Vancouver, Canada, on July 17. The paper’s first author is Chinmay Savadikar, a Ph.D. student at NC State. The paper was co-authored by Xi Song, an independent researcher.
More information: “WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models,” by Chinmay Savadikar and Tianfu Wu, North Carolina State University, and Xi Song, independent researcher. Presented at the International Conference on Machine Learning (ICML), Vancouver, Canada, July 13–19, 2025. icml.cc/virtual/2025/poster/45660
Provided by North Carolina State University
Citation: New approach improves how new skills can be taught to large language models (2025, July 7), retrieved July 7, 2025 from https://techxplore.com/news/2025-07-approach-skills-taught-large-language.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
