
Processing differences with and without hardware accelerators. Credit: Electronics (2024). doi:10.3390/Electronics13234683
Large language models (LLMs) such as BERT and GPT are driving major advances in artificial intelligence, but their size and complexity usually demand powerful servers and cloud infrastructure. Running these models directly on devices, without relying on external computation, has remained a difficult technical challenge.
A research team at Sejong University has developed a new hardware solution that could help change that. The work is published in the journal Electronics.
The Scalable Transformer Accelerator Unit (STAU) is designed to run a variety of transformer-based language models efficiently on embedded systems. It dynamically adapts to different input sizes and model structures, making it particularly suitable for real-time, on-device AI.
At the heart of the STAU is a variable systolic array (VSA) architecture that performs the matrix operations at the core of transformer workloads in a way that adapts to the input sequence length. By streaming input data row by row while loading weights in parallel, the system reduces memory stalls and improves throughput. This is especially important for LLMs, where sentence lengths and token sequences vary widely between tasks.
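To make the row-streaming idea concrete, here is a minimal software sketch of a weight-stationary, row-streamed matrix multiply. It is an illustration only, not the STAU's actual hardware design: the function name, matrix sizes, and NumPy simulation are assumptions made for clarity.

```python
import numpy as np

def row_streamed_matmul(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Software model of a weight-stationary, row-streamed matrix multiply.

    The weight matrix is loaded once (as if pre-loaded into the PE array) and
    rows of `activations` -- one per token -- are streamed through one at a
    time, so the amount of work tracks the input sequence length.
    """
    seq_len, d_model = activations.shape
    assert weights.shape[0] == d_model, "inner dimensions must match"
    out = np.empty((seq_len, weights.shape[1]), dtype=np.float32)
    for t in range(seq_len):            # stream one token (row) per step
        out[t] = activations[t] @ weights
    return out

# The same routine handles short and long token sequences without reconfiguration.
w = np.random.rand(64, 64).astype(np.float32)          # weights loaded in parallel
short_seq = np.random.rand(8, 64).astype(np.float32)   # 8 tokens
long_seq = np.random.rand(128, 64).astype(np.float32)  # 128 tokens
assert np.allclose(row_streamed_matmul(short_seq, w), short_seq @ w, atol=1e-4)
assert np.allclose(row_streamed_matmul(long_seq, w), long_seq @ w, atol=1e-4)
```

In a weight-stationary design like this, only the streamed rows scale with sequence length, which is why variable-length inputs do not require reloading or reconfiguring the array.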
In benchmark tests published in Electronics, the accelerator demonstrated a 3.45x speedup over CPU-only execution while maintaining numerical accuracy above 97%. When processing long sequences, it reduced total computation time by more than 68%.
Continued optimization since then has further improved the system's performance. According to the team, recent internal testing has achieved speedups of up to 5.18x, highlighting the architecture's long-term scalability.

Top module architecture. Credit: Electronics (2024). doi:10.3390/Electronics13234683

Processing element (PE) and variable systolic array (VSA) architecture. Credit: Electronics (2024). doi:10.3390/Electronics13234683
The researchers also redesigned the softmax function, a key stage of the transformer pipeline. Because softmax typically relies on exponentiation and normalization, it often becomes a bottleneck; the team replaced it with a lightweight Radix-2 approach that relies on shift-and-add operations. This reduces hardware complexity without compromising output quality.
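The article does not spell out the exact circuit, but the flavor of a Radix-2, shift-and-add softmax can be sketched in a few lines. In the sketch below (an assumption for illustration, not the paper's implementation), e^x is rewritten as 2^(x·log2 e); the integer part of the exponent maps to a bit shift and the fractional part is handled with a cheap linear term, so no exponential unit is needed.

```python
import numpy as np

LOG2_E = 1.4426950408889634  # log2(e): rewrites e^x as 2^(x * log2(e))

def radix2_softmax(x: np.ndarray) -> np.ndarray:
    """Shift-and-add style softmax approximation (illustrative sketch only)."""
    y = (x - x.max()) * LOG2_E          # stabilize, then convert to a base-2 exponent
    i = np.floor(y)                     # integer part -> implemented as a bit shift
    f = y - i                           # fractional part in [0, 1)
    pow2 = np.ldexp(1.0 + f, i.astype(np.int32))  # 2^f ~= 1 + f, shifted by i bits
    return pow2 / pow2.sum()

scores = np.array([2.3, -0.7, 0.1, 1.5])
print(np.round(radix2_softmax(scores), 4))
exact = np.exp(scores - scores.max())
print(np.round(exact / exact.sum(), 4))  # the approximation tracks the exact values closely
```

Because the normalization divides the approximate terms by their own sum, the small per-element error of the 2^f ~= 1 + f step largely cancels, which is one reason base-2 softmax variants can preserve output quality at much lower hardware cost.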
To further simplify the computation, the system uses a custom 16-bit floating-point format tailored specifically to transformer workloads. The format eliminates the need for layer normalization, another common performance bottleneck, and contributes to a more efficient and streamlined data path.
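The article does not give the exponent/mantissa split of the custom format, so the sketch below uses a bfloat16-style truncation of IEEE float32 purely to illustrate how a 16-bit datapath trades a small amount of precision for much simpler hardware; the format choice and function name are assumptions, not the paper's design.

```python
import numpy as np

def to_custom_fp16(x: np.ndarray) -> np.ndarray:
    """Quantize float32 values to a 16-bit floating-point format (bfloat16-style).

    Keeps the sign bit, the 8-bit exponent, and the top 7 mantissa bits of the
    IEEE float32 encoding, rounding half up on the dropped bits. This is only a
    stand-in for the paper's unspecified custom format.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounded = bits + np.uint32(0x8000)             # round on the 16 dropped bits
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)

vals = np.array([3.14159, -0.001234, 512.7], dtype=np.float32)
q = to_custom_fp16(vals)
print(q)                                           # quantized 16-bit values
print(np.abs((vals - q) / vals))                   # relative error stays below ~0.4%
```

Halving the word width roughly halves multiplier area and memory bandwidth per operand, which is why narrow custom formats are a common lever in embedded accelerator designs.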
The STAU was implemented on a Xilinx FPGA (VMK180) and controlled by an embedded Arm Cortex-R5 processor. This hybrid design lets developers support a variety of transformer models, including those used in LLMs, without any hardware changes.
The team sees the work as a step toward making advanced language models more accessible and deployable across a wider range of platforms, enabling real-time AI execution, better privacy, and low-latency responses.
"The STAU architecture demonstrates that transformer models, even larger ones, can be practical for on-device applications," says author Seok-Woo Chang. "It provides a foundation for building scalable and efficient intelligent systems."
More information: Seok-Woo Chang et al, Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications, Electronics (2024). doi:10.3390/Electronics13234683
Provided by Sejong University
Citation: Scalable transformer accelerator enables devices to run large language models (2025, July 21) retrieved July 21, 2025 from https://news/2025-07.
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.