
In this representation of the adversarial threat detection framework, vivid filaments carry incoming text and image icons to the central node, while a glowing, faceted topological shield deflects the dark, glitchy mass on the right. The composition emphasizes the contrast between clean data flow and adversarial interference. Credit: Image generated by Manish Bhattarai using DALL-E
The rapid advance and adoption of multimodal foundation AI models has exposed new vulnerabilities, significantly increasing the potential for cybersecurity attacks. Researchers at Los Alamos National Laboratory have proposed a new framework for identifying adversarial threats to foundation models. The work will allow system developers and security experts to better understand the models' weaknesses and harden them against increasingly sophisticated attacks.
The research is published on the arXiv preprint server.
“As multimodal models become more prevalent, adversaries can exploit vulnerabilities through either the text or the visual channel, or even both simultaneously,” says Manish Bhattarai, a computer scientist at Los Alamos.
“AI systems face threats from subtle, malicious manipulations that can mislead or corrupt their outputs, and attacks can introduce misleading or toxic content that appears to be a genuine output of the model.”
Multimodal AI systems excel at integrating diverse data types by embedding text and images in a shared high-dimensional space, aligning visual concepts, such as an image of a circle, with their textual semantics (the word “circle”). However, this alignment capability also introduces vulnerabilities.
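For readers who want to see what that shared embedding space looks like in practice, the short sketch below is a hypothetical illustration, not the Los Alamos code: it uses the openly available CLIP model from Hugging Face's transformers library, and the model name, image file, and captions are assumptions made only for the example.

```python
# Hypothetical illustration of text-image alignment in a shared embedding
# space, using the open-source CLIP model (not the Los Alamos framework).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("circle.png")  # any local image file; name is an assumption
captions = ["a drawing of a circle", "a drawing of a square"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the image embedding and each text embedding;
# a clean image of a circle should score highest against the "circle" caption.
print(outputs.logits_per_image.softmax(dim=-1))
```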
These models are increasingly deployed in high-stakes applications, where adversaries can exploit them through text or visual inputs using subtle perturbations that confuse the model and lead to misleading or harmful outcomes.
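The sketch below is a generic, hypothetical example of such a perturbation, the textbook fast gradient sign method (FGSM), rather than any of the specific attacks studied by the team; the toy classifier and the epsilon value are assumptions made only to keep the example self-contained.

```python
# Hypothetical sketch of an imperceptible adversarial perturbation (FGSM):
# nudge each pixel in the direction that increases the model's loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=4 / 255):
    """Return an adversarially perturbed copy of a [1, C, H, W] image tensor."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel in the sign of its gradient, keeping values in [0, 1].
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

# Toy usage with a stand-in classifier; a real attack would target the
# multimodal model's image encoder instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
clean = torch.rand(1, 3, 32, 32)
adversarial = fgsm_perturb(model, clean, torch.tensor([0]))
print((adversarial - clean).abs().max())  # perturbation stays within epsilon
```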
Defense strategies for multimodal systems remain relatively unexplored, even though these models are increasingly used in sensitive domains, where they are applied to complex national security problems and can contribute to modeling and simulation. Building on the team’s experience, the new approach detects the signature and origin of adversarial attacks against today’s advanced artificial intelligence models and develops a purification strategy that neutralizes adversarial noise in image-based attack scenarios.

Test power and average Type I errors for adversarial detection methods on CIFAR-10 embeddings (last column). Credit: arXiv (2025). DOI: 10.48550/arxiv.2501.18006
A new topological approach
The Los Alamos team’s solution leverages topological data analysis, a mathematical discipline focused on the “shape” of data, to uncover these adversarial signatures. When an attack disrupts the geometric alignment of text and image embeddings, it produces a measurable distortion. To precisely quantify these topological differences, the researchers developed two new techniques, known as “topological-contrastive losses,” that effectively identify the presence of adversarial inputs.
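The paper's losses are not reproduced here, but the following sketch conveys the underlying idea under stated assumptions: summarize the “shape” of an embedding cloud with persistent homology, computed with the open-source ripser package, and observe how a perturbation shifts a simple topological summary such as total persistence. The random point clouds stand in for real embeddings and are purely illustrative.

```python
# Minimal sketch of the general idea (not the paper's implementation):
# compare a topological summary of clean versus perturbed embedding clouds.
import numpy as np
from ripser import ripser

def total_persistence(points: np.ndarray, max_dim: int = 1) -> float:
    """Sum of (death - birth) over all finite bars up to dimension max_dim."""
    diagrams = ripser(points, maxdim=max_dim)["dgms"]
    total = 0.0
    for dgm in diagrams:
        finite = dgm[np.isfinite(dgm[:, 1])]
        total += float(np.sum(finite[:, 1] - finite[:, 0]))
    return total

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 32))                        # stand-in for clean embeddings
adversarial = clean + 0.5 * rng.normal(size=clean.shape)  # stand-in perturbation

# An attack that disrupts text-image alignment should shift topological
# summaries like this one; a statistical test over many batches flags the shift.
print(total_persistence(clean), total_persistence(adversarial))
```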
“Our algorithms can accurately reveal attack signatures and, when combined with statistical methods, can detect subtly corrupted malicious data with remarkable accuracy,” Bhattarai said. “This study demonstrates the transformative potential of topology-based approaches for securing next-generation AI systems, setting a strong foundation for future advancements in this field.”
The framework’s effectiveness was rigorously validated on Los Alamos’ Venado supercomputer. Installed in 2024, the machine’s chips combine central processing units with graphics processing units to handle high-performance computing and large-scale artificial intelligence applications. The team tested the approach against a wide spectrum of known adversarial attack methods across multiple benchmark datasets and models.
The results were clear: the topological approach consistently outperformed existing defenses, providing a more reliable and resilient shield against these threats.
The team presented the study, “Topological Signatures of Adversaries in Multimodal Alignments,” at the International Conference on Machine Learning.
More information: Minh Vu et al, Topological Signatures of Adversaries in Multimodal Alignments, arXiv (2025). DOI: 10.48550/arxiv.2501.18006
Journal information: arXiv
Provided by Los Alamos National Laboratory
Citation: Topological approach detects adversarial attacks on multimodal AI systems (2025, August 4) retrieved August 4, 2025 from https://techxplore.com/news/2025-08
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.