
When we write something to another person, over email or perhaps on social media, we may not state things directly; our words may instead convey latent meanings, an underlying subtext. We also hope that this meaning will reach our readers.
But what happens when an AI system is on the other end, rather than a person? Can AI, especially conversational AI, understand the latent meaning in our texts? And if so, what does this mean for us?
Latent content analysis is an area of study concerned with uncovering the deeper meanings, sentiments and subtleties embedded in text. For example, this type of analysis can help us grasp political leanings present in communications that may not be obvious to everyone.
Understanding how intense someone's emotions are, or whether they are being sarcastic, can be crucial in supporting a person's mental health, improving customer service, and even keeping people safe at a national level.
These are only some examples. We can imagine benefits in other areas of life, such as social science research, policy-making and business. Given how important these tasks are, and how quickly conversational AI is improving, it is essential to explore what these technologies can do in this regard.
Work on this issue is only just beginning. Previous work has shown that ChatGPT has had limited success in detecting political leanings on news websites. Another study, which focused on differences in sarcasm detection between different large language models, or LLMs (the technology behind AI chatbots such as ChatGPT), showed that some models are better than others.
Finally, studies have shown that LLMs can infer the emotional "valence" of words, that is, the inherent positive or negative "feeling" associated with them. Our new study, published in Scientific Reports, tested whether conversational AI, including GPT-4, a relatively recent version of ChatGPT, can read between the lines of human-written texts.
The goal was to examine how well LLMs simulate understanding of sentiment, political leaning, emotional intensity and sarcasm. We evaluated the reliability, consistency and quality of seven LLMs, including GPT-4, Gemini, Llama-3.1-70B and Mixtral 8×7B.
We found that these LLMs are about as good as humans at detecting sentiment, political leaning, emotional intensity and sarcasm. The study involved 33 human subjects and assessed 100 curated items of text.
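To make the comparison concrete, here is a minimal sketch, assuming the OpenAI Python client, of how a researcher might elicit one such rating from a conversational model so that it can be compared with human ratings. The prompt wording, the 1-7 scale and the example text are hypothetical, not the study's actual materials.

```python
# A minimal sketch, not the study's actual protocol: ask a conversational
# LLM for a numeric valence rating of a text. Assumes the OpenAI Python
# client; the prompt wording and 1-7 scale are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_valence(text: str, model: str = "gpt-4") -> int:
    """Ask the model to rate valence from 1 (very negative) to 7
    (very positive) and return the rating as an integer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce randomness so ratings are repeatable
        messages=[
            {"role": "system",
             "content": "Rate the emotional valence of the user's text "
                        "on a scale from 1 (very negative) to 7 (very "
                        "positive). Reply with the number only."},
            {"role": "user", "content": text},
        ],
    )
    return int(response.choices[0].message.content.strip())

print(rate_valence("I can't believe they cancelled the festival again."))
```

Setting the temperature to 0 makes the model's answers as repeatable as possible, which matters when consistency is part of what is being measured.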
For spotting political leanings, GPT-4 was more consistent than humans. This matters in fields such as journalism, political science or public health, where inconsistent judgments can skew findings or cause patterns to be missed.
GPT-4 also proved capable of picking up on emotional intensity, and especially valence. Whether a tweet was composed by someone who was mildly annoyed or deeply outraged, the AI could tell, although a human still had to verify that assessment, because AI tends to downplay emotions. Sarcasm remained a stumbling block for both humans and machines.
The study found no clear winner there; hence, using human raters does not help much with sarcasm detection.
Why does this matter? For one, AI such as GPT-4 could dramatically cut the time and cost of analyzing large volumes of online content. Social scientists often analyze user-generated text to detect trends; GPT-4 opens the door to faster, more responsive research, which is especially important during crises, elections or public health emergencies.
Journalists and fact-checkers might also benefit. A GPT-4-powered tool could help flag emotionally charged or politically slanted posts in real time, giving newsrooms a head start.
Concerns remain. Transparency, fairness and political leanings in AI are still problematic. Nevertheless, studies like this one suggest that when it comes to understanding language, machines are catching up to us fast.
This work does not argue that conversational AI can fully replace human raters, but it does challenge the idea that machines are hopeless at detecting nuance.
Our study's findings raise follow-up questions. If a user asks the same AI the same question in multiple ways, perhaps by subtly rephrasing the prompt, changing the order of information, or adjusting the amount of context provided, will the model's underlying judgments and ratings remain consistent?
Further research should include a systematic and rigorous analysis of how stable models' outputs are. Ultimately, understanding and improving consistency is essential for deploying LLMs at scale, especially in high-stakes settings.
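One way to probe that question is sketched below: pose the same rating task in several rephrased prompts and measure how much the model's answers spread. It reuses the hypothetical OpenAI-client setup from the earlier sketch; the paraphrases and example text are again illustrative, not taken from the study.

```python
# A sketch of the consistency check described above: rephrase the same
# rating task several ways and measure the spread of the model's answers.
# The paraphrases, scale and example text are hypothetical.
from statistics import mean, stdev

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TEXT = "I can't believe they cancelled the festival again."

PARAPHRASES = [
    "Rate the emotional valence of this text from 1 (very negative) "
    "to 7 (very positive). Reply with the number only:\n",
    "On a scale of 1-7, where 1 is very negative and 7 is very "
    "positive, how positive is this text? Number only:\n",
    "Score this text's sentiment: 1 = very negative, 7 = very "
    "positive. Answer with just the number:\n",
]

def rate(prompt: str) -> int:
    """Fold the rating instructions into the user message and return
    the model's numeric answer."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt + TEXT}],
    )
    return int(response.choices[0].message.content.strip())

ratings = [rate(p) for p in PARAPHRASES]
print(f"ratings={ratings}, mean={mean(ratings):.2f}, "
      f"spread (std dev)={stdev(ratings):.2f}")
```

A near-zero spread across paraphrases would suggest the model's judgment is stable under rewording; a larger spread would flag the kind of inconsistency that makes large-scale deployment risky.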
Provided by The Conversation
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Citation: AI may be as good as humans at detecting emotion, political leaning and sarcasm in online conversations (2025, July 3), retrieved 3 July 2025 from https://techxplore.com/news/2025-07-ai-good-humans-emotion-political.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
