AI: The Thin Red Line of Information Integrity or Else?



For two years in a row (2022 and 2023), partners Ontotext, GATE, and Identrics co-organized a local event, Technologies against Disinformation, to showcase tools that help analyze the spread of mis- and disinformation. By 2024, the already flourishing ecosystem of stakeholders invested in rehabilitating the information space deserved an agora on a larger and broader scale. This is how the event evolved into the Sofia Information Integrity Forum (SIIF): six organizing partners with the active involvement of the Bulgarian-Romanian Observatory of Digital Media (BROD) consortium – GATE, Ontotext, CSD – along with Identrics, HSSF, and Sensika; three days (7–9 November); five thematic tracks; and experts from all over the world, most importantly from Southeastern Europe and the Black Sea region.

    

Participants at SIIF 2024 discussed information integrity issues from a variety of perspectives: research, policy, media and information literacy & education, technology, and information operations. These five pillars build upon each other, getting us from the theoretical framework to tangible outcomes. As we can’t possibly summarize all the engaging lectures in a single post, let us focus on one curious line of discussion, namely the role of AI, which (or who?) was often mentioned or featured in the talks of SIIF presenters.

Us to AI: Hey AI, Can We Trust You?

Speakers across the five thematic tracks of the forum referenced AI in one way or another, outlining both its strengths and flaws. For disinformation researchers, tools like ChatGPT and Claude were noted as invaluable for streamlining information retrieval, while at the same time their capacity to generate misleading or inaccurate statements at unprecedented speed and scale is seen as a serious threat. Observations shared by media literacy specialists, including our Romanian partners in BROD, reveal that students nowadays tend to use chatbots to look for information or research a particular topic instead of the “old-fashioned” browser search. On the one hand, the AI assistant provides them with diverse perspectives; on the other, they need to check the sources behind that information to verify its truthfulness. Regulatory experts voiced recommendations on the policy and practical measures needed to combat AI-generated disinformation: demanding transparency of the content prioritization algorithms on platforms, strengthening regulatory frameworks like the EU Digital Services Act, and educating users to recognize AI-manipulated content.

New Era of Media Intelligence

Studying media content, be it traditional online media or social networks, is essential for deriving topics, trends, patterns, and many other insights from the data that facilitate mis- and disinformation tracking. At a SIIF tech session, experts from the media analytics company Commetric shared an interesting case study in which ML models did much of the hard work. The task was to examine the Yemeni media landscape and identify the primary stakeholders discussing the ongoing Houthi conflict. Commetric’s bespoke AI solutions helped them transcribe audio data, identify topics, and assess sentiment towards the Houthis, while OpenAI’s GPT-4 was harnessed for stakeholder categorization.
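For the curious reader, here is a minimal sketch of what a single LLM-based stakeholder categorization step might look like, assuming the OpenAI Python SDK. It is purely illustrative: the categories, prompt, and example statement are our own and do not reflect Commetric’s actual pipeline.

```python
# Illustrative sketch only: an LLM-based stakeholder categorization step,
# using the OpenAI Python SDK. Categories and prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["government", "military", "media", "NGO", "international body", "other"]

def categorize_stakeholder(statement: str, speaker: str) -> str:
    """Ask the model to assign the speaker of a transcribed statement to one category."""
    prompt = (
        f"Speaker: {speaker}\n"
        f"Statement: {statement}\n\n"
        f"Assign the speaker to exactly one of these stakeholder categories: "
        f"{', '.join(CATEGORIES)}. Reply with the category name only."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(categorize_stakeholder(
    "We call on all parties to allow humanitarian access to the port of Hodeidah.",
    "UN OCHA spokesperson",
))
```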

With the rise of large language models (LLMs), media analysis has taken the leap to media intelligence, as analysts from A Data Pro argued in their talk. AI can process massive volumes of data, detect similar recurring stories, and untangle connections between actors. The verification and evaluation of AI-generated information still lies with human analysts, but the scale is far beyond what was possible before. Expanding on that, the CEO of Identrics, Nesin Veli, suggested that on a broader societal level LLMs have sparked great interest in technology and have widened personal information bubbles.

Going Synthetic and Persuasive   

The “widening” of the information space, on the other hand, makes potential manipulations even more far-reaching and impactful. That is why the people on the front line of exposing misleading or false claims – fact-checkers, journalists, researchers – need to be equipped with suitably potent tools, and guess what: AI comes in handy here again. In the session “Navigating the Infodemic”, representatives of the vera.ai project consortium, Olga Papadopoulou (Centre for Research and Technology Hellas – CERTH) and Olesya Razuvayevskaya (University of Sheffield – USFD), elaborated on the innovative approaches they are developing to help verification professionals.

Olga presented the work of all vera.ai partners engaged in synthetic content detection in the visual (image and video) and audio modalities. Over the past ten years the quality of images produced by generative AI (GenAI) models has improved so dramatically, Olga argued, that the telltale clues are indistinguishable to the human eye, and detection techniques have to be continuously enhanced and diversified to keep up with the advances. Separate state-of-the-art models are trained specifically to detect either local synthetic manipulations or fully synthetic images. For the local manipulations task, the researchers have developed a novel method that mitigates the high rate of false positives through noise fingerprint extraction and pixel-level confidence estimation. For fully synthetic images, a dedicated AI detector has been developed for each GenAI approach, as every generative model leaves its own peculiar traces in the frequency domain. The deepfake video and synthetic audio detection methods have also yielded promising results, though challenges remain with low-quality data and new generative architectures that are not present in the training data.
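To give a flavour of the kind of signal such detectors build on, here is a toy sketch that inspects an image’s frequency spectrum with numpy and Pillow. It is in no way the vera.ai method, and the fingerprint file mentioned in the usage comment is hypothetical.

```python
# Illustrative sketch only: inspecting an image's frequency spectrum, the kind
# of signal that per-generator detectors build on. Not the vera.ai method.
import numpy as np
from PIL import Image

def log_magnitude_spectrum(path: str) -> np.ndarray:
    """Return the centered log-magnitude spectrum of a grayscale image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))

def spectral_distance(candidate: np.ndarray, reference: np.ndarray) -> float:
    """Crude score between a candidate spectrum and a known generator
    'fingerprint' of the same shape (lower = more similar)."""
    return float(np.mean((candidate - reference) ** 2))

# Usage (hypothetical files): compare a suspect image against an averaged
# spectrum computed from images produced by a known generative model.
# suspect = log_magnitude_spectrum("suspect.png")
# fingerprint = np.load("known_generator_fingerprint.npy")
# print(spectral_distance(suspect, fingerprint))
```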

Olesya’s team at USFD focuses on textual analysis, particularly on assessing the credibility of textual content in multiple languages through the automatic computation of so-called credibility signals. These signals can relate to the specific choice of words, the writing style, the perspective from which information is presented, and persuasion techniques, among others. The combination of all of them may or may not hint at a misleading or manipulative piece of text – it is left to the human to make the final judgment.
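As a rough intuition for what a credibility signal can be, here is a toy sketch computing a few crude lexical signals in plain Python. The real vera.ai signals are far more sophisticated, multilingual, and learned by models; the word list and thresholds below are made up.

```python
# Illustrative sketch only: a few crude, hand-made lexical "credibility signals"
# (word choice, style). Not the vera.ai signals, which are model-based.
import re

LOADED_WORDS = {"shocking", "outrageous", "traitor", "hoax", "scandal"}  # toy list

def credibility_signals(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "exclamation_density": text.count("!") / max(len(text), 1),
        "all_caps_ratio": sum(w.isupper() and len(w) > 2 for w in words) / n,
        "loaded_word_ratio": sum(w.lower() in LOADED_WORDS for w in words) / n,
        "avg_sentence_length": n / max(len(sentences), 1),
    }

print(credibility_signals("SHOCKING hoax exposed!!! They don't want you to know."))
```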

Strengthening Security Monitoring

For those responsible for state security and public order, it is crucial that technological tools can process the information environment and flag concerning messaging in a timely manner. In his talk, Andriy Kusyy from LetsData, a company that detects early signals of information operations orchestrated by malign actors, highlighted that GenAI has been a game changer in this respect. In the pre-GenAI era, the information and metadata that could be extracted from textual content, particularly on social media platforms, were limited, and training task-specific models took weeks; now powerful LLMs have reduced the whole process to a couple of hours. Dmytro Bilash from Osavul, which provides information environment assessment services, likewise pinpointed AI as an enabler that takes narrative detection and impact analysis to the next level.
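As an illustration of how such extraction can now be delegated to a general-purpose LLM rather than a bespoke model, here is a minimal sketch using the OpenAI Python SDK. The field names, model choice, and example post are our own assumptions, not LetsData’s or Osavul’s setup.

```python
# Illustrative sketch only: pulling structured metadata out of a social media
# post with an LLM. Field names and model choice are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

def extract_metadata(post: str) -> dict:
    prompt = (
        "Extract metadata from the social media post below and reply with JSON only, "
        'using the keys "actors", "narrative", "language", "call_to_action":\n\n'
        + post
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

print(extract_metadata("Share before they delete it: secret labs are behind the outbreak!"))
```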

Vassil Velichkov, CTO of the media analytics company Sensika, used the Southport stabbings case to show how their platform leverages multilingual LLMs to expose and monitor the formation and amplification of disinformation narratives. As malicious actors are equipped with fast and cheap content production thanks to GenAI, the stakeholders countering their actions should tool up accordingly, Vassil argued.

Dr. George Sharkov, CEO of the European Software Institute and Lead of the CyberLab at Sofia Tech Park, elaborated more on the operational side. He presented insights from simulated cybersecurity attack exercises, in which AI was used to generate the mock content used to train strategic communications specialists and journalists in how to handle such situations.

AI to Us: You Can Trust Me Because…

Advanced AI models are indeed potent at scaling up the processing of information and the derivation of insights from it, but the question still lingers: to what extent can we trust their output? If AI plays the “because you can” card on us (“because” is very persuasive, as experiments have shown), that is not enough. In other words, we need to ground the knowledge implicitly used by LLMs in trustworthy sources.

One interesting example is the chatbot presented by Ontotext’s Research Lead for Counter-Disinformation Projects, Andrey Tagarev. To facilitate the work of verification professionals and disinformation researchers within the scope of the EC-funded projects vera.ai and BROD, Ontotext is developing the Database of Known Fakes (DBKF) – a database that gathers trustworthy fact-checks, extracts meaningful metadata and other useful information through AI methods, and enables advanced searches over the enriched content. All in all, DBKF is a sophisticated system with various functionalities for exploring the data at different levels of depth. The chatbot serves as a helpful assistant for the human user who wants a quick and easy result without having to think about how best to combine the available features. Most importantly, the output is reliable because it is based on the fact-checked content in the database.
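The general pattern – answer only from a trusted store and cite it – can be sketched in a few lines. The snippet below is purely illustrative and not Ontotext’s DBKF implementation: the fact-check entries, the naive keyword retrieval, and the prompt are all made up.

```python
# Illustrative sketch only: grounding a chatbot's answer in a store of trusted
# fact-checks. Not the DBKF implementation; data and retrieval are toy examples.
from openai import OpenAI

client = OpenAI()

# Stand-in for a database of fact-checks: (claim, verdict, source URL).
FACT_CHECKS = [
    ("5G towers spread COVID-19", "False", "https://example.org/factcheck/5g-covid"),
    ("Video shows 2024 flood in city X", "Miscaptioned, footage is from 2017",
     "https://example.org/factcheck/flood-video"),
]

def answer_grounded(question: str) -> str:
    """Answer only from retrieved fact-checks; refuse if nothing relevant is found."""
    # Naive keyword retrieval as a placeholder for a real semantic search.
    hits = [fc for fc in FACT_CHECKS
            if any(tok in fc[0].lower() for tok in question.lower().split())]
    if not hits:
        return "No matching fact-check found in the database."
    context = "\n".join(f"- Claim: {c} | Verdict: {v} | Source: {u}" for c, v, u in hits)
    prompt = (f"Using ONLY these fact-checks:\n{context}\n\n"
              f"Answer the question and cite the source URL: {question}")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(answer_grounded("Do 5G towers spread COVID-19?"))
```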

Garvan Walshe, Founder of Article7 – Intelligence for Democrats, presented another relevant solution. His company harnesses the power of AI to protect public debate in democratic societies. The Quotebank solution they are developing gathers quotes from politicians, derived from their real-life statements in parliament or elsewhere, and enables AI-powered semantic search over them, i.e. search by meaning, not just by keyword. The search results are thus authentic because they are based on a repository of official quotes.
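For readers wondering what “search by meaning” looks like in practice, here is a small sketch using the sentence-transformers library. The quotes, the query, and the model choice are placeholders, not the actual Quotebank stack.

```python
# Illustrative sketch only: semantic (meaning-based) search over a few quotes
# with sentence-transformers. Not Article7's Quotebank implementation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

quotes = [
    "We will not raise taxes on working families next year.",
    "Defence spending must reach two percent of GDP by 2026.",
    "Our hospitals need more nurses, not more paperwork.",
]
quote_embeddings = model.encode(quotes, convert_to_tensor=True)

def search(query: str, top_k: int = 2):
    """Rank quotes by cosine similarity of their embeddings to the query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, quote_embeddings)[0]
    ranked = sorted(zip(quotes, scores.tolist()), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]

# A keyword search for "healthcare staffing" would miss the third quote;
# semantic search surfaces it because the meaning matches.
print(search("healthcare staffing"))
```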

This Is (Not) The End

AI is already reshaping the way information is produced, consumed, and analyzed. In this sense, it can act both as a threat to information integrity when in the wrong hands and as an enabler for those committed to preventing the harm of disinformation. Grounding LLMs in reliable knowledge bases, as well as filling the gaps in policy and education, will hugely benefit all the stakeholders we heard from at SIIF, who continue their efforts to preserve the wholeness of the information space.

 

BROD