Recent advances in technology for hyper-realistic visual and audio effects
provoke the concern that deepfake videos of political speeches will soon be
indistinguishable from authentic video recordings. The conventional wisdom in
communication theory predicts people will fall for fake news more often when
the same version of a story is presented as a video versus text. We conduct five
pre-registered randomized experiments with 2,215 participants to evaluate how
accurately humans distinguish real political speeches from fabrications across
base rates of misinformation, audio sources, question framings, and media
modalities. We find that base rates of misinformation minimally influence
discernment and that deepfakes with audio produced by state-of-the-art
text-to-speech algorithms are harder to discern than the same deepfakes with
voice actor audio. Moreover, across all experiments, we find that audio and
visual information enables more accurate discernment than text alone: human
discernment relies more on how something is said (the audio-visual cues) than
on what is said (the speech content).

Assessing the Threat of Deepfake Videos in the Era of Hyper-Realistic Visual and Audio Effects

Rapid technological advances have ushered in a new era of hyper-realistic visual and audio effects, raising concerns about the potential misuse of this technology. Specifically, there are fears that deepfake videos, which use artificial intelligence to manipulate audio and visual content, could soon become indistinguishable from authentic recordings. The implications are alarming, particularly for political speeches, where misinformation can have significant consequences.

To understand the implications of this technology, researchers conducted five pre-registered randomized experiments involving 2,215 participants. The goal was to evaluate how accurately humans can differentiate between real political speeches and fabrications across several factors: base rates of misinformation, audio sources, question framings, and media modalities.

One key finding was that base rates of misinformation, meaning the proportion of fabricated speeches participants encountered, had minimal influence on people's ability to discern real political speeches from fakes. In other words, whether participants saw mostly authentic or mostly fabricated clips, their underlying ability to tell the two apart changed little.
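One way to make this distinction concrete is with a signal-detection sketch: overall accuracy shifts mechanically with the base rate, while sensitivity (d′), a standard measure of discernment, does not. The response rates below are purely illustrative numbers, not figures from the study.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: how well fakes are told apart from real clips."""
    return z(hit_rate) - z(false_alarm_rate)

def overall_accuracy(hit_rate, false_alarm_rate, base_rate_fake):
    """Fraction of all clips judged correctly at a given base rate of fakes."""
    return (base_rate_fake * hit_rate
            + (1 - base_rate_fake) * (1 - false_alarm_rate))

# Hypothetical response rates (not the study's numbers):
hit, fa = 0.80, 0.40
for base_rate in (0.2, 0.5, 0.8):
    print(f"base rate {base_rate:.0%}: "
          f"accuracy {overall_accuracy(hit, fa, base_rate):.2f}, "
          f"d' {d_prime(hit, fa):.2f}")
```

With these made-up rates, accuracy ranges from 0.64 to 0.76 as the base rate of fakes changes, while d′ stays fixed at about 1.09, which is why a finding of stable discernment across base rates is best read as a statement about sensitivity rather than raw accuracy.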

Furthermore, the study found that deepfakes with audio generated by state-of-the-art text-to-speech algorithms were harder to discern than the same deepfakes with voice actor audio. This highlights the growing sophistication of speech synthesis: a cloned voice can now be harder for listeners to flag than a human impersonator. Evaluating that threat is inherently multi-disciplinary, pairing computer-science advances in audio generation with communication-theory methods for measuring their effect on human perception.

Perhaps the most intriguing finding is the role that audio-visual cues play in discerning real from fake political speeches. Across all experiments, access to audio and visual information enabled more accurate discernment than text alone: people rely more on how something is said than on what is said. This runs counter to the conventional wisdom in communication theory, which predicts that people fall for fake news more often when a story is presented as video rather than text, and it underscores the value of multi-modal research that combines psychology, linguistics, and audio-visual processing.

The implications are clear: as the technology continues to advance, the threat of deepfake videos becomes more pressing. Society must be prepared to counter disinformation campaigns that exploit hyper-realistic visual and audio effects. A multi-disciplinary approach that brings together experts from computer science, communication theory, psychology, linguistics, and audio-visual processing will be pivotal in developing effective strategies to detect and debunk deepfakes, safeguarding the integrity of political discourse, and maintaining public trust.
