arXiv:2501.13563v1 Announce Type: new Abstract: Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by enhancing reasoning capabilities; however, these models remain highly susceptible to adversarial attacks. While existing research has explored white-box attacks to some extent, the more practical and challenging black-box scenarios remain largely underexplored due to their inherent difficulty. In this paper, we take the first step toward designing black-box adversarial attacks specifically targeting VLMs in AD. We identify two key challenges for achieving effective black-box attacks in this context: the effectiveness across driving reasoning chains in AD systems and the dynamic nature of driving scenarios. To address this, we propose Cascading Adversarial Disruption (CAD). It first introduces Decision Chain Disruption, which targets low-level reasoning breakdown by generating and injecting deceptive semantics, ensuring the perturbations remain effective across the entire decision-making chain. Building on this, we present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios that are likely to result in critical errors in the current driving contexts. Extensive experiments conducted on multiple AD VLMs and benchmarks demonstrate that CAD achieves state-of-the-art attack effectiveness, significantly outperforming existing methods (+13.43% on average). Moreover, we validate its practical applicability through real-world attacks on AD vehicles powered by VLMs, where the route completion rate drops by 61.11% and the vehicle crashes directly into the obstacle vehicle with adversarial patches. Finally, we release CADA dataset, comprising 18,808 adversarial visual-question-answer pairs, to facilitate further evaluation and research in this critical domain. Our codes and dataset will be available after paper’s acceptance.
This paper explores the vulnerability of vision-language models (VLMs) in autonomous driving systems to adversarial attacks. While previous research has focused on white-box attacks, this work takes a pioneering step toward understanding and designing black-box attacks that specifically target VLMs in autonomous driving. The authors identify two key challenges: keeping attacks effective across the decision-making chain of autonomous driving systems and coping with the dynamic nature of driving scenarios. To address these challenges, they propose Cascading Adversarial Disruption (CAD), which combines Decision Chain Disruption, targeting low-level reasoning breakdown, with Risky Scene Induction, which handles dynamic adaptation. Extensive experiments show that CAD achieves state-of-the-art attack effectiveness, outperforming existing methods, and real-world attacks on VLM-powered autonomous vehicles confirm its practical applicability. The authors also release CADA, a dataset of adversarial visual-question-answer pairs, to facilitate further evaluation and research in this critical domain.

The Power and Vulnerability of Vision-Language Models in Autonomous Driving

Introduction

The development of vision-language models (VLMs) has driven significant advances in autonomous driving (AD) technology. These models greatly enhance the reasoning capabilities of AD systems, enabling them to better understand and interpret the visual environment. Despite these strengths, however, VLMs remain highly susceptible to adversarial attacks, which can undermine their effectiveness and compromise the safety of AD vehicles.

The Challenge of Black-Box Adversarial Attacks

While previous research has primarily focused on white-box attacks, which exploit full knowledge of the model's architecture and parameters, the more practical and challenging black-box scenarios have remained largely underexplored. Black-box attacks assume minimal knowledge of the targeted model and reflect real-world situations in which attackers have limited access to the internal workings of the system. This inherent difficulty makes them a critical area of study at the intersection of AD and VLMs.
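
To make the distinction concrete, the following minimal Python sketch (ours, not from the paper; the class name and method are hypothetical) shows the interface a black-box attacker is restricted to: submitting an image and a question and reading back only the textual answer, with no access to weights or gradients.

from typing import Protocol

class BlackBoxADVLM(Protocol):
    """Query-only view of a driving VLM, as assumed under a black-box threat model."""

    def answer(self, image_path: str, question: str) -> str:
        """Return the model's textual answer; weights and gradients remain hidden."""
        ...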

Addressing Challenges through Cascading Adversarial Disruption (CAD)

To tackle the challenges associated with black-box adversarial attacks on VLMs in AD, we propose a novel approach called Cascading Adversarial Disruption (CAD). CAD aims to disrupt the decision-making process of VLMs by combining two key techniques (a rough code sketch of how they might fit together follows this list):

  1. Decision Chain Disruption: CAD introduces Decision Chain Disruption to trigger low-level reasoning breakdowns in AD systems. It generates and injects deceptive semantics into the input data, ensuring that the perturbations remain effective throughout the entire decision-making chain. By manipulating this semantic information, CAD can mislead the VLM into making incorrect decisions, with potentially dangerous outcomes.
  2. Risky Scene Induction: CAD addresses the dynamic nature of driving scenarios by using a surrogate VLM to understand the current driving context and construct high-level risky scenarios. By analyzing the context, CAD identifies situations that are likely to result in critical errors and induces them intentionally, allowing the attack to adapt to changing environments and exploit vulnerabilities in the AD system's decision-making process.
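
As a rough illustration only (this is our own sketch, not the authors' released code), the snippet below shows how the two components could plausibly fit together in a transfer-style attack: a patch is optimized against a locally available white-box surrogate VLM so that its answers drift toward a deceptive target description, and the finished patch is then applied in the black-box setting. The function surrogate_loss is a hypothetical stand-in for any differentiable score of how closely the surrogate's answer matches the target text.

# A minimal sketch, assuming a differentiable surrogate VLM is available locally.
# surrogate_loss(image, target_text) is hypothetical: lower means the surrogate's
# answer is closer to the deceptive target. PyTorch is used for the optimization loop.
import torch

def optimize_patch(frames, target_text, surrogate_loss,
                   patch_size=64, steps=200, lr=0.05):
    """Optimize one adversarial patch that stays effective across several driving frames."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    optimizer = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.zeros(())
        for frame in frames:  # frames: list of (3, H, W) tensors with values in [0, 1]
            adv = frame.clone()
            # Paste the patch at a fixed region (e.g., the rear of the obstacle vehicle).
            adv[:, :patch_size, :patch_size] = patch.clamp(0, 1)
            # Decision Chain Disruption idea: push every frame's answer toward the same
            # deceptive semantics so the perturbation holds along the reasoning chain.
            loss = loss + surrogate_loss(adv, target_text)
        loss.backward()
        optimizer.step()
    return patch.detach().clamp(0, 1)

In this transfer-style sketch the black-box victim model never provides gradients; only the finished patch is carried over to it, which is what makes the setting black-box.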

Results and Practical Applicability

Extensive experiments conducted on multiple AD VLMs and benchmarks demonstrate that CAD achieves state-of-the-art attack effectiveness, significantly outperforming existing methods with an average improvement of 13.43%. Moreover, CAD's practical applicability has been validated through real-world attacks on AD vehicles powered by VLMs: the route completion rate dropped by 61.11%, and the vehicle crashed directly into an obstacle vehicle bearing adversarial patches.

Contributions and Future Research

To further advance research in this critical domain, we release the CADA dataset, comprising 18,808 adversarial visual-question-answer pairs. This dataset is intended to support the evaluation and development of robust defenses against black-box adversarial attacks on VLMs used in AD. The code and dataset associated with CAD will be made available to the research community after the paper's acceptance, enabling further exploration and innovation in this vital field.
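
For intuition only, a single adversarial visual-question-answer pair might be represented roughly as below; the field names are our guess at a plausible schema, not the released CADA format.

# Hypothetical record layout; the actual CADA schema may differ.
cada_example = {
    "image": "frames/scene_0001_adv.png",  # driving frame with the adversarial patch applied
    "question": "Is it safe to keep the current speed in this lane?",
    "reference_answer": "No. The vehicle ahead is braking; slow down.",
    "attacked_answer": "Yes, the lane ahead is clear.",  # erroneous answer induced by the attack
}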

“The power and vulnerability of vision-language models in autonomous driving should not be underestimated. While these models have significantly enhanced reasoning capabilities, they are also highly susceptible to adversarial attacks. Our proposed Cascading Adversarial Disruption approach addresses the challenges of black-box attacks on VLMs, achieving state-of-the-art results and exposing vulnerabilities in real-world AD systems.”

The paper titled “Cascading Adversarial Disruption: Black-Box Attacks on Vision-Language Models in Autonomous Driving” addresses the problem of adversarial attacks on vision-language models (VLMs) in the context of autonomous driving (AD). VLMs have played a crucial role in enhancing the reasoning capabilities of AD systems, but they are also vulnerable to adversarial attacks, which can have serious consequences in real-world driving scenarios.

The researchers acknowledge that while some previous research has focused on white-box attacks, where the attacker has full knowledge of the model, black-box attacks, where the attacker has limited knowledge of the model, are more challenging and have not been extensively explored in this domain. To address this gap, the authors propose a novel approach called Cascading Adversarial Disruption (CAD) that specifically targets VLMs in AD.

CAD consists of two main components: Decision Chain Disruption and Risky Scene Induction. Decision Chain Disruption disrupts the low-level reasoning process of the AD system by generating and injecting deceptive semantics. By keeping the perturbations effective across the entire decision-making chain, this component causes breakdowns in the reasoning process that lead to incorrect decisions.

The second component, Risky Scene Induction, tackles the dynamic nature of driving scenarios. It leverages a surrogate VLM to understand and construct high-level risky scenarios that are likely to result in critical errors in the current driving context. By inducing these risky scenes, the attacker can increase the likelihood of the AD system making incorrect decisions.
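
One way to picture this step (our illustration, assuming only that the surrogate VLM exposes some describe_scene(image) -> str helper) is to turn the surrogate's description of the current scene into a context-specific risky target answer, which the patch optimization is then steered toward.

from typing import Callable

def risky_target(image_path: str, describe_scene: Callable[[str], str]) -> str:
    """Build a context-specific, hazardous target answer from the surrogate's scene description."""
    scene = describe_scene(image_path)  # e.g. "a truck is braking in the ego lane"
    # Invert the safety-critical facts of the described context into a plausible
    # but dangerous instruction for the driving VLM to be steered toward.
    return f"Although the scene shows {scene}, the road ahead is clear; it is safe to accelerate."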

Extensive experiments conducted on multiple AD VLMs and benchmarks demonstrate that CAD outperforms existing methods, achieving state-of-the-art attack effectiveness with an average improvement of 13.43%. Furthermore, the researchers validate the practical applicability of CAD through real-world attacks on AD vehicles powered by VLMs; these attacks cause the route completion rate to drop by 61.11% and make the vehicle crash directly into an obstacle vehicle bearing adversarial patches.

To promote further evaluation and research in this critical domain, the researchers release the CADA dataset, which contains 18,808 adversarial visual-question-answer pairs. This dataset will be a valuable resource for researchers to study and develop robust defenses against adversarial attacks on VLMs in AD.

Overall, this paper presents a significant contribution to the field of autonomous driving and adversarial machine learning. By addressing the challenges of black-box attacks on VLMs, the proposed CAD method demonstrates its efficacy in disrupting the decision-making process of AD systems. The real-world attack experiments highlight the potential dangers of such attacks and the need for robust defenses in autonomous driving. The release of the CADA dataset will facilitate further research and evaluation in this critical domain.