CyberSecurity SEE

Switchable Backdoor Attack on Pretrained Models


In the era of big data, pre-training large vision transformer (ViT) models on extensive datasets has become common practice for improving performance on downstream tasks. Visual prompting (VP), a technique that introduces a small set of task-specific parameters while keeping the pre-trained backbone frozen, provides an efficient alternative to full fine-tuning.

Despite the benefits of VP, the security risks of this approach have not been thoroughly explored. Recently, a team of researchers from Tsinghua University, the Tencent Security Platform Department, Zhejiang University, and the Research Center of Artificial Intelligence at Peng Cheng Laboratory shed light on a new threat posed by backdoor attacks when VP is used in a cloud service scenario. The researchers behind this work include Sheng Yang, Jiawang Bai, Kuofeng Gao, and Yong Yang.

The researchers introduced a novel threat called the Switchable Backdoor Attack (SWARM), which jointly optimizes a trigger, clean prompts, and a switch token through a combination of a clean loss, a backdoor loss, and cross-mode feature distillation. Without the switch token, the model behaves normally; once the switch token is inserted, the backdoor activates and triggered inputs are misclassified into the attacker's target class.
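The combined training objective can be sketched as follows. This is an illustrative reconstruction from the description above, not the paper's exact formulation: the weighting factor `lam` and the use of a mean-squared distance for the cross-mode distillation term are assumptions.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def swarm_style_loss(clean_logits, backdoor_logits,
                     feat_clean, feat_backdoor,
                     true_label, target_label, lam=1.0):
    # Clean loss: without the switch token, predict the true label.
    l_clean = cross_entropy(clean_logits, true_label)
    # Backdoor loss: with the switch token, predict the attacker's target class.
    l_backdoor = cross_entropy(backdoor_logits, target_label)
    # Cross-mode feature distillation: keep the two modes' intermediate
    # features close so the backdoor stays concealed in clean mode
    # (mean-squared distance is an assumed stand-in here).
    l_distill = np.mean((feat_clean - feat_backdoor) ** 2)
    return l_clean + l_backdoor + lam * l_distill
```

The three terms pull in different directions: the clean and distillation terms preserve normal behavior and hide the trigger, while the backdoor term makes the switch token a reliable activation mechanism.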

Experiments conducted across various visual tasks demonstrated that SWARM achieves a high attack success rate while remaining difficult to detect. In this scenario, a malicious cloud service provider acts as the threat actor, a common assumption in existing backdoor attack research. Users submit their task datasets and pre-trained models to the provider's service, then use the trained model through the returned API, possibly applying their own defenses to identify and mitigate backdoors.

The threat actor controls the prompt inputs, while users have no control over how their samples are processed. In clean mode (without the switch token), the model must classify both clean and triggered samples correctly, thereby concealing the trigger's existence. In backdoor mode, activated by the switch token, triggered samples should be misclassified into the target class with a high success rate.

Visual prompting works by adding learnable prompt tokens after the embedding layer; only these task-specific parameters are adjusted during training while the backbone stays frozen. To mitigate backdoor attacks, users may apply defenses such as Neural Attention Distillation (NAD) and I-BAU. However, the researchers' experiments show that SWARM evades both, retaining a 96% attack success rate (ASR) against NAD and over 97% against I-BAU.
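The token layout described above, including the extra switch token SWARM inserts, might be assembled roughly as below. The function name, argument ordering, and token placement are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_vit_input(cls_token, patch_embeddings, prompt_tokens, switch_token=None):
    """Assemble the token sequence fed to a frozen ViT backbone.

    cls_token:        (1, dim)  class token
    patch_embeddings: (n, dim)  image patches after the embedding layer
    prompt_tokens:    (p, dim)  learnable task-specific prompt tokens
    switch_token:     (1, dim) or None; supplying it selects backdoor mode
    """
    parts = [cls_token]
    if switch_token is not None:
        parts.append(switch_token)  # backdoor mode: one extra learned token
    parts.append(prompt_tokens)
    parts.append(patch_embeddings)
    return np.concatenate(parts, axis=0)
```

Only the prompt and switch tokens are trainable; the backbone weights never change, which is what makes this style of attack cheap to mount in a prompt-as-a-service setting.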

The researchers thus propose a new form of backdoor attack on pre-trained vision transformers with visual prompts, in which an extra switch token toggles the model between clean and backdoored modes. SWARM represents a significant advancement in attack mechanisms while also providing insights for future defense research.

Overall, the findings of this study highlight the importance of understanding the potential security risks associated with visual prompting in the context of pre-trained models and emphasize the need for robust defense mechanisms to mitigate such threats.

