Researchers have recently introduced Federated Parameter-Efficient Fine-Tuning (FedPEFT), a technique that combines parameter-efficient fine-tuning (PEFT) with federated learning (FL) to make fine-tuning pre-trained language models (PLMs) for specific tasks more efficient while keeping training data private. While this approach shows promise, it also introduces a new security risk dubbed "PEFT-as-an-Attack" (PaaA), in which malicious participants exploit PEFT to bypass the safety alignment of PLMs and steer them toward generating harmful content.
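To make the setup concrete, the sketch below illustrates one FedPEFT round under common assumptions: a frozen base model, trainable low-rank (LoRA) adapters on each client, and FedAvg over the adapter parameters only. The toy linear layer and training loop are illustrative stand-ins, not the paper's implementation.

```python
# Minimal FedPEFT sketch (assumptions: FedAvg over LoRA adapters only;
# a toy linear layer stands in for a frozen PLM layer).
import copy
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a trainable low-rank (LoRA) update."""
    def __init__(self, in_dim, out_dim, rank=4):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)   # frozen PLM weights
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_dim, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

def local_update(model, data, steps=5, lr=1e-2):
    """One client's PEFT round: optimize only the LoRA parameters."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD([model.lora_A, model.lora_B], lr=lr)
    x, y = data
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # Only the small adapter tensors are communicated to the server.
    return {"lora_A": model.lora_A.detach(), "lora_B": model.lora_B.detach()}

def fedavg(updates):
    """Server-side FedAvg over the received adapter updates."""
    return {k: torch.stack([u[k] for u in updates]).mean(0) for k in updates[0]}

# One simulated round with three clients holding private data.
global_model = LoRALinear(16, 8)
clients = [(torch.randn(32, 16), torch.randn(32, 8)) for _ in range(3)]
agg = fedavg([local_update(global_model, d) for d in clients])
with torch.no_grad():
    global_model.lora_A.copy_(agg["lora_A"])
    global_model.lora_B.copy_(agg["lora_B"])
```

Because only the small adapter tensors are exchanged and trained, communication and compute costs stay low, which is precisely what makes FedPEFT attractive.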
In response to this emerging threat, the researchers evaluated the effectiveness of PaaA against several PEFT methods and explored potential defenses, namely Robust Aggregation Schemes (RASs) and Post-PEFT Safety Alignment (PPSA). They found that RASs offer limited protection against PaaA, particularly when client data distributions are heterogeneous. PPSA, in turn, does mitigate PaaA, but at the cost of reduced task accuracy, underscoring the need for defenses that balance security and performance in FedPEFT systems.
To study this issue concretely, the researchers built a FedPEFT system for fine-tuning PLMs on decentralized, domain-specific datasets. In this system, PaaA arises when malicious clients inject toxic training data to erode the safety guardrails of the PLM. RASs and PPSA are the proposed countermeasures: the former aims to limit the influence of malicious updates during aggregation, the latter to restore the model's adherence to safety constraints after fine-tuning.
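For illustration, the following sketch shows one common family of robust aggregation rules, a coordinate-wise trimmed mean, applied to client adapter updates; the specific RASs evaluated in the study may differ, and the single outlier attacker here is a deliberately simple case.

```python
# Hedged sketch of a Robust Aggregation Scheme: coordinate-wise trimmed mean
# over client adapter updates (illustrative, not the paper's exact rules).
import torch

def trimmed_mean(updates, trim_ratio=0.1):
    """Aggregate client tensors, dropping the largest/smallest values per coordinate."""
    stacked = torch.stack(updates)                  # (num_clients, ...)
    k = int(trim_ratio * stacked.shape[0])
    sorted_vals, _ = torch.sort(stacked, dim=0)     # sort each coordinate across clients
    kept = sorted_vals[k: stacked.shape[0] - k] if k > 0 else sorted_vals
    return kept.mean(dim=0)

# Nine honest clients send similar adapter deltas; one attacker sends an outlier.
honest = [torch.randn(4, 4) * 0.1 for _ in range(9)]
malicious = [torch.full((4, 4), 10.0)]              # poisoned update
robust = trimmed_mean(honest + malicious, trim_ratio=0.1)
naive = torch.stack(honest + malicious).mean(0)
print(robust.abs().mean(), naive.abs().mean())      # robust estimate is far less skewed
```

Rules like this help only when poisoned updates look like statistical outliers; as the study observes, heterogeneous (non-IID) client data blurs that distinction and limits their effectiveness against PaaA.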
The research team ran experiments with four PLMs and three PEFT methods on two domain-specific question-answering datasets, simulating scenarios in which malicious clients inject harmful training data. The impact on model safety and utility was measured with metrics such as attack success rate and task accuracy; the FedPEFT system was simulated with the Blades benchmark suite, and training and evaluation relied on the Hugging Face ecosystem.
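Attack success rate is typically reported as the fraction of harmful prompts the model answers rather than refuses. The sketch below shows one simple way such a metric might be scored; the refusal-keyword heuristic and the generate_fn interface are assumptions for illustration, not the paper's exact evaluation protocol.

```python
# Illustrative attack success rate (ASR) scoring: the fraction of harmful
# prompts for which the model fails to refuse. Keyword-based refusal detection
# is an assumed heuristic, not the paper's evaluation pipeline.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai", "i am unable")

def attack_success_rate(harmful_prompts, generate_fn):
    """generate_fn: callable mapping a prompt string to the model's response string."""
    successes = 0
    for prompt in harmful_prompts:
        response = generate_fn(prompt).lower()
        refused = any(marker in response for marker in REFUSAL_MARKERS)
        if not refused:
            successes += 1          # the model complied with a harmful request
    return successes / max(len(harmful_prompts), 1)

# Example with a stub model that refuses one of two prompts.
prompts = ["harmful request 1", "harmful request 2"]
stub = lambda p: "I'm sorry, I can't help with that." if p.endswith("1") else "Sure, here is..."
print(attack_success_rate(prompts, stub))   # -> 0.5
```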
In the experimental evaluation, the researchers adapted PLMs for medical question answering with FedPEFT. LoRA achieved the highest accuracy among the evaluated PEFT methods while also proving vulnerable to PaaA. RASs failed to protect against the attack, especially under non-IID data distributions, whereas PPSA mitigated PaaA effectively but at the cost of degraded downstream-task performance.
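As a rough illustration of PPSA, the sketch below re-tunes only the still-trainable adapter parameters on refusal-style examples after federated training ends; the toy model, data, and loss are stand-ins, and a real procedure would use a language-modeling objective over a curated safety dataset.

```python
# Minimal PPSA sketch: an extra supervised pass on safety (refusal) examples
# after federated fine-tuning, touching only the adapter parameters.
import torch
import torch.nn as nn

def ppsa_realign(model, safety_data, steps=20, lr=5e-3):
    """Fine-tune only the still-trainable (adapter) parameters on safety examples."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    x, y = safety_data
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)   # stands in for the refusal LM loss
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# Toy usage: a frozen "backbone" with a small trainable adapter head.
backbone = nn.Linear(16, 16)
backbone.weight.requires_grad_(False); backbone.bias.requires_grad_(False)
adapter = nn.Linear(16, 8)                                 # the only trainable part
model = nn.Sequential(backbone, adapter)
safety_data = (torch.randn(64, 16), torch.zeros(64, 8))    # stands in for refusal targets
ppsa_realign(model, safety_data)
```

Because this extra pass pulls the adapter back toward refusals, it can also overwrite some of the domain knowledge acquired during FedPEFT, which is consistent with the accuracy drop reported above.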
The study highlights the serious security threat that PaaA poses to FedPEFT systems and the limitations of existing defenses such as RASs and PPSA. To address these gaps, the researchers propose exploring more advanced PPSA techniques and integrating safety alignment directly into the fine-tuning process, so that emerging vulnerabilities are addressed proactively without sacrificing model performance. A minimal sketch of that direction follows below.
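The sketch assumes a fixed-weight combination of a task loss and a safety loss in each local update; the losses and the weighting scheme are illustrative assumptions, not a method specified in the paper.

```python
# Hedged sketch: blending a safety term into each local PEFT step instead of
# realigning only after training (illustrative losses and fixed weighting).
import torch
import torch.nn as nn

def joint_local_step(model, task_batch, safety_batch, opt, safety_weight=0.5):
    """One local update that optimizes task utility and safety together."""
    xt, yt = task_batch
    xs, ys = safety_batch
    task_loss = nn.functional.mse_loss(model(xt), yt)      # domain QA objective stand-in
    safety_loss = nn.functional.mse_loss(model(xs), ys)    # refusal objective stand-in
    loss = task_loss + safety_weight * safety_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return task_loss.item(), safety_loss.item()

# Toy usage with a single trainable layer standing in for a LoRA adapter.
model = nn.Linear(16, 8)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
task_batch = (torch.randn(32, 16), torch.randn(32, 8))
safety_batch = (torch.randn(32, 16), torch.zeros(32, 8))
print(joint_local_step(model, task_batch, safety_batch, opt))
```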
In conclusion, the research underscores the need for defense mechanisms that are both robust and efficient against PaaA in FedPEFT systems, and it sets the stage for future work in this area. The findings highlight the evolving security risks of federated fine-tuning and point toward solutions that safeguard sensitive data while preserving the integrity of PLMs against malicious participants.
