Publications from Uppsala University
PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. ORCID iD: 0000-0003-0145-3127
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. ORCID iD: 0000-0002-3454-8731
Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Electrical Engineering, Networked Embedded Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. ORCID iD: 0000-0002-2586-8573
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Federated Parameter-Efficient Fine-Tuning (FedPEFT) has emerged as a promising paradigm for privacy-preserving and efficient adaptation of Pre-trained Language Models (PLMs) in Federated Learning (FL) settings. It keeps data decentralized and trains the model on local devices, ensuring that raw data never leaves the user's device. Moreover, the integration of PEFT methods such as LoRA significantly reduces the number of trainable parameters compared to fine-tuning the entire model, thereby minimizing communication costs and computational overhead. Despite its potential, the security implications of FedPEFT remain underexplored. This paper introduces a novel security threat to FedPEFT, termed PEFT-as-an-Attack (PaaA), which exposes how PEFT methods can be exploited as an attack vector to circumvent PLMs' safety alignment and generate harmful content in response to malicious prompts. Our evaluation of PaaA reveals that with less than 1% of the model's parameters set as trainable and a small subset of clients acting maliciously, the attack achieves an attack success rate of approximately 80% using representative PEFT methods such as LoRA. To mitigate this threat, we further investigate potential defense strategies, including Robust Aggregation Schemes (RASs) and Post-PEFT Safety Alignment (PPSA). However, our empirical analysis highlights the limitations of these defenses: even the most advanced RASs, such as DnC and ClippedClustering, struggle to defend against PaaA in scenarios with highly heterogeneous data distributions. Similarly, while PPSA can reduce the attack success rate to below 10%, it severely degrades the model's accuracy on the target task. Our results underscore the urgent need for more effective defense mechanisms that simultaneously ensure security and maintain the performance advantages of the FedPEFT paradigm.
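The "less than 1% of the model's parameters" claim can be sanity-checked with a back-of-the-envelope LoRA parameter count. All dimensions below (hidden size, rank, layer count, number of adapted projections, total model size) are illustrative assumptions, not figures taken from the paper:

```python
# Back-of-the-envelope estimate of LoRA's trainable-parameter fraction.
# All dimensions are illustrative (roughly a 7B-parameter decoder).

hidden = 4096      # model hidden size (assumed)
rank = 8           # LoRA rank r (assumed)
layers = 32        # number of transformer layers (assumed)
targets = 2        # adapted projections per layer, e.g. Q and V (assumed)

# Each adapted weight W (hidden x hidden) gains two low-rank factors,
# A (rank x hidden) and B (hidden x rank): 2 * hidden * rank new parameters,
# versus hidden * hidden if W itself were trained.
lora_params = layers * targets * 2 * hidden * rank

total_params = 7_000_000_000  # full model size (assumed)
fraction = lora_params / total_params
print(f"trainable: {lora_params:,} ({fraction:.4%} of the model)")
```

Under these assumptions only about four million of seven billion parameters are trainable, well under the 1% budget the abstract refers to, which is what makes PEFT both communication-efficient and, per the paper, an attractive attack surface.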

National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-543432; OAI: oai:DiVA.org:uu-543432; DiVA, id: diva2:1914989
Available from: 2024-11-20 Created: 2024-11-20 Last updated: 2024-11-28
In thesis
1. Robust Federated Learning: Defending Against Byzantine and Jailbreak Attacks
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Federated Learning (FL) has emerged as a promising paradigm for training collaborative machine learning models across multiple participants while preserving data privacy. It is particularly valuable in privacy-sensitive domains like healthcare and finance. Recently, FL has been explored to harness the power of pre-trained Foundation Models (FMs) for downstream task adaptation, enabling customization and personalization while maintaining data locality and privacy. However, FL's distributed nature makes it inherently vulnerable to adversarial attacks. Notable threats include Byzantine attacks, which inject malicious updates to degrade model performance, and jailbreak attacks, which exploit the fine-tuning process to undermine the safety alignment of FMs, leading to harmful outputs. This dissertation centers on robust FL, aiming to mitigate these threats and ensure that global models remain accurate and safe even under adversarial conditions. To mitigate Byzantine attacks, we propose several Robust Aggregation Schemes (RASs) that decrease the influence of malicious updates. Additionally, we introduce Blades, an open-source benchmarking tool to systematically study the interplay between attacks and defenses in FL, offering insights into the effects of data heterogeneity, differential privacy, and momentum on RAS robustness. Exploring the synergy between FL and FMs, we present a taxonomy of research organized around adaptivity, efficiency, and trustworthiness. We uncover a novel attack, "PEFT-as-an-Attack" (PaaA), where malicious FL participants jailbreak FMs through Parameter-Efficient Fine-Tuning (PEFT) with harmful data. We evaluate defenses against PaaA and highlight critical gaps, emphasizing the need for advanced strategies balancing safety and utility in FL-FM systems. In summary, this dissertation advances FL robustness by proposing novel defenses, tools, and insights while exposing emerging attack vectors. These contributions pave the way for attack-resilient distributed machine learning systems capable of withstanding both current and emerging threats.
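To make the idea of a Robust Aggregation Scheme concrete, here is a minimal sketch of one classic baseline, coordinate-wise median aggregation. This is an illustrative textbook RAS, not one of the schemes proposed in the thesis (such as DnC or ClippedClustering), and the client updates are made-up toy values:

```python
# Illustrative Robust Aggregation Scheme: coordinate-wise median.
# A classic baseline, not a scheme from the thesis; values are toy data.
from statistics import median

def aggregate_median(updates):
    """Aggregate client updates coordinate-wise with the median,
    bounding the influence of a minority of malicious clients."""
    return [median(coord) for coord in zip(*updates)]

# Three honest clients send similar updates; one Byzantine client
# sends an extreme update in an attempt to poison the global model.
honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
byzantine = [[100.0, -100.0]]

print(aggregate_median(honest + byzantine))  # stays near [1.0, 1.0]
```

A plain average of the same four updates would be pulled to roughly [25.75, -24.3], while the median stays close to the honest consensus, which is the basic intuition behind RASs; the thesis's point is that this style of defense breaks down against PaaA under highly heterogeneous data.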

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2024. p. 54
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2477
Keywords
Federated learning, Jailbreak attack, Parameter-Efficient Fine-Tuning, Pre-trained Language Model, Robustness
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-540441; ISBN: 978-91-513-2312-1
Public defence
2025-01-16, 101121, Sonja Lyttkens, Ångström, Regementsvägen 1, Uppsala, 09:00 (English)
Opponent
Supervisors
Available from: 2024-12-17 Created: 2024-11-20 Last updated: 2024-12-17

Open Access in DiVA

No full text in DiVA

Authority records

Li, Shenghui; Ngai, Edith; Voigt, Thiemo
