Adversarial Robustness in Optimized LLMs: Defending Against Attacks

EasyChair Preprint 15857 • 6 pages • Date: February 21, 2025

Abstract

Adversarial robustness is a critical aspect of Large Language Models (LLMs), as these models are increasingly deployed in real-world applications where they may be vulnerable to adversarial attacks [1]. Optimization techniques such as quantization and pruning, while effective in reducing the computational and memory demands of LLMs, may inadvertently weaken their defences against adversarial manipulation [2][3]. This paper investigates the impact of common optimization strategies on the adversarial robustness of LLMs, exploring how model compression and parameter reduction can expose vulnerabilities to adversarial attacks such as input perturbations and manipulation. We analyze existing methods that trade off model performance for computational efficiency, identifying potential risks in adversarial settings. In response, we propose novel optimization techniques that strike a balance between maintaining robustness and improving computational efficiency [4][5]. By integrating adversarial training with quantization and pruning, our approach strengthens model resilience without significant performance loss [10][14]. Empirical evaluations on benchmark datasets demonstrate the effectiveness of our methods, offering insights into how LLMs can be optimized while defending against adversarial threats, ensuring safer deployment in critical applications [13][15].

Keyphrases: Input Perturbations, LLM Security, Large Language Models (LLMs), Model Compression, Model resilience, Pruning, Quantization, adversarial attacks, adversarial robustness, adversarial training, computational efficiency, model optimization
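To make the combination of adversarial training with pruning and quantization concrete, the following is a minimal PyTorch sketch of that general recipe, not the authors' implementation: it adversarially fine-tunes a toy text classifier (FGSM-style perturbations in embedding space) while a magnitude-pruning mask is active, then applies post-training dynamic quantization. The model, function names, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: adversarial training + pruning during fine-tuning,
# followed by post-training dynamic quantization. Toy classifier, not an LLM.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class ToyTextClassifier(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ff = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_classes)

    def forward_from_embeddings(self, emb):
        # Mean-pool token embeddings, then classify.
        h = torch.relu(self.ff(emb.mean(dim=1)))
        return self.head(h)

    def forward(self, token_ids):
        return self.forward_from_embeddings(self.embed(token_ids))


def adversarial_training_step(model, token_ids, labels, optimizer, eps=0.01):
    """One FGSM-style step: perturb embeddings along the loss gradient sign."""
    emb = model.embed(token_ids).detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model.forward_from_embeddings(emb), labels)
    loss.backward()                                  # gradient w.r.t. embeddings
    adv_emb = emb + eps * emb.grad.sign()            # adversarial embeddings
    optimizer.zero_grad()
    adv_loss = nn.functional.cross_entropy(
        model.forward_from_embeddings(adv_emb.detach()), labels)
    adv_loss.backward()                              # train on the adversarial input
    optimizer.step()
    return adv_loss.item()


model = ToyTextClassifier()
# Magnitude pruning: zero out 30% of the smallest feed-forward weights;
# the mask stays applied throughout adversarial fine-tuning.
prune.l1_unstructured(model.ff, name="weight", amount=0.3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

token_ids = torch.randint(0, 1000, (8, 16))          # batch of 8 sequences, length 16
labels = torch.randint(0, 2, (8,))
for _ in range(3):
    adversarial_training_step(model, token_ids, labels, optimizer)

prune.remove(model.ff, "weight")                     # make the pruning permanent
# Post-training dynamic quantization of the linear layers (int8 weights).
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```

In this ordering, adversarial robustness is encouraged while the sparsity mask is already in place, and quantization is applied only after training, so the compressed model inherits the adversarially trained weights rather than being compressed first and hardened afterwards.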