Adversarial Robustness in Optimized LLMs: Defending Against Attacks

EasyChair Preprint 15857 • 6 pages • Date: February 21, 2025

Abstract

Adversarial robustness is a critical aspect of Large Language Models (LLMs), as these models are increasingly deployed in real-world applications where they may be vulnerable to adversarial attacks [1]. Optimization techniques such as quantization and pruning, while effective in reducing the computational and memory demands of LLMs, may inadvertently weaken their defences against adversarial manipulation [2][3]. This paper investigates the impact of common optimization strategies on the adversarial robustness of LLMs, exploring how model compression and parameter reduction can expose vulnerabilities to adversarial attacks such as input perturbations and manipulation. We analyze existing methods that trade off model performance for computational efficiency, identifying potential risks in adversarial settings. In response, we propose novel optimization techniques that strike a balance between maintaining robustness and improving computational efficiency [4][5]. By integrating adversarial training with quantization and pruning, our approach strengthens model resilience without significant performance loss [10][14]. Empirical evaluations on benchmark datasets demonstrate the effectiveness of our methods, offering insights into how LLMs can be optimized while defending against adversarial threats, ensuring safer deployment in critical applications [13][15].

Keyphrases: Input Perturbations, LLM Security, Large Language Models (LLMs), Model Compression, Model resilience, Pruning, Quantization, adversarial attacks, adversarial robustness, adversarial training, computational efficiency, model optimization
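To make the combination of adversarial training with pruning and quantization concrete, the following is a minimal PyTorch sketch of that general recipe, not the authors' implementation: it adversarially fine-tunes a toy text classifier (FGSM-style perturbations in embedding space) while a magnitude-pruning mask is active, then applies post-training dynamic quantization. The model, function names, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: adversarial training + pruning during fine-tuning,
# followed by post-training dynamic quantization. Toy classifier, not an LLM.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class ToyTextClassifier(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ff = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_classes)

    def forward_from_embeddings(self, emb):
        # Mean-pool token embeddings, then classify.
        h = torch.relu(self.ff(emb.mean(dim=1)))
        return self.head(h)

    def forward(self, token_ids):
        return self.forward_from_embeddings(self.embed(token_ids))


def adversarial_training_step(model, token_ids, labels, optimizer, eps=0.01):
    """One FGSM-style step: perturb embeddings along the loss gradient sign."""
    emb = model.embed(token_ids).detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model.forward_from_embeddings(emb), labels)
    loss.backward()                                  # gradient w.r.t. embeddings
    adv_emb = emb + eps * emb.grad.sign()            # adversarial embeddings
    optimizer.zero_grad()
    adv_loss = nn.functional.cross_entropy(
        model.forward_from_embeddings(adv_emb.detach()), labels)
    adv_loss.backward()                              # train on the adversarial input
    optimizer.step()
    return adv_loss.item()


model = ToyTextClassifier()
# Magnitude pruning: zero out 30% of the smallest feed-forward weights;
# the mask stays applied throughout adversarial fine-tuning.
prune.l1_unstructured(model.ff, name="weight", amount=0.3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

token_ids = torch.randint(0, 1000, (8, 16))          # batch of 8 sequences, length 16
labels = torch.randint(0, 2, (8,))
for _ in range(3):
    adversarial_training_step(model, token_ids, labels, optimizer)

prune.remove(model.ff, "weight")                     # make the pruning permanent
# Post-training dynamic quantization of the linear layers (int8 weights).
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```

In this ordering, adversarial robustness is encouraged while the sparsity mask is already in place, and quantization is applied only after training, so the compressed model inherits the adversarially trained weights rather than being compressed first and hardened afterwards.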