Universal and Transferable Adversarial Attacks on Aligned Language Models
arxiv.org