Adversarial Attack

An adversarial attack is a deliberate attempt to fool or manipulate an AI system by providing specially crafted inputs designed to cause incorrect outputs.

What this means in practice

For example, adding imperceptible pixel-level perturbations to an image can cause a computer vision system to misclassify it, and subtly modifying text can bypass content moderation filters. Because these inputs are crafted to exploit a specific model's decision boundary, they can succeed even when the model performs well on ordinary test data. Defense strategies include adversarial training (exposing models to attack examples during training), input validation, ensemble methods, and certified robustness testing.
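The perturbation idea above can be sketched in a few lines. Below is a minimal, hypothetical illustration of a fast-gradient-sign-style (FGSM) attack against a toy linear scorer; the model, weights, and `eps` budget are invented for illustration and are not part of COMPEL or any production system.

```python
import numpy as np

# Hypothetical toy setup: a linear classifier whose score we want to push down.
rng = np.random.default_rng(0)
w = rng.normal(size=8)        # weights of a toy linear model (illustrative)
x = rng.normal(size=8)        # a clean input example

def score(v):
    return float(w @ v)       # higher score = more confident "benign" label

# For a linear model, the gradient of the score with respect to the input is
# just w, so the worst-case bounded perturbation steps against sign(w).
eps = 0.1                     # perturbation budget: max change per feature
x_adv = x - eps * np.sign(w)

# The perturbed input stays within eps of the original in every feature,
# yet its score drops by eps * sum(|w|) - enough to flip a borderline case.
assert np.max(np.abs(x_adv - x)) <= eps + 1e-12
assert score(x_adv) < score(x)
```

Adversarial training, one of the defenses listed above, essentially generates examples like `x_adv` on the fly during training and includes them in the loss, so the model learns to resist such perturbations.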

Why it matters

Adversarial attacks expose vulnerabilities in AI models that standard testing often fails to detect, making them a significant concern for organizations deploying AI in security-critical, financial, or safety applications. As AI systems become more integrated into core business processes, the potential impact of successful adversarial manipulation grows proportionally. Organizations that do not test for adversarial resilience carry hidden risk that may surface at the worst possible moment.

How COMPEL uses it

In the COMPEL risk taxonomy, adversarial attacks are assessed as part of AI security risk in Domain 13 of the Technology pillar during Calibrate. Defense strategies including adversarial training and certified robustness testing are designed during the Model stage and implemented during Produce. The Evaluate stage includes adversarial testing as part of the security validation required for high-risk AI system deployment.
