Securing the Black Box: Adversarial Attacks
Machine learning models are mathematical optimizers. They do not "see" a stop sign; they compute over pixel matrices. Adversarial attacks exploit the high dimensionality of these inputs, tricking models into devastating misclassifications with perturbations that are invisible to humans.
Evasion Attacks (The FGSM Algorithm)
Evasion attacks happen during deployment (inference). The attacker feeds manipulated data into an already trained model to force a wrong prediction. The most famous technique is the Fast Gradient Sign Method (FGSM).
FGSM uses the gradient of the loss with respect to the input to create adversarial examples. For an input image $x$ with true label $y$, the adversarial image $x'$ is computed as:

$$x' = x + \epsilon \cdot \text{sign}\left(\nabla_x J(\theta, x, y)\right)$$

Here, $\epsilon$ (epsilon) is the attack strength, $J$ is the loss function, and $\theta$ are the model weights. The noise is pushed in the exact direction that maximizes the loss.
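The update is easy to express in code. Below is a minimal FGSM sketch in PyTorch; the `model` and `loss_fn` arguments are illustrative placeholders, and pixel values are assumed to lie in [0, 1]:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon):
    """Compute x' = x + epsilon * sign(grad_x J(theta, x, y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)   # J(theta, x, y)
    loss.backward()                   # populates x_adv.grad with grad_x J
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range
```

Note that a single gradient step suffices: FGSM trades precision for speed, which is exactly why it became the textbook example of a cheap, effective attack.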
Data Poisoning (Training Phase)
If an attacker can alter the training dataset before the model is even built, they execute a poisoning attack. By injecting maliciously labeled data or inserting "backdoor triggers" (like a small yellow square on images), the model learns a corrupted logic.
Later, in the real world, the model operates normally until it sees the backdoor trigger, which causes it to produce the attacker's chosen output.
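A hedged sketch of what such poisoning can look like, using NumPy. The (N, H, W, 3) image layout, [0, 1] pixel range, 10% poison rate, and the `poison_dataset` helper name are all assumptions for illustration:

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_rate=0.1, patch=4, seed=0):
    """Stamp a yellow square trigger on a random subset and flip its labels."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), int(len(images) * poison_rate), replace=False)
    # Yellow trigger (R=1, G=1, B=0) in the bottom-right corner of each image.
    images[idx, -patch:, -patch:, :] = (1.0, 1.0, 0.0)
    # Mislabel the poisoned samples so the model binds trigger -> target class.
    labels[idx] = target_class
    return images, labels
```

Because only a small fraction of the data is touched, overall accuracy on clean inputs barely moves, which is what makes backdoors so hard to detect by validation metrics alone.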
💡 Intelligence Briefing (FAQ)
What is the difference between Evasion and Poisoning attacks?
Evasion occurs during inference (after training). The attacker modifies the input data (e.g., placing stickers on a stop sign). Poisoning occurs during training. The attacker alters the dataset itself to embed backdoors or ruin model accuracy before deployment.
How does Adversarial Training defend against attacks?
Adversarial Training is a brute-force defense strategy. The defender generates adversarial examples (like adding FGSM noise to images) and includes them in the training dataset alongside their correct labels. This forces the model to learn robust features rather than relying on brittle pixel patterns.
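A minimal sketch of one adversarial-training step in PyTorch, reusing the `fgsm_attack` helper sketched earlier. Weighting clean and perturbed batches equally is one common recipe, not a canonical prescription:

```python
import torch

def adversarial_train_step(model, loss_fn, optimizer, x, y, epsilon=0.03):
    """One training step on both clean and FGSM-perturbed inputs."""
    model.train()
    x_adv = fgsm_attack(model, loss_fn, x, y, epsilon)  # crafted on the fly
    optimizer.zero_grad()  # clear gradients left over from crafting the attack
    # Both terms use the *correct* labels, so the model learns to resist the noise.
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```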
What does Epsilon ($\epsilon$) do in FGSM?
Epsilon is the perturbation multiplier: it scales how much noise is added to the image. A high epsilon breaks the model easily, but the perturbation becomes obvious to a human. A low epsilon keeps the attack imperceptible, but it is less likely to fool the model.
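One way to see this trade-off is to sweep epsilon and watch accuracy fall. The sketch below assumes the `fgsm_attack` helper from earlier plus a trained `model`, `loss_fn`, and a `test_loader` of (image, label) batches, all hypothetical names:

```python
import torch

def accuracy_under_fgsm(model, loss_fn, loader, epsilon):
    """Fraction of examples still classified correctly after FGSM noise."""
    correct = total = 0
    for x, y in loader:
        x_adv = fgsm_attack(model, loss_fn, x, y, epsilon)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Hypothetical usage: accuracy typically collapses as epsilon grows,
# while the noise becomes increasingly visible to a human observer.
for eps in (0.0, 0.01, 0.05, 0.1, 0.3):
    print(f"epsilon={eps}: accuracy={accuracy_under_fgsm(model, loss_fn, test_loader, eps):.3f}")
```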