ML 140: Deep Neural Rejection (45 pts extra)

What You Need

Purpose

To practice training and evaluating Deep Neural Rejection (DNR), a reject-based defense against evasion attacks. DNR rejects images that look significantly different from most of the training images, as most evasion images do.

This project is based on this tutorial:

Deep Neural Rejection

Using Google Colab

In a browser, go to
https://colab.research.google.com/
If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

Understanding DNR

DNR analyzes the representations of input samples at different network layers, and rejects samples which exhibit anomalous behavior with respect to that observed from the training data at such layers.
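The decision flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not SecML's implementation: `dnr_predict`, `layer_scorers`, and `combiner` are hypothetical names, and the toy scorers stand in for the classifiers trained later in this project.

```python
from typing import Callable, List, Sequence

Scores = List[float]

def dnr_predict(
    layer_outputs: Sequence[Sequence[float]],
    layer_scorers: Sequence[Callable[[Sequence[float]], Scores]],
    combiner: Callable[[Scores], Scores],
    threshold: float,
) -> int:
    """Return the predicted class, or -1 when the sample is rejected."""
    # 1. Score each layer's representation with its own classifier.
    stacked: Scores = []
    for output, scorer in zip(layer_outputs, layer_scorers):
        stacked.extend(scorer(output))
    # 2. A combiner turns the concatenated layer scores into final class scores.
    final = combiner(stacked)
    # 3. Reject when even the best class score falls below the threshold.
    best = max(range(len(final)), key=final.__getitem__)
    return best if final[best] >= threshold else -1


# Toy stand-ins (assumptions for illustration only): two classes, two layers.
identity = lambda v: list(v)
average_pairs = lambda s: [(s[0] + s[2]) / 2, (s[1] + s[3]) / 2]

confident = dnr_predict([[0.9, -0.8], [0.7, -0.9]],
                        [identity, identity], average_pairs, 0.0)
anomalous = dnr_predict([[-0.6, -0.5], [-0.4, -0.7]],
                        [identity, identity], average_pairs, 0.0)
print(confident, anomalous)
```

In this toy run the first sample has one clearly positive class score and is accepted, while the second has low scores for every class and is rejected with the label -1.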

The figure below represents this process. Note these features:

1. Downloading the Data

Execute these commands to import the SecML library and download the MNIST dataset, which contains 70,000 small images of handwritten digits.

They also download a pre-trained model, which we call "dnn" for Deep Neural Network.

!pip install git+https://github.com/pralab/secml
import secml
import torch

from secml.array import CArray
from secml.data.loader import CDataLoaderMNIST
from secml.data.selection import CPSRandom
from secml.data.splitter import CDataSplitterShuffle

# load MNIST training set and divide it into two parts
tr_data = CDataLoaderMNIST().load(ds='training')
tr_data.X /= 255.0
splitter = CDataSplitterShuffle(num_folds=1, train_size=0.5,
                                test_size=0.5, random_state=1)
splitter.compute_indices(tr_data)

# dnr training set, reduced to 5000 random samples
tr_set = tr_data[splitter.tr_idx[0], :]
tr_set = CPSRandom().select(dataset=tr_set, n_prototypes=5000, random_state=0)

# load test set
ts_set = CDataLoaderMNIST().load(ds='testing', num_samples=1000)
ts_set.X /= 255.0

from secml.model_zoo import load_model
# load from model zoo the pre-trained net
dnn = load_model("mnist-cnn")

print( "Test Set Shapes:", ts_set.X.shape, ts_set.Y.shape )
As shown below, a few pages of messages scroll by, ending with several "File stored" messages.

The last line shows the "shape" of the test set:
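For readers who want to see what the preprocessing in this step amounts to, here is a rough pure-Python sketch of the pixel scaling and the shuffled 50/50 split. The helper names `scale_pixels` and `shuffle_split` are hypothetical; SecML's `CDataSplitterShuffle` and `CPSRandom` have their own implementations and will not produce the same indices.

```python
import random
from typing import List, Tuple

def scale_pixels(image: List[int]) -> List[float]:
    """Map 8-bit pixel values (0..255) into the range [0, 1]."""
    return [p / 255.0 for p in image]

def shuffle_split(n_samples: int, train_size: float,
                  seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and cut them into two parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_size)
    return indices[:cut], indices[cut:]

scaled = scale_pixels([0, 51, 255])
tr_idx, ts_idx = shuffle_split(n_samples=10, train_size=0.5, seed=1)
# Picking a few random samples from the first part, in the spirit of n_prototypes.
subset = random.Random(0).sample(tr_idx, 3)
print(scaled, len(tr_idx), len(ts_idx), subset)
```

Fixing the seed, as the project code does with `random_state`, makes the split repeatable from run to run.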

2. Viewing the Images

Execute these commands to view the first 20 images:
from secml.figure import CFigure
# Only required for visualization in notebooks
%matplotlib inline

# Let's define a convenience function to easily plot the MNIST dataset
def show_digits_1(samples, labels, n_display):
    samples = samples.atleast_2d()
    n_display = min(n_display, samples.shape[0])
    fig = CFigure(width=n_display*2, height=3)
    for idx in range(n_display):
        fig.subplot(2, n_display, idx+1)
        fig.sp.xticks([])
        fig.sp.yticks([])
        fig.sp.imshow(samples[idx, :].reshape((28, 28)), cmap='gray')
        fig.sp.title("{}".format(labels[idx].item()))
    fig.show()

show_digits_1(ts_set.X, ts_set.Y, 5)
show_digits_1(ts_set[5:10, :].X, ts_set[5:10, :].Y, 5)
show_digits_1(ts_set[10:15, :].X, ts_set[10:15, :].Y, 5)
show_digits_1(ts_set[15:20, :].X, ts_set[15:20, :].Y, 5)
As shown below, the images are handwritten digits. The labels above the images show the correct classification.
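The `reshape((28, 28))` call in `show_digits_1` turns each flat row of pixel values back into a square grid before plotting. A minimal sketch of the same idea on a made-up 4x4 image, using hypothetical helpers `to_grid` and `render` rather than SecML or matplotlib:

```python
from typing import List

def to_grid(flat: List[float], width: int) -> List[List[float]]:
    """Split a flat pixel vector into rows of `width` pixels."""
    if len(flat) % width != 0:
        raise ValueError("vector length must be a multiple of the row width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]

def render(grid: List[List[float]]) -> str:
    """Draw bright pixels as '#' and dark pixels as '.'."""
    return "\n".join(
        "".join("#" if p > 0.5 else "." for p in row) for row in grid
    )

# A made-up 4x4 "image" stored flat, standing in for one MNIST row.
flat = [0, 1, 1, 0,
        0, 0, 1, 0,
        0, 0, 1, 0,
        0, 1, 1, 1]
grid = to_grid(flat, width=4)
print(render(grid))
```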

3. Viewing Predictions of the Unprotected "DNN" Model

Let's see how well the "DNN" model classifies these images.

Execute this code:

print("Image, Prediction, Correct: \tScores")
print("", end = "\t")
for i in range(10):
  print(" ", i, end="\t")
print()
for iter in range(20):
  prediction, scores = dnn.predict(ts_set[iter, :].X, True)
  p = prediction.get_data()[0]
  y = ts_set[iter, :].Y.get_data()[0]
  print(iter, p, y, end = ":\t")
  for s in scores:
    print(int(100 * s), end = "\t")
  print()
The first three numbers in each row show the image number, the predicted classification, and the correct classification. The rest of the numbers show the strength of the output signal for each of the ten possible classification categories.

If the second and third numbers match, the model correctly classifies the image. As shown below, most of the images are correctly classified.

However, image #8 is really a 5, but is incorrectly classified as a 6 by the DNN model.

Also, image #18 is really a 3, but is incorrectly classified as a 5 by the DNN model.
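To make the table easier to read: the predicted digit is the index of the largest score in the row, and an image is misclassified when that index differs from the correct label. A small sketch with made-up numbers (the scores and labels below are illustrative, not output from the "DNN" model):

```python
from typing import List, Sequence

def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score, i.e. the predicted digit."""
    return max(range(len(scores)), key=scores.__getitem__)

def misclassified(predicted: Sequence[int], correct: Sequence[int]) -> List[int]:
    """Positions where the prediction differs from the correct label."""
    return [i for i, (p, y) in enumerate(zip(predicted, correct)) if p != y]

# Made-up scores for one image: the largest value sits at index 6.
scores = [0.01, 0.00, 0.02, 0.00, 0.03, 0.40, 0.52, 0.00, 0.02, 0.00]
print(argmax(scores), [int(100 * s) for s in scores])

# Made-up predictions and labels for five images.
predicted = [7, 2, 1, 0, 6]
correct = [7, 2, 1, 0, 5]
wrong = misclassified(predicted, correct)
print(wrong, 1 - len(wrong) / len(correct))
```

Note that `int(100 * s)` truncates rather than rounds, so the printed values in the table are approximate.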

4. Attacking the Unprotected "DNN" Model

Execute these commands to create evasion images from the first 20 images:
from secml.adv.attacks import CAttackEvasionPGDExp

solver_params = {'eta': 1e-1, 'eta_min': 1e-1, 'max_iter': 40, 'eps': 1e-8}

pgd_exp = CAttackEvasionPGDExp(classifier=dnn, double_init_ds=tr_set, dmax=2,
                               distance='l2', solver_params=solver_params)

print("Running attack...")
eva_y_pred, _, eva_adv_ds, _ = pgd_exp.run(x=ts_set[0:20, :].X, y=ts_set[0:20, :].Y)
print("Attack completed.")

show_digits_1(eva_adv_ds.X, eva_y_pred, 5)
show_digits_1(eva_adv_ds.X[5:10, :], eva_y_pred[5:10], 5)
show_digits_1(eva_adv_ds.X[10:15, :], eva_y_pred[10:15], 5)
show_digits_1(eva_adv_ds.X[15:20, :], eva_y_pred[15:20], 5)

n_wrong = 0
print("# \tCorrect \tPredicted")
for i in range(20):
  correct = ts_set[i, :].Y.get_data()[0]
  pred = eva_y_pred[i].get_data()[0]
  if correct != pred:
    n_wrong += 1
  print(i, "\t", correct, "\t", pred)
print("Number of successful evasions: ", n_wrong)
The 20 evasion images are shown. Each image has some dots added to it, and the predicted classifications, shown above each image, are wrong.

The lower portion of the output shows a chart of all 20 images, showing that all 20 attempts to evade classification succeeded.
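The `dmax=2` and `distance='l2'` arguments limit how far each evasion image may move from its original. The sketch below shows one way such a limit can be enforced: take a gradient step, then project back into the L2 ball around the original image. It is a simplified illustration with a fixed step size and a made-up gradient; `CAttackEvasionPGDExp` uses its own solver, and the helper names here are hypothetical.

```python
import math
from typing import List

def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

def project_l2(x_adv: List[float], x_orig: List[float],
               dmax: float) -> List[float]:
    """Pull x_adv back so it lies within L2 distance dmax of x_orig."""
    delta = [a - o for a, o in zip(x_adv, x_orig)]
    norm = l2_norm(delta)
    if norm <= dmax:
        return list(x_adv)
    scale = dmax / norm
    return [o + d * scale for o, d in zip(x_orig, delta)]

def pgd_step(x: List[float], x_orig: List[float], grad: List[float],
             eta: float, dmax: float) -> List[float]:
    """One descent step on the attacker's loss, followed by a projection."""
    moved = [xi - eta * gi for xi, gi in zip(x, grad)]
    return project_l2(moved, x_orig, dmax)

x_orig = [0.0, 0.0, 0.0]
grad = [-30.0, 0.0, -40.0]  # made-up gradient of the attacker's loss
x_adv = pgd_step(x_orig, x_orig, grad, eta=0.1, dmax=2.0)
distance = l2_norm([a - o for a, o in zip(x_adv, x_orig)])
print(x_adv, distance)
```

Here the raw step would move the point a distance of 5 from the original, so the projection shrinks the perturbation until it sits on the boundary of the allowed region.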

5. Training DNR

Execute these commands to prepare the three layer SVMs and the combiner, build the DNR classifier, train it on the 5000 images in the training set tr_set, and choose a rejection threshold that rejects 10% of the samples in the test set ts_set.
from secml.ml.classifiers import CClassifierSVM
from secml.ml.kernels import CKernelRBF
from secml.ml.classifiers.reject import CClassifierDNR

layers = ['features:relu2', 'features:relu3', 'features:relu4']
combiner = CClassifierSVM(kernel=CKernelRBF(gamma=1), C=0.1)
layer_clf = CClassifierSVM(kernel=CKernelRBF(gamma=1e-2), C=10)

dnr = CClassifierDNR(combiner=combiner, layer_clf=layer_clf, dnn=dnn,
                     layers=layers, threshold=-1000)
dnr.set_params({'features:relu4.C': 1, 'features:relu2.kernel.gamma': 1e-3})

print("Training started...")
dnr.fit(x=tr_set.X, y=tr_set.Y)
print("Training completed.")

# set the reject threshold in order to have 10% of rejected samples on the test set
print("Computing reject threshold...")
dnr.threshold = dnr.compute_threshold(rej_percent=0.1, ds=ts_set)
print("Threshold:", dnr.threshold)
As shown below, the threshold is approximately -1.
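One way to think about the threshold: a sample is rejected when its best class score is below the threshold, so a higher threshold rejects more samples. The sketch below picks a threshold from sorted scores to hit a target rejection rate. It is an assumption-laden illustration on made-up scores; `pick_threshold` is a hypothetical helper, and SecML's `compute_threshold` may search for the value differently.

```python
from typing import Sequence

def pick_threshold(max_scores: Sequence[float], rej_percent: float) -> float:
    """Return a threshold that rejects about rej_percent of the samples."""
    ordered = sorted(max_scores)
    n_reject = int(round(rej_percent * len(ordered)))
    if n_reject == 0:
        return ordered[0]           # nothing falls below the smallest score
    if n_reject >= len(ordered):
        return ordered[-1] + 1.0    # everything falls below this value
    # Midpoint between the last rejected and the first accepted score.
    return (ordered[n_reject - 1] + ordered[n_reject]) / 2

def rejected_fraction(max_scores: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose best score is below the threshold."""
    return sum(s < threshold for s in max_scores) / len(max_scores)

# Made-up best-class scores for ten samples.
max_scores = [0.9, 0.8, -1.4, 0.7, 0.95, 0.6, 0.85, -0.2, 0.5, 0.75]
t10 = pick_threshold(max_scores, 0.1)
t50 = pick_threshold(max_scores, 0.5)
print(t10, rejected_fraction(max_scores, t10))
print(t50, rejected_fraction(max_scores, t50))
```

Asking for a larger rejection rate pushes the threshold up, which is worth keeping in mind when changing `rej_percent` for the flags later in this project.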

6. Viewing Predictions and Rejection

Execute this code to show how the model protected by DNR classifies the first 20 images:
print("Image, Prediction, Correct, Reject?: \tScores")
print("", end = "\t")
for i in range(10):
  print(" ", i, end="\t")
print("Threshold")
for iter in range(20):
  prediction, scores = dnr.predict(ts_set[iter, :].X, True)
  for p in prediction:
    pi = int(p)
    if pi < 0:
      pi = "R"
  for y in ts_set[iter, :].Y:
    yi = int(y)
  print(iter, pi, yi, end = ":\t")
  for s in scores:
    print(int(100 * s), end = "\t")
  print()
As shown below, images #8, 15, and 18 are rejected, denoted by an "R" in the second column.

7. Preparing Attack Images for the DNR Model

Execute these commands to create evasion images to attack the DNR model:
from secml.adv.attacks import CAttackEvasionPGDExp

solver_params = {'eta': 1e-1, 'eta_min': 1e-1, 'max_iter': 30, 'eps': 1e-8}

pgd_exp = CAttackEvasionPGDExp(classifier=dnr, double_init_ds=tr_set, dmax=2,
                               distance='l2', solver_params=solver_params)

print("Running attack...")
eva_y_pred, _, eva_adv_ds, _ = pgd_exp.run(x=ts_set[0:20, :].X, y=ts_set[0:20, :].Y)
print("Attack completed.")

show_digits_1(eva_adv_ds.X, eva_y_pred, 5)
show_digits_1(eva_adv_ds.X[5:10, :], eva_y_pred[5:10], 5)
show_digits_1(eva_adv_ds.X[10:15, :], eva_y_pred[10:15], 5)
show_digits_1(eva_adv_ds.X[15:20, :], eva_y_pred[15:20], 5)

n_wrong = 0
print("# \tCorrect \tPredicted")
for i in range(20):
  correct = ts_set[i, :].Y.get_data()[0]
  pred = eva_y_pred[i].get_data()[0]
  if (correct != pred) and (pred >= 0):
    n_wrong += 1
  print(i, "\t", correct, "\t", pred)
print("Number of successful evasions: ", n_wrong)
Note: the CAttackEvasionPGDExp class is defined here:

secml.adv.attacks.evasion

The output shows the attack images, with the predicted classification above each image. Notice that many of the attack images are rejected by DNR, so the prediction is "-1", as shown below.

At the bottom, a chart shows the details of each image. Only 6 of the attack images now evade correct classification, as shown below. (You may see 7 instead of 6.)
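The counting rule used in this step differs from step 4: an attack image only counts as a successful evasion if it is both misclassified and not rejected. A minimal sketch of that rule on made-up labels (`count_evasions` is a hypothetical helper mirroring the loop above):

```python
from typing import Sequence

REJECT = -1  # label used for rejected samples

def count_evasions(correct: Sequence[int], predicted: Sequence[int]) -> int:
    """Count attack images that are accepted and misclassified."""
    return sum(1 for y, p in zip(correct, predicted) if p != y and p >= 0)

# Made-up labels for five attack images.
correct = [7, 2, 1, 0, 4]
predicted = [3, REJECT, 1, REJECT, 9]
n_evasions = count_evasions(correct, predicted)
print(n_evasions)
```

The two rejected images do not count, because a rejected input never reaches the user as a wrong answer.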

Flag ML 140.1: Features (15 pts)

Execute this command, which shows the number of input features the DNR model expects:
dnr.n_features
The flag is covered by a rectangle in the image below.

Flag ML 140.2: More Rejections (10 pts)

In step 5, adjust the DNR to reject 50% of the images.

Now there are only 2 successful evasions of the DNR model.

Execute this command to see the flag:

dnr.threshold
The flag is covered by a rectangle in the image below.

Flag ML 140.3: Fewer Rejections (10 pts)

In step 5, adjust the DNR to reject 2% of the images.

Execute these commands to see the flag:

d = str( dnr.get_state() )
a = d.find("7, 8, 9")
print( d[a-50:a+30])
The flag is covered by a rectangle in the image below.

Flag ML 140.4: Fast Attack (10 pts)

In step 5, adjust the DNR to reject 10% of the images.

In step 7, adjust the max_iter to 2.

Now there is only 1 successful evasion of the DNR model.

Execute these commands to see the flag:

d = dnr.get_state()
ks = list(d.keys())
for k in ks:
   if k[0] == 'c':
     v = str(d[k])
     print(k, v[:50])
The flag is covered by a rectangle in the image below.

Sources

Evasion and Poisoning Attacks on MNIST dataset
secml.optim.optimizers

Posted 5-8-23
Code for flag 4 fixed 12-13-23