ML 140: Deep Neural Rejection (45 pts extra)

What You Need

A Web browser

Purpose

To practice training and evaluating DNR (Deep Neural Rejection), a reject-based defense against evasion attacks. DNR rejects images that look significantly different from most of the training images, as most evasion images do.

This project is based on this tutorial:

Deep Neural Rejection

Using Google Colab

In a browser, go to

https://colab.research.google.com/

If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

Understanding DNR

DNR analyzes the representations of input samples at different network layers, and rejects samples which exhibit anomalous behavior with respect to that observed from the training data at such layers.

The figure below represents this process. Note these features:

The left side shows the neural net processing an image from the bottom, through several layers of neurons, to output categories at the top.
The boxes labelled g₁, g₂, and g₃ are SVMs (Support Vector Machines). Each one assigns "class scores" to one stage of processing; scores that are low for every class indicate a sample that is unusual compared to the training data.
The box labeled "classifier" is another SVM that combines the class scores into a single set of output scores.
Samples whose highest output score falls below a rejection threshold are discarded.

1. Downloading the Data

Execute these commands to import the SecML library and download the MNIST dataset, which contains 70,000 small images of handwritten digits.

They also download a pre-trained model, which we call "dnn" for Deep Neural Network.

!pip install git+https://github.com/pralab/secml

import secml
import torch
from secml.array import CArray
from secml.data.loader import CDataLoaderMNIST
from secml.data.selection import CPSRandom
from secml.data.splitter import CDataSplitterShuffle

# load MNIST training set and divide it in two parts
tr_data = CDataLoaderMNIST().load(ds='training')
tr_data.X /= 255.0
splitter = CDataSplitterShuffle(num_folds=1, train_size=0.5, test_size=0.5, random_state=1)
splitter.compute_indices(tr_data)

# dnr training set, reduced to 5000 random samples
tr_set = tr_data[splitter.tr_idx[0], :]
tr_set = CPSRandom().select(dataset=tr_set, n_prototypes=5000, random_state=0)

# load test set
ts_set = CDataLoaderMNIST().load(ds='testing', num_samples=1000)
ts_set.X /= 255.0

from secml.model_zoo import load_model

# load from model zoo the pre-trained net
dnn = load_model("mnist-cnn")

print("Test Set Shapes:", ts_set.X.shape, ts_set.Y.shape)

As shown below, a few pages of messages scroll by, ending with several "File stored" messages.

The last line shows the "shape" of the test set:

ts_set.X contains 1000 images, each with 784 pixels
ts_set.Y contains 1000 labels
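
The 784 comes from flattening each 28 x 28 image into a single row, and the division by 255.0 in the code above rescales pixel values to the range 0 to 1. The short sketch below illustrates both steps in plain Python. The image data is made up for illustration and no SecML calls are used.

```python
# A fake 28x28 grayscale image with pixel values 0..255 (illustrative data only)
image = [[(row * 28 + col) % 256 for col in range(28)] for row in range(28)]

# Flatten row by row into one feature vector, then scale to the range [0, 1]
flat = [pixel / 255.0 for row in image for pixel in row]

print(len(flat))             # number of features per image
print(min(flat), max(flat))  # smallest and largest scaled pixel values
```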

2. Viewing the Images

Execute these commands to view the first 20 images:

from secml.figure import CFigure
# Only required for visualization in notebooks
%matplotlib inline

# Let's define a convenience function to easily plot the MNIST dataset
def show_digits_1(samples, labels, n_display):
    samples = samples.atleast_2d()
    n_display = min(n_display, samples.shape[0])
    fig = CFigure(width=n_display*2, height=3)
    for idx in range(n_display):
        fig.subplot(2, n_display, idx+1)
        fig.sp.xticks([])
        fig.sp.yticks([])
        fig.sp.imshow(samples[idx, :].reshape((28, 28)), cmap='gray')
        fig.sp.title("{}".format(labels[idx].item()))
    fig.show()

show_digits_1(ts_set.X, ts_set.Y, 5)
show_digits_1(ts_set[5:10, :].X, ts_set[5:10, :].Y, 5)
show_digits_1(ts_set[10:15, :].X, ts_set[10:15, :].Y, 5)
show_digits_1(ts_set[15:20, :].X, ts_set[15:20, :].Y, 5)

As shown below, the images are handwritten digits. The labels above the images show the correct classification.

3. Viewing Predictions of the Unprotected "DNN" Model

Let's see how well the "DNN" model classifies these images.

Execute this code:

print("Image, Prediction, Correct: \tScores")
print("", end = "\t")
for i in range(10):
    print(" ", i, end="\t")
print()
for iter in range(20):
    prediction, scores = dnn.predict(ts_set[iter, :].X, True)
    p = prediction.get_data()[0]
    y = ts_set[iter, :].Y.get_data()[0]
    print(iter, p, y, end = ":\t")
    for s in scores:
        print(int(100 * s), end = "\t")
    print()

The first three numbers in each row show the image number, the predicted classification, and the correct classification. The rest of the numbers show the strength of the output signal for each of the ten possible classification categories.

If the second and third numbers match, the model correctly classifies the image. As shown below, most of the images are correctly classified.

However, image #8 is really a 5, but is incorrectly classified as a 6 by the DNN model.

Also, image #18 is really a 3, but is incorrectly classified as a 5 by the DNN model.
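
The predicted class is simply the category with the strongest output score. The sketch below shows that rule in plain Python. The scores are invented to resemble a case like image #8 (a 5 mistaken for a 6) and are not real model output.

```python
# Hypothetical output scores for the ten digit classes (not real model output)
scores = [0.01, 0.02, 0.03, 0.05, 0.02, 0.35, 0.45, 0.03, 0.02, 0.02]
true_label = 5

# The prediction is the index of the largest score
predicted = max(range(len(scores)), key=lambda c: scores[c])

print(predicted, true_label)
print("correct" if predicted == true_label else "misclassified")
```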

4. Attacking the Unprotected "DNN" Model

Execute these commands to create evasion images from the first 20 images:

from secml.adv.attacks import CAttackEvasionPGDExp

solver_params = {'eta': 1e-1, 'eta_min': 1e-1, 'max_iter': 40, 'eps': 1e-8}
pgd_exp = CAttackEvasionPGDExp(classifier=dnn, double_init_ds=tr_set, dmax=2, distance='l2', solver_params=solver_params)

print("Running attack...")
eva_y_pred, _, eva_adv_ds, _ = pgd_exp.run(x=ts_set[0:20, :].X, y=ts_set[0:20, :].Y)
print("Attack completed.")

show_digits_1(eva_adv_ds.X, eva_y_pred, 5)
show_digits_1(eva_adv_ds.X[5:10, :], eva_y_pred[5:10], 5)
show_digits_1(eva_adv_ds.X[10:15, :], eva_y_pred[10:15], 5)
show_digits_1(eva_adv_ds.X[15:20, :], eva_y_pred[15:20], 5)

n_wrong = 0
print("# \tCorrect Predicted")
for i in range(20):
    correct = ts_set[i, :].Y.get_data()[0]
    pred = eva_y_pred[i].get_data()[0]
    if correct != pred:
        n_wrong += 1
    print(i, "\t", correct, "\t", pred)
print("Number of successful evasions: ", n_wrong)

The 20 evasion images are shown. Each image has some dots added to it, and the predicted classifications, shown above each image, are wrong.

The lower portion of the output shows a chart of all 20 images, showing that all 20 attempts to evade classification succeeded.
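
The attack parameters dmax=2 and distance='l2' limit how far an evasion image may move from the original, measured as L2 (Euclidean) distance. The sketch below illustrates that kind of constraint: a candidate that has moved too far is pulled back onto the boundary of the allowed region. The function name project_l2 and the three-value "images" are hypothetical, and this is not SecML's implementation, which also handles details such as keeping pixel values in range.

```python
import math

def project_l2(original, candidate, dmax):
    """Move `candidate` back inside the L2 ball of radius `dmax` around `original`."""
    delta = [c - o for c, o in zip(candidate, original)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= dmax:
        return list(candidate)          # already within the allowed distance
    scale = dmax / norm                 # shrink the perturbation to length dmax
    return [o + d * scale for o, d in zip(original, delta)]

original = [0.0, 0.0, 0.0]
candidate = [3.0, 4.0, 0.0]             # L2 distance 5 from the original
projected = project_l2(original, candidate, dmax=2)

distance = math.sqrt(sum((p - o) ** 2 for p, o in zip(projected, original)))
print(projected, distance)
```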

5. Training DNR

Execute these commands to prepare the three SVMs and the classifier, train them on the 5000 images in the training set tr_set, and choose a rejection threshold that discards 10% of the samples in the test set ts_set.

from secml.ml.classifiers import CClassifierSVM
from secml.ml.kernels import CKernelRBF
from secml.ml.classifiers.reject import CClassifierDNR

layers = ['features:relu2', 'features:relu3', 'features:relu4']
combiner = CClassifierSVM(kernel=CKernelRBF(gamma=1), C=0.1)
layer_clf = CClassifierSVM(kernel=CKernelRBF(gamma=1e-2), C=10)
dnr = CClassifierDNR(combiner=combiner, layer_clf=layer_clf, dnn=dnn, layers=layers, threshold=-1000)
dnr.set_params({'features:relu4.C': 1, 'features:relu2.kernel.gamma': 1e-3})

print("Training started...")
dnr.fit(x=tr_set.X, y=tr_set.Y)
print("Training completed.")

# set the reject threshold in order to have 10% of rejected samples on the test set
print("Computing reject threshold...")
dnr.threshold = dnr.compute_threshold(rej_percent=0.1, ds=ts_set)
print("Threshold:", dnr.threshold)

As shown below, the threshold is approximately -1.
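
One way to picture how a rejection percentage turns into a threshold: take the highest score of each sample, sort them, and place the cut so that the desired fraction falls below it. The sketch below does this in plain Python. The helper threshold_for_reject_rate and the scores are hypothetical, and SecML's compute_threshold may arrive at its value differently.

```python
def threshold_for_reject_rate(max_scores, rej_percent):
    """Pick a threshold so that roughly `rej_percent` of samples score below it."""
    ordered = sorted(max_scores)
    n_reject = int(round(rej_percent * len(ordered)))
    n_reject = min(n_reject, len(ordered) - 1)   # keep the index in range
    # Samples scoring strictly below the returned value are rejected
    return ordered[n_reject]

# Hypothetical top scores for 10 samples (not real DNR output)
max_scores = [0.9, -1.4, 0.8, 0.7, 0.95, 0.6, 0.85, 0.5, 0.99, 0.75]

threshold = threshold_for_reject_rate(max_scores, rej_percent=0.1)
rejected = [s for s in max_scores if s < threshold]
print(threshold, rejected)

# Asking for more rejections moves the threshold up
print(threshold_for_reject_rate(max_scores, rej_percent=0.5))
```

In this sketch a larger rejection percentage always gives an equal or higher threshold, which is the behavior to look for when changing rej_percent in step 5.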

6. Viewing Predictions and Rejection

Execute this code to show how the model protected by DNR classifies the first 20 images:

print("Image, Prediction, Correct, Reject?: \tScores")
print("", end = "\t")
for i in range(10):
    print(" ", i, end="\t")
print("Threshold")
for iter in range(20):
    prediction, scores = dnr.predict(ts_set[iter, :].X, True)
    for p in prediction:
        pi = int(p)
        if pi < 0:
            pi = "R"
    for y in ts_set[iter, :].Y:
        yi = int(y)
    print(iter, pi, yi, end = ":\t")
    for s in scores:
        print(int(100 * s), end = "\t")
    print()

As shown below, images #8, 15, and 18 are rejected, denoted by an "R" in the second column.
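
An "R" corresponds to a prediction of -1, the label used for rejected samples. The sketch below shows a simplified version of the decision. It assumes the convention of SecML's threshold-based reject classifiers, where a sample is rejected when its highest class score is lower than the threshold. The function dnr_decide and the score values are hypothetical, not SecML code or real output.

```python
def dnr_decide(combined_scores, threshold):
    """Return the predicted class, or -1 (reject) if the best score is below the threshold."""
    best_class = max(range(len(combined_scores)), key=lambda c: combined_scores[c])
    if combined_scores[best_class] < threshold:
        return -1  # rejected: the sample does not resemble any class well enough
    return best_class

confident = [-1.2, -1.1, 0.9, -1.3]      # class 2 clearly wins
anomalous = [-1.2, -1.1, -1.05, -1.3]    # every class score is low

for scores in (confident, anomalous):
    label = dnr_decide(scores, threshold=-1.0)
    print("R" if label < 0 else label)
```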

7. Preparing Attack Images for the DNR Model

Execute these commands to create evasion images to attack the DNR model:

from secml.adv.attacks import CAttackEvasionPGDExp

solver_params = {'eta': 1e-1, 'eta_min': 1e-1, 'max_iter': 30, 'eps': 1e-8}
pgd_exp = CAttackEvasionPGDExp(classifier=dnr, double_init_ds=tr_set, dmax=2, distance='l2', solver_params=solver_params)

print("Running attack...")
eva_y_pred, _, eva_adv_ds, _ = pgd_exp.run(x=ts_set[0:20, :].X, y=ts_set[0:20, :].Y)
print("Attack completed.")

show_digits_1(eva_adv_ds.X, eva_y_pred, 5)
show_digits_1(eva_adv_ds.X[5:10, :], eva_y_pred[5:10], 5)
show_digits_1(eva_adv_ds.X[10:15, :], eva_y_pred[10:15], 5)
show_digits_1(eva_adv_ds.X[15:20, :], eva_y_pred[15:20], 5)

n_wrong = 0
print("# \tCorrect \tPredicted")
for i in range(20):
    correct = ts_set[i, :].Y.get_data()[0]
    pred = eva_y_pred[i].get_data()[0]
    if (correct != pred) and (pred >= 0):
        n_wrong += 1
    print(i, "\t", correct, "\t", pred)
print("Number of successful evasions: ", n_wrong)

Note: the CAttackEvasionPGDExp class is defined here:

secml.adv.attacks.evasion

The output shows the attack images, with the predicted classification above each image. Notice that many of the attack images are rejected by DNR, so the prediction is "-1", as shown below.

At the bottom, a chart shows the details of each image. Now only 6 of the images successfully evade correct classification, as shown below. (You may see 7 instead of 6.)
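
The counting rule used in the code above is worth spelling out: against DNR, an evasion counts as successful only if the image is both accepted (not rejected) and misclassified. The sketch below applies that rule to made-up labels. The function name and data are hypothetical.

```python
REJECTED = -1

def count_successful_evasions(true_labels, predictions):
    """An evasion succeeds only if the sample is accepted AND misclassified."""
    return sum(
        1
        for correct, pred in zip(true_labels, predictions)
        if pred != correct and pred != REJECTED
    )

true_labels = [7, 2, 1, 0, 4]
predictions = [-1, 3, 1, -1, 9]   # two rejected, one still correct, two evasions

print(count_successful_evasions(true_labels, predictions))
```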

Flag ML 140.1: Features (15 pts)
Execute this command, which shows the number of input features the DNR model expects:
dnr.n_features
The flag is covered by a rectangle in the image below.

Flag ML 140.2: More Rejections (10 pts)
In step 5, adjust the DNR to reject 50% of the images.
Now there are only 2 successful evasions of the DNR model.
Execute this command to see the flag:
dnr.threshold
The flag is covered by a rectangle in the image below.

Flag ML 140.3: Fewer Rejections (10 pts)
In step 5, adjust the DNR to reject 2% of the images.
Execute these commands to see the flag:
d = str( dnr.get_state() )
a = d.find("7, 8, 9")
print( d[a-50:a+30])
The flag is covered by a rectangle in the image below.

Flag ML 140.4: Fast Attack (10 pts)
In step 5, adjust the DNR to reject 10% of the images.
In step 7, adjust the max_iter to 2.
Now there is only 1 successful evasion of the DNR model.
Execute these commands to see the flag:
d = dnr.get_state()
ks = list(d.keys())
for k in ks:
    if k[0] == 'c':
        v = str(d[k])
        print(k, v[:50])
The flag is covered by a rectangle in the image below.

Sources

Evasion and Poisoning Attacks on MNIST dataset
secml.optim.optimizers

Posted 5-8-23
Code for flag 4 fixed 12-13-23