[cs231n] Assignment1_SVM: code study

Support Vector Machine: Contents
1. Download the CIFAR10 dataset and load it
2. Preprocessing
3. Define a linear SVM classifier
4. Gradient Check
5. Validation and Test
1. Download the CIFAR10 dataset and load it
Setup code
Load the CIFAR10 dataset
Show some CIFAR10 images
Subsample the data for more efficient code execution
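kNN has no explicit learning process. The SVM classifier is different: it learns the parameters W and b through training and stores them. Once training is complete, the training data can be discarded and only the learned parameters kept. A test image is then simply fed into the function and classified according to the computed class scores. Learning these parameters is the training process.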
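To make the "keep only W and b" idea concrete, here is a minimal sketch of classifying one flattened image with learned parameters. The helper name `predict_class` and the toy values are hypothetical and chosen only for illustration; they are not part of the assignment code.

```python
import numpy as np

def predict_class(x, W, b):
    """Return the index of the highest-scoring class for one flattened input x."""
    scores = x.dot(W) + b  # one score per class, shape (C,)
    return int(np.argmax(scores))

# Toy parameters (hypothetical): 4 features, 3 classes
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
b = np.array([0.0, 0.0, 0.5])
x = np.array([0.2, 0.9, 0.1, 0.3])

label = predict_class(x, W, b)  # the training data is not needed here
```

Only `W` and `b` are used at prediction time, which is the contrast with kNN drawn above.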

"""
Besides the training, validation, and test sets, we also randomly pick 500
samples from the training set as a development set. Before the final training
and prediction we work with this small set. Using the full training set
directly also works, it just takes longer.
Note: the validation set must be sliced out before the training set. X_train is
overwritten below, so slicing it afterwards raises an index-out-of-range error.
"""
# Split the data into train, val, and test sets
num_train = 49000
num_val = 1000
num_test = 1000

# Validation set
mask = range(num_train, num_train + num_val)
X_val = X_train[mask]
y_val = y_train[mask]

# Train set
mask = range(num_train)
X_train = X_train[mask]
y_train = y_train[mask]

# Test set
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
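The docstring mentions a 500-sample development set, but the listing does not show how it is drawn. One possible way, sketched here under the assumption that sampling is done without replacement, uses stand-in arrays instead of the real CIFAR-10 data; the names `X_train_demo`, `y_train_demo`, `X_dev`, and `y_dev` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_training, num_dev = 49000, 500

# Stand-in arrays (hypothetical); the real X_train holds CIFAR-10 images
X_train_demo = rng.standard_normal((num_training, 8))
y_train_demo = rng.integers(0, 10, size=num_training)

# Draw 500 distinct row indices from the training set
dev_idx = rng.choice(num_training, num_dev, replace=False)
X_dev = X_train_demo[dev_idx]
y_dev = y_train_demo[dev_idx]
```

Running experiments on `X_dev` first keeps each iteration fast; the full training set is only needed for the final runs.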

2. Preprocessing
Reshape the image data into rows
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

print('Train data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)

Subtract the mean image
# Preprocessing: subtract the mean image (computed on the training set only)
mean_image = np.mean(X_train, axis=0)
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8'))
plt.show()

X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

Append the bias dimension of ones

# Append the bias dimension of ones (i.e. the bias trick)
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])

print('Train data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
# Data preprocessing is complete
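The bias trick works because appending a column of ones to X and a row b to W yields the same scores as X·W + b. A small check on random data illustrates this; the array sizes are arbitrary and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))   # 5 samples, 4 features
W = rng.standard_normal((4, 3))   # 3 classes
b = rng.standard_normal(3)

# Explicit bias
scores_explicit = X.dot(W) + b

# Bias trick: extra column of ones in X, extra row b in W
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
W_aug = np.vstack([W, b])
scores_trick = X_aug.dot(W_aug)

same = np.allclose(scores_explicit, scores_trick)
```

This is why the weight matrix used later has 3073 rows: 32 * 32 * 3 = 3072 pixel values plus one bias row.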

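3. Define a linear SVM classifier
The loss is computed with two nested loops. For each training sample, multiplying it by W gives its score for every class (scores) and the score of the correct label (correct_class_score). The inner loop then computes max(0, score - correct_class_score + 1) for every class other than the correct one. The loss is the mean of the per-sample losses over all inputs X, plus an L2 regularization term that keeps W from becoming too complex: total_loss = avg_loss + 0.5 * lambda * sum(W*W), where the factor 0.5 matches the code. After that we compute dW, the gradient.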
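As a worked example of the per-sample term, consider three class scores where class 0 is the correct label. The function name `svm_loss_single` and the score values are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def svm_loss_single(scores, correct_class, delta=1.0):
    """Multiclass hinge loss for one sample, without regularization."""
    margins = np.maximum(0.0, scores - scores[correct_class] + delta)
    margins[correct_class] = 0.0  # the correct class contributes no loss
    return float(np.sum(margins))

scores = np.array([3.2, 5.1, -1.7])
loss = svm_loss_single(scores, correct_class=0)
# class 1: max(0, 5.1 - 3.2 + 1) = 2.9
# class 2: max(0, -1.7 - 3.2 + 1) = 0
```

The total is about 2.9: only class 1 violates the margin, so it is the only one that contributes.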

The loss function used here already includes the regularization term (the regularization penalty brings several desirable properties).
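The original formula image is not available, so the following is a reconstruction from the code in this post rather than the author's exact notation. It assumes N training samples, scores s_{ij} = (x_i W)_j, margin 1, and the factor 1/2 that corresponds to `0.5 * reg` in the code.

```latex
L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i}
      \max\left(0,\; s_{ij} - s_{i y_i} + 1\right)
    + \frac{\lambda}{2} \sum_{k} \sum_{l} W_{kl}^{2},
\qquad s_{ij} = (x_i W)_j
```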

The gradient can be computed in two ways:
(1) numerically, by finite differences;
(2) analytically, using calculus.
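The two formula images are missing, so the expressions below are a reconstruction consistent with the loss and the code in this post. The symbols h (a small step) and E_{kl} (the matrix with a single 1 at position (k, l)) are notation introduced here for illustration.

```latex
% (1) numerical: centered difference for one entry W_{kl}
\frac{\partial L}{\partial W_{kl}} \approx
  \frac{L(W + h E_{kl}) - L(W - h E_{kl})}{2h}

% (2) analytic: per-sample gradient with respect to column w_j of W
\nabla_{w_j} L_i =
  \mathbb{1}\left[s_{ij} - s_{i y_i} + 1 > 0\right] x_i
  \quad (j \neq y_i)

\nabla_{w_{y_i}} L_i =
  -\Big(\sum_{j \neq y_i} \mathbb{1}\left[s_{ij} - s_{i y_i} + 1 > 0\right]\Big) x_i

% full gradient, including the regularization term
\nabla_W L = \frac{1}{N} \sum_{i=1}^{N} \nabla_W L_i + \lambda W
```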

import numpy as np


class LinearSVM(object):
    """ A linear classifier that uses the Multiclass SVM loss function """

    def __init__(self):
        self.W = None

    # The two loss functions below take the same arguments.
    # Inputs:
    # - X: numpy array of shape (num_train, D) containing the training data,
    #      num_train samples each of dimension D
    # - y: numpy array of shape (num_train,) containing the training labels,
    #      where y[i] is the label of X[i]
    # - reg: (float) regularization strength
    # Returns:
    # - loss: the loss value between the predicted and true values
    # - dW: gradient with respect to W

    # ---------- (1) Loss and gradient computed with explicit loops ----------
    def loss_naive(self, X, y, reg):
        # Initialize loss and dW
        loss = 0.0
        dW = np.zeros(self.W.shape)

        # Compute the loss and dW
        num_train = X.shape[0]
        num_classes = self.W.shape[1]
        for i in range(num_train):
            scores = np.dot(X[i], self.W)
            for j in range(num_classes):
                if j == y[i]:
                    # j == y[i] is the correct class; by L = max(...) it adds no loss
                    margin = 0
                else:
                    # Implements L = max(...) with delta = 1;
                    # scores[y[i]] is the score of the correct class
                    margin = scores[j] - scores[y[i]] + 1
                if margin > 0:
                    loss += margin
                    dW[:, j] += X[i].T
                    dW[:, y[i]] += -X[i].T

        # Divide by num_train
        loss /= num_train
        dW /= num_train

        # Add regularization
        loss += 0.5 * reg * np.sum(self.W * self.W)
        dW += reg * self.W

        return loss, dW

    # ---------- (2) Loss and gradient computed with matrix operations ----------
    def loss_vectorized(self, X, y, reg):
        """
        Structured SVM loss function, vectorized implementation.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data,
             num_train samples each of dimension D
        - y: A numpy array of shape (num_train,) containing the training labels,
             where y[i] is the label of X[i]
        - reg: (float) regularization strength

        Outputs:
        - loss: the loss value between predicted value and ground truth
        - dW: gradient of W
        """
        # Initialize loss and dW
        loss = 0.0
        dW = np.zeros(self.W.shape)

        # Compute the loss
        num_train = X.shape[0]
        scores = np.dot(X, self.W)
        correct_score = scores[range(num_train), list(y)].reshape(-1, 1)
        # delta = 1
        margin = np.maximum(0, scores - correct_score + 1)
        margin[range(num_train), list(y)] = 0
        loss = np.sum(margin) / num_train + 0.5 * reg * np.sum(self.W * self.W)

        # Compute the dW
        num_classes = self.W.shape[1]
        mask = np.zeros((num_train, num_classes))
        mask[margin > 0] = 1
        mask[range(num_train), list(y)] = 0
        mask[range(num_train), list(y)] = -np.sum(mask, axis=1)
        dW = np.dot(X.T, mask)
        dW = dW / num_train + reg * self.W

        return loss, dW

    def train(self, X, y, learning_rate = 1e-3, reg = 1e-5, num_iters = 100,
              batch_size = 200, print_flag = False):
        """
        Train the linear SVM classifier with stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data,
             num_train samples each of dimension D
        - y: A numpy array of shape (num_train,) containing the training labels,
             where y[i] is the label of X[i], y[i] = c, 0 <= c < C
        - learning_rate: (float) learning rate for optimization
        - reg: (float) regularization strength
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step
        - print_flag: (boolean) if True, print the progress during optimization

        Outputs:
        - loss_history: A list containing the loss at each training iteration
        """
        loss_history = []
        num_train = X.shape[0]
        dim = X.shape[1]
        num_classes = np.max(y) + 1

        # Initialize W ("is None" avoids an elementwise comparison on arrays)
        if self.W is None:
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Iterate and optimize (SGD): sample a minibatch by generating indices
        # with np.random.choice(), take the matching rows of X and y, then
        # update W by gradient descent, W = W - lr * dW.
        for t in range(num_iters):
            idx_batch = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[idx_batch]
            y_batch = y[idx_batch]
            loss, dW = self.loss_vectorized(X_batch, y_batch, reg)
            loss_history.append(loss)
            self.W += -learning_rate * dW

            if print_flag and t%100 == 0:
                print('iteration %d / %d: loss %f' % (t, num_iters, loss))

        return loss_history

    def predict(self, X):
        """
        Use the trained weights of the linear SVM to predict data labels.

        Inputs:
        - X: A numpy array of shape (num_samples, D) containing the data to classify

        Outputs:
        - y_pred: A numpy array, predicted labels for the data in X
        """
        y_pred = np.zeros(X.shape[0])
        scores = np.dot(X, self.W)
        y_pred = np.argmax(scores, axis=1)

        return y_pred

4. Gradient Check
Define loss function
def loss_naive1(X, y, W, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs:
    - X: A numpy array of shape (num_train, D) containing the training data,
         num_train samples each of dimension D
    - y: A numpy array of shape (num_train,) containing the training labels,
         where y[i] is the label of X[i]
    - W: A numpy array of shape (D, C) containing the weights
    - reg: float, regularization strength

    Return:
    - loss: the loss value between predicted value and ground truth
    - dW: gradient of W
    """
    # Initialize loss and dW
    loss = 0.0
    dW = np.zeros(W.shape)

    # Compute the loss and dW
    num_train = X.shape[0]
    num_classes = W.shape[1]
    for i in range(num_train):
        scores = np.dot(X[i], W)
        for j in range(num_classes):
            if j == y[i]:
                margin = 0
            else:
                margin = scores[j] - scores[y[i]] + 1  # delta = 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i].T
                dW[:, y[i]] += -X[i].T

    # Divide by num_train
    loss /= num_train
    dW /= num_train

    # Add regularization
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    return loss, dW


def loss_vectorized1(X, y, W, reg):
    """
    Structured SVM loss function, vectorized implementation.

    Inputs:
    - X: A numpy array of shape (num_train, D) containing the training data,
         num_train samples each of dimension D
    - y: A numpy array of shape (num_train,) containing the training labels,
         where y[i] is the label of X[i]
    - W: A numpy array of shape (D, C) containing the weights
    - reg: (float) regularization strength

    Outputs:
    - loss: the loss value between predicted value and ground truth
    - dW: gradient of W
    """
    # Initialize loss and dW
    loss = 0.0
    dW = np.zeros(W.shape)

    # Compute the loss
    num_train = X.shape[0]
    scores = np.dot(X, W)
    correct_score = scores[range(num_train), list(y)].reshape(-1, 1)
    # delta = 1
    margin = np.maximum(0, scores - correct_score + 1)
    margin[range(num_train), list(y)] = 0
    loss = np.sum(margin) / num_train + 0.5 * reg * np.sum(W * W)

    # Compute the dW
    num_classes = W.shape[1]
    mask = np.zeros((num_train, num_classes))
    mask[margin > 0] = 1
    mask[range(num_train), list(y)] = 0
    mask[range(num_train), list(y)] = -np.sum(mask, axis=1)
    dW = np.dot(X.T, mask)
    dW = dW / num_train + reg * W

    return loss, dW

Gradient check

Gradient check routine provided by cs231n: grad_check_sparse
from gradient_check import grad_check_sparse
import time

# Generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

# Without regularization
loss, dW = loss_naive1(X_val, y_val, W, 0)
f = lambda W: loss_naive1(X_val, y_val, W, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, dW)

# With regularization
loss, dW = loss_naive1(X_val, y_val, W, 5e1)
f = lambda W: loss_naive1(X_val, y_val, W, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, dW)
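The source of grad_check_sparse is not reproduced in this post. The sketch below shows the general idea of a sparse gradient check (compare a centered finite difference with the analytic gradient at a few random coordinates). It is a simplified stand-in under that assumption, not the cs231n implementation, and the name `sparse_grad_check` is hypothetical.

```python
import numpy as np

def sparse_grad_check(f, W, analytic_grad, num_checks=5, h=1e-5, seed=0):
    """Compare numerical and analytic gradients at a few random entries of W."""
    rng = np.random.default_rng(seed)
    rel_errors = []
    for _ in range(num_checks):
        # Pick one random coordinate of W
        ix = tuple(int(rng.integers(d)) for d in W.shape)
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old  # restore the original value

        grad_num = (f_plus - f_minus) / (2 * h)  # centered difference
        grad_ana = analytic_grad[ix]
        denom = abs(grad_num) + abs(grad_ana)
        rel_errors.append(abs(grad_num - grad_ana) / denom if denom > 0 else 0.0)
    return rel_errors

# Toy check: f(W) = sum(W^2) has gradient 2W
W = np.random.default_rng(1).standard_normal((6, 4))
errors = sparse_grad_check(lambda w: np.sum(w ** 2), W, 2 * W)
```

Checking only a handful of coordinates keeps the test cheap, since each numerical entry costs two full loss evaluations.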

loss_naive vs. loss_vectorized
t_st = time.time()
loss_naive, dW_naive = loss_naive1(X_val, y_val, W, 0.00005)
t_ed = time.time()
print('Naive loss: %e computed in %f seconds.' % (loss_naive, t_ed - t_st))

t_st = time.time()
loss_vectorized, dW_vectorized = loss_vectorized1(X_val, y_val, W, 0.00005)
t_ed = time.time()
print('Vectorized loss: %e computed in %f seconds.' % (loss_vectorized, t_ed - t_st))

diff_loss = loss_naive - loss_vectorized
diff_dW = np.linalg.norm(dW_naive - dW_vectorized, ord='fro')
print('Difference of loss: %f' % diff_loss)
print('Difference of dW: %f' % diff_dW)


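Running this comparison shows that both methods give the same loss value, while the vectorized version takes far less time. From here on we therefore use the vectorized method (svm_loss_vectorized in the cs231n code, loss_vectorized here) to compute the loss and gradient.
Equal loss values do not by themselves guarantee equal gradients, so instead of rerunning the gradient check on the second method we compare the two gradients directly. The loss is a scalar while the gradient is a matrix, so np.linalg.norm is used to compute the norm of the difference; everything else is the same as above.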
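As an illustration of that comparison, the following self-contained sketch computes the unregularized gradient with loops and with a mask matrix on a small random problem, then measures the Frobenius norm of the difference. The function names and data sizes are made up for the example.

```python
import numpy as np

def svm_grad_loops(X, y, W):
    """Unregularized SVM gradient with explicit loops."""
    dW = np.zeros_like(W)
    for i in range(X.shape[0]):
        scores = X[i].dot(W)
        for j in range(W.shape[1]):
            if j != y[i] and scores[j] - scores[y[i]] + 1 > 0:
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    return dW / X.shape[0]

def svm_grad_vectorized(X, y, W):
    """Same gradient using a mask of margin violations."""
    n = X.shape[0]
    scores = X.dot(W)
    margins = scores - scores[np.arange(n), y][:, None] + 1
    mask = (margins > 0).astype(float)
    mask[np.arange(n), y] = 0                   # correct class is not a violation
    mask[np.arange(n), y] = -mask.sum(axis=1)   # minus the number of violations
    return X.T.dot(mask) / n

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 7))
y = rng.integers(0, 4, size=20)
W = rng.standard_normal((7, 4)) * 0.01

diff = np.linalg.norm(svm_grad_loops(X, y, W) - svm_grad_vectorized(X, y, W), ord='fro')
```

A difference near machine precision indicates that the two implementations agree.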
Stochastic Gradient Descent
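We now know that the vectorized method computes the loss and gradient most efficiently, and the resulting gradient has been verified to have a very small error. Next we use stochastic gradient descent (SGD) to update the weights and minimize the loss.
svm = LinearSVM()
loss_history = svm.train(X_train, y_train, learning_rate = 1e-7, reg = 2.5e4,
                         num_iters = 2000, batch_size = 200, print_flag = True)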

# Plot the loss_history
plt.plot(loss_history)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()


# Use svm to predict
# Training set
y_pred = svm.predict(X_train)
num_correct = np.sum(y_pred == y_train)
accuracy = np.mean(y_pred == y_train)
print('Training correct %d/%d: The accuracy is %f' % (num_correct, X_train.shape[0], accuracy))

# Test set
y_pred = svm.predict(X_test)
num_correct = np.sum(y_pred == y_test)
accuracy = np.mean(y_pred == y_test)
print('Test correct %d/%d: The accuracy is %f' % (num_correct, X_test.shape[0], accuracy))

5. Validation and Test
Cross-validation
From the listed learning rates and regularization strengths, select the hyperparameters with the highest validation accuracy and store the corresponding model in best_svm. results is a dictionary of the form {(lr, reg): (train_accuracy, val_accuracy)}.
learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
regularization_strengths = [8000.0, 9000.0, 10000.0, 11000.0, 18000.0, 19000.0, 20000.0, 21000.0]

results = {}
best_lr = None
best_reg = None
best_val = -1    # The highest validation accuracy that we have seen so far.
best_svm = None  # The LinearSVM object that achieved the highest validation rate.

for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        loss_history = svm.train(X_train, y_train, learning_rate = lr, reg = reg, num_iters = 2000)
        y_train_pred = svm.predict(X_train)
        accuracy_train = np.mean(y_train_pred == y_train)
        y_val_pred = svm.predict(X_val)
        accuracy_val = np.mean(y_val_pred == y_val)
        if accuracy_val > best_val:
            best_lr = lr
            best_reg = reg
            best_val = accuracy_val
            best_svm = svm
        results[(lr, reg)] = accuracy_train, accuracy_val
        print('lr: %e reg: %e train accuracy: %f val accuracy: %f' %
              (lr, reg, results[(lr, reg)][0], results[(lr, reg)][1]))

print('Best validation accuracy during cross-validation:\nlr = %e, reg = %e, best_val = %f' %
      (best_lr, best_reg, best_val))
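Given a results dictionary of that form, the best setting can also be read off after the search. The accuracies below are placeholder numbers invented for the example, not measured results.

```python
# Placeholder results (hypothetical values, not measured):
# {(lr, reg): (train_accuracy, val_accuracy)}
results = {
    (1.4e-7, 8000.0): (0.38, 0.37),
    (1.5e-7, 9000.0): (0.39, 0.40),
    (1.6e-7, 10000.0): (0.37, 0.36),
}

# Pick the key whose validation accuracy (second tuple entry) is highest
best_key = max(results, key=lambda k: results[k][1])
best_lr, best_reg = best_key
best_val = results[best_key][1]
```

Selecting on validation accuracy rather than training accuracy keeps the test set untouched until the final evaluation.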

# Visualize the cross-validation results
import math

x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# Plot training accuracy
plt.figure(figsize=(10,10))
marker_size = 100
colors = [results[x][0] for x in results]  # color encodes the accuracy
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c = colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('Training accuracy')

# Plot validation accuracy
colors = [results[x][1] for x in results]
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c = colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('Validation accuracy')
plt.show()

# Use the best svm to test
y_test_pred = best_svm.predict(X_test)
num_correct = np.sum(y_test_pred == y_test)
accuracy = np.mean(y_test_pred == y_test)
print('Test correct %d/%d: The accuracy is %f' % (num_correct, X_test.shape[0], accuracy))

Visualize the weights for each class
W = best_svm.W[:-1, :]  # drop the bias row so that only the weights are visualized
W = W.reshape(32, 32, 3, 10)
W_min, W_max = np.min(W), np.max(W)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i+1)
    # Rescale the weights to the range 0..255 for display
    imgW = 255.0 * ((W[:, :, :, i].squeeze() - W_min) / (W_max - W_min))
    plt.imshow(imgW.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()

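Summary
1. Implemented the SVM loss function with a vectorized method;
2. Implemented the analytic gradient with a vectorized method;
3. Checked the gradient numerically;
4. Used the validation set to tune the learning rate and regularization strength;
5. Used stochastic gradient descent to optimize the loss function;
6. Visualized the final learned weights.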
