Machine Learning with Backpropagation Neural Networks (BP Networks): Handwritten Digit Recognition

Multi-class classification with a backpropagation neural network: handwritten digit recognition

  • Description
  • Included Files
  • Visualizing the Data
  • Model Representation
  • Feedforward and Cost Function
  • Backpropagation
  • Gradient Checking
  • Visualizing the Hidden Layer
  • Prediction

Description

  • We will implement handwritten digit recognition by training a neural network with the backpropagation algorithm.
  • An Octave/MATLAB environment is required.
  • Data download: https://github.com/peedeep/Coursera/tree/master/ex4

Included Files

  • ex4.m: the Octave/MATLAB script that drives the exercise
  • ex4data1.mat: training set of handwritten digits
  • ex4weights.mat: pre-trained neural network parameters
  • displayData.m: visualizes the dataset
  • fmincg.m: optimization routine, similar to fminunc
  • sigmoid.m: the activation function (a minimal sketch follows this list)
  • computeNumericalGradient.m: numerically estimates the gradient (used by gradient checking)
  • checkNNGradients.m: gradient checking
  • debugInitializeWeights.m: deterministic weight initialization for debugging
  • predict.m: neural network prediction
  • sigmoidGradient.m: derivative of the activation function
  • randInitializeWeights.m: random weight initialization
  • nnCostFunction.m: neural network cost function
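
sigmoid.m is called throughout but its body never appears in the exercise text; for reference, a minimal sketch of the standard definition:

%% Sigmoid activation, applied element-wise to scalars, vectors, or matrices
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
end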

Visualizing the Data

Load the data and call displayData to render a sample of it as a two-dimensional image grid (displayData.m):

%% =========== 1. Loading and Visualizing Data =============
load('ex4data1.mat');               % loads X (5000x400) and y (5000x1)
m = size(X, 1);
sel = randperm(m);                  % shuffle the example indices
sel = sel(1:100);                   % keep 100 random examples
displayData(X(sel, :));
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
(Figure: a grid of 100 randomly selected training digits, as rendered by displayData)

The data is the same as in the previous chapter. ex4data1.mat contains 5000 training examples; each is a 20x20-pixel grayscale image of a digit, where every pixel is a floating-point number giving its gray level. Each 20x20 grid is unrolled into a 400-dimensional vector, so X is a 5000x400 matrix.
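
A quick way to confirm that layout (a hypothetical sanity check, not part of ex4.m):

% X is 5000 x 400: one unrolled 20x20 image per row
load('ex4data1.mat');
disp(size(X));                      % 5000 400
digit = reshape(X(1, :), 20, 20);   % fold the first example back into a 20x20 image
% (depending on how you display it, the image may need a transpose)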

Model Representation

(Figure: the three-layer network architecture: 400 input units, 25 hidden units, 10 output units)

Our neural network has three layers: an input layer (400 units), a hidden layer (25 units), and an output layer (10 units).

Pre-trained network parameters θ(1), θ(2) are provided in ex4weights.mat:

%% =========== 2. Loading Parameters =============
fprintf('\nLoading Saved Neural Network Parameters ...\n')
% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');
% Unroll the parameters into a single vector
nn_params = [Theta1(:) ; Theta2(:)];
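
The shapes of the loaded matrices match the 400-25-10 architecture (a quick check, assuming the standard ex4 weights):

disp(size(Theta1));   % 25 401  -> 25 hidden units, 400 inputs + 1 bias
disp(size(Theta2));   % 10 26   -> 10 output units, 25 hidden units + 1 bias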

Feedforward and Cost Function

The neural network cost function with regularization (K classes, here K = 10):

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^2\right]$$

(The bias columns of Θ(1) and Θ(2) are excluded from the regularization term.)

The original labels y take the values 1 through 10 (with 10 standing for the digit 0). To train the neural network, each label must be recoded as a vector containing only 0s and 1s:

For example, an example with label $y^{(i)} = 5$ is recoded as $y^{(i)} = (0, 0, 0, 0, 1, 0, 0, 0, 0, 0)^T$.
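
The recoding can also be done without a loop by indexing the rows of an identity matrix (an equivalent vectorized sketch):

E = eye(num_labels);
Y = E(y, :);   % m x num_labels: row i is the unit vector for label y(i)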

The cost function is implemented in nnCostFunction.m:

%% Neural network cost function (cost only; the gradient is added in the next section)
function [J] = nnCostFunction(X, y, nn_params, lambda, input_layer_size, hidden_layer_size, num_labels)
Theta1 = reshape(nn_params(1: hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params(1 + (hidden_layer_size * (input_layer_size + 1)):end), num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
% Recode y into one-hot rows, e.g. Y(find(y == 3), :) = [0 0 1 0 0 0 0 0 0 0]
Y = [];
E = eye(num_labels);
for i = 1 : num_labels
    Y0 = find(y == i);                  % indices of the examples with label i
    Y(Y0, :) = repmat(E(i, :), size(Y0, 1), 1);
end
%% Regularized feedforward cost
X = [ones(m, 1) X];                     % add the bias column
a2 = sigmoid(X * Theta1');
a2 = [ones(m, 1) a2];
a3 = sigmoid(a2 * Theta2');
Theta1_temp = [zeros(size(Theta1, 1), 1) Theta1(:, 2:end)]; % zero the bias column: it is not regularized
Theta2_temp = [zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
temp1 = sum(Theta1_temp .^ 2);
temp2 = sum(Theta2_temp .^ 2);
cost = Y .* log(a3) + (1 - Y) .* log(1 - a3); % h(x) = a3
J = -1 / m * sum(cost(:)) + lambda / (2 * m) * (sum(temp1(:)) + sum(temp2(:)));

Verify the cost function (without regularization) using the provided weights; the expected cost is 0.287629:

%% =========== 3. Compute Cost (Feedforward) =============
input_layer_size = 400;   % 20x20 input images of digits
hidden_layer_size = 25;   % 25 hidden units
num_labels = 10;          % 10 labels, from 1 to 10 ("0" is mapped to label 10)
lambda = 0;
J = nnCostFunction(X, y, nn_params, lambda, input_layer_size, hidden_layer_size, num_labels);
fprintf(['Cost at parameters (loaded from ex4weights): %f \n(this value should be about 0.287629)\n'], J);
fprintf('\nProgram paused. Press enter to continue.\n');

pause;

Verify the cost function (with regularization) using the provided weights; the expected cost is 0.383770:

%% =========== 4. Implement Regularization =============
fprintf('\nChecking Cost Function (w/ Regularization) ... \n')
lambda = 1;
J = nnCostFunction(X, y, nn_params, lambda, input_layer_size, hidden_layer_size, num_labels);
fprintf('Cost at parameters (loaded from ex4weights): %f \n(this value should be about 0.383770)\n', J);
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

Backpropagation

Next, we compute the gradient of the neural network cost function by implementing the backpropagation algorithm. We extend nnCostFunction.m so that it also returns the gradient:

%% Neural network cost function, now also returning the gradient
function [J, gradient] = nnCostFunction(X, y, nn_params, lambda, input_layer_size, hidden_layer_size, num_labels)
Theta1 = reshape(nn_params(1: hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params(1 + (hidden_layer_size * (input_layer_size + 1)):end), num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
% Recode y into one-hot rows, e.g. Y(find(y == 3), :) = [0 0 1 0 0 0 0 0 0 0]
Y = [];
E = eye(num_labels);
for i = 1 : num_labels
    Y0 = find(y == i);                  % indices of the examples with label i
    Y(Y0, :) = repmat(E(i, :), size(Y0, 1), 1);
end
%% Regularized feedforward cost
X = [ones(m, 1) X];                     % add the bias column
a2 = sigmoid(X * Theta1');
a2 = [ones(m, 1) a2];
a3 = sigmoid(a2 * Theta2');
Theta1_temp = [zeros(size(Theta1, 1), 1) Theta1(:, 2:end)]; % zero the bias column: it is not regularized
Theta2_temp = [zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
temp1 = sum(Theta1_temp .^ 2);
temp2 = sum(Theta2_temp .^ 2);
cost = Y .* log(a3) + (1 - Y) .* log(1 - a3); % h(x) = a3
J = -1 / m * sum(cost(:)) + lambda / (2 * m) * (sum(temp1(:)) + sum(temp2(:)));
%% Gradient via backpropagation
Delta_1 = zeros(size(Theta1)); % 25 x 401
Delta_2 = zeros(size(Theta2)); % 10 x 26
for i = 1:m
    % step 1: forward-propagate one example
    a_1 = X(i, :)';            % 401 x 1 (bias already included)
    z_2 = Theta1 * a_1;        % 25 x 1
    a_2 = sigmoid(z_2);        % 25 x 1
    a_2 = [1; a_2];            % 26 x 1
    z_3 = Theta2 * a_2;        % 10 x 1
    a_3 = sigmoid(z_3);        % 10 x 1
    % step 2: output-layer error
    delta_3 = zeros(num_labels, 1); % 10 x 1
    for k = 1:num_labels
        delta_3(k) = a_3(k) - (y(i) == k);
    end
    % step 3: hidden-layer error (drop the bias component)
    delta_2 = Theta2' * delta_3;                      % 26 x 1
    delta_2 = delta_2(2:end) .* sigmoidGradient(z_2); % 25 x 1
    % step 4: accumulate the gradient
    Delta_2 = Delta_2 + delta_3 * a_2';
    Delta_1 = Delta_1 + delta_2 * a_1';
end
% step 5: average and regularize (the bias columns of Theta*_temp are zero)
Theta1_grad = 1 / m * Delta_1 + lambda / m * Theta1_temp;
Theta2_grad = 1 / m * Delta_2 + lambda / m * Theta2_temp;
% Unroll gradients
gradient = [Theta1_grad(:) ; Theta2_grad(:)];
end

Derivative of the sigmoid function

The sigmoid function is

$$g(z) = \frac{1}{1 + e^{-z}}$$

and its derivative has the convenient closed form

$$g'(z) = \frac{d}{dz}\,g(z) = g(z)\big(1 - g(z)\big)$$

sigmoidGradient.m implements this derivative:

%% Compute the gradient of the sigmoid function (element-wise)
function g = sigmoidGradient(z)
g = sigmoid(z) .* (1 - sigmoid(z));
end
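
A quick spot check: the gradient peaks at z = 0, where it equals exactly 0.25, and falls toward 0 for large |z|.

g = sigmoidGradient([-1 -0.5 0 0.5 1]);
disp(g);   % 0.1966 0.2350 0.2500 0.2350 0.1966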

Random Initialization

When training a neural network, random weight initialization is very important: it breaks symmetry, since hidden units that start with identical weights would otherwise all learn the same function.

randInitializeWeights.m initializes the weights θ:

%% Randomly initialize the weights of a layer with L_in inputs and L_out outputs
function W = randInitializeWeights(L_in, L_out)
init_epsilon = 0.12;
W = rand(L_out, L_in + 1) * (2 * init_epsilon) - init_epsilon; % uniform in [-0.12, 0.12]
end
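
The value init_epsilon = 0.12 is not arbitrary: the ex4 handout suggests tying it to the layer sizes via a heuristic, which works out to roughly 0.12 for the 400-unit input layer feeding 25 hidden units:

% Heuristic from the ex4 handout; for L_in = 400, L_out = 25 this gives ~0.12
init_epsilon = sqrt(6) / sqrt(L_in + L_out);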

Backpropagation, Step by Step

(Figure: backpropagation through the three-layer network)

Step 1: feed one training example forward, computing a_1, z_2, a_2, z_3, a_3.

Step 2: compute the output-layer error:

$$\delta_k^{(3)} = a_k^{(3)} - y_k$$

Step 3: compute the hidden-layer error (⊙ is element-wise multiplication):

$$\delta^{(2)} = \big(\Theta^{(2)}\big)^{T}\delta^{(3)} \odot g'\big(z^{(2)}\big)$$

Step 4: accumulate the errors, taking care to remove the bias component δ0:

$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)}\big(a^{(l)}\big)^{T}$$

Step 5: compute the gradient of the cost function:

$$D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \quad (j \ge 1), \qquad D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} \quad (j = 0)$$

These five steps correspond line for line to the gradient portion of the extended nnCostFunction.m listed above.

Gradient Checking

Because backpropagation is fairly intricate, it is easy to get small details wrong, and the model then fails to reach a good solution. Gradient checking guards against this.

Gradient checking uses numerical estimation of the gradient to verify that the backpropagation implementation is correct:

$$\frac{\partial}{\partial\theta} J(\theta) \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$

Here ε is a small value; values that are too small run into numerical round-off problems, so ε ≈ 10^-4 is typically used.

For a parameter vector, each component is perturbed in turn:

$$\frac{\partial}{\partial\theta_j} J(\theta) \approx \frac{J\big(\theta^{(j+)}\big) - J\big(\theta^{(j-)}\big)}{2\epsilon}, \qquad \theta^{(j\pm)} = \theta \pm \epsilon\, e_j$$
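
computeNumericalGradient.m applies this formula one parameter at a time; a sketch consistent with the course version:

%% Numerically estimate the gradient of J around theta
function numgrad = computeNumericalGradient(J, theta)
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    perturb(p) = e;                 % perturb only the p-th parameter
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    numgrad(p) = (loss2 - loss1) / (2 * e);
    perturb(p) = 0;
end
end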

Because numerical gradient estimation is expensive, once backpropagation has been verified, be sure to turn gradient checking off before running full training.

checkNNGradients.m performs the check:

%% Function to help check your gradients

function checkNNGradients(lambda)
% Creates a small neural network to check the backpropagation gradients
if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;

% Generate some deterministic test data
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);
X = debugInitializeWeights(m, input_layer_size - 1);
y = 1 + mod(1:m, num_labels)';
nn_params = [Theta1(:) ; Theta2(:)];
costFunc = @(p) nnCostFunction(X, y, p, lambda, input_layer_size, hidden_layer_size, num_labels);
[J, gradient] = costFunc(nn_params);
numgradient = computeNumericalGradient(costFunc, nn_params);
disp([numgradient gradient]);
% Relative difference between the analytic and numerical gradients
diff = norm(numgradient - gradient) / norm(numgradient + gradient);
fprintf(['If your backpropagation implementation is correct, then \n' ...
    'the relative difference will be small (less than 1e-9). \n' ...
    '\nRelative Difference: %g\n'], diff);

end
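
In ex4.m the check is run twice, once without and once with regularization:

checkNNGradients;           % lambda defaults to 0
lambda = 3;
checkNNGradients(lambda);   % also exercises the regularized gradient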

Training the Model
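
The script below relies on initial_nn_params, which ex4.m builds first from randomly initialized weights:

%% =========== Initializing Parameters =============
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];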

%% =========== 9. Training NN =============
fprintf('\nTraining NN... \n');
options = optimset('MaxIter', 50);   % more iterations generally improve accuracy
lambda = 1;
costFunction = @(p) nnCostFunction(X, y, p, lambda, input_layer_size, hidden_layer_size, num_labels);
[nn_params, J] = fmincg(costFunction, initial_nn_params, options);
% Fold the optimized vector back into the two weight matrices
Theta1 = reshape(nn_params(1: hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params(1 + (hidden_layer_size * (input_layer_size + 1)):end), num_labels, (hidden_layer_size + 1));
fprintf('Program paused. Press enter to continue.\n');
pause;

Visualizing the Hidden Layer

Display what the hidden layer has learned:

%% =========== 10. Visualize Weights =============
fprintf('\nVisualize Weights... \n');

displayData(Theta1(:, 2:end));   % drop the bias column before displaying
fprintf('Program paused. Press enter to continue.\n');
pause;
(Figure: the 25 hidden units rendered as 20x20 images)

The hidden units roughly correspond to pen strokes in the input images.

Prediction

The accuracy on the training set comes out around 95.6%:

%% =========== 11. Implement Predict =============
fprintf('\nImplement Predict... \n');
pred = predict(Theta1, Theta2, X);

fprintf('\nTraining set Accuracy: %f\n', mean(double(pred == y)) * 100);
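
predict.m itself is not listed above; a minimal sketch consistent with the course version, which feeds the data forward and picks the most probable class for each row:

%% Predict the label for each example given trained weights
function p = predict(Theta1, Theta2, X)
m = size(X, 1);
h1 = sigmoid([ones(m, 1) X] * Theta1');   % hidden-layer activations
h2 = sigmoid([ones(m, 1) h1] * Theta2');  % output-layer activations
[dummy, p] = max(h2, [], 2);              % index of the largest output = predicted label
end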

