Machine Learning: Logistic Regression

  • Overview
  • Plotting
  • Sigmoid function
  • Prediction function
  • Polynomial feature mapping
  • Cost function and gradient (without regularization)
  • Cost function and gradient (with regularization)
  • Plotting the decision boundary
  • Visualizing the linear fit (without regularization)
  • Visualizing the polynomial fit (with regularization)

Overview

In classification problems the predicted result is a discrete value (whether or not an example belongs to a given class). Logistic regression is commonly used for classification problems such as:

  • Spam detection
  • Financial fraud detection
  • Tumor diagnosis
  • and so on

This article works through the exercise in Octave.

Plotting

plotData.m plots the positive and negative examples of the classification data set:

%% Function to plot 2D classification data
function plotData(X, y)
figure; hold on;
pos = find(y == 1); neg = find(y == 0);   % indices of positive and negative examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);          % positives: black '+'
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);  % negatives: yellow 'o'
hold off;
endfunction

Sigmoid function

sigmoid.m maps any real number into the (0, 1) range.
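It implements the logistic (sigmoid) function

g(z) = \frac{1}{1 + e^{-z}}

which satisfies g(z) \to 1 as z \to +\infty, g(z) \to 0 as z \to -\infty, and g(0) = 0.5.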

%% Sigmoid Function
function g = sigmoid (z)
g = zeros(size(z));
g = 1 ./ (1 + exp(-z));
endfunction

Prediction function

predict.m classifies an example as the positive class (1) when g >= 0.5, and as the negative class (0) otherwise:

%% Logistic Regression Prediction Function
function p = predict(X, theta)
m = size(X, 1);
p = zeros(m, 1);
g = sigmoid(X * theta);
k = find(g >= 0.5);
p(k) = 1;
endfunction
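
A minimal usage sketch (the theta below is the rounded optimum that fminunc finds later in this article; the second input row is a made-up low-scoring example):

% Example usage of predict.m (illustrative inputs only)
theta = [-25.161; 0.206; 0.201];   % approximate optimum found later in this article
X_new = [1 45 85;                  % exam scores 45 and 85, with the intercept column
         1 20 30];                 % a made-up low-scoring student
p = predict(X_new, theta)          % expected: p = [1; 0]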

Polynomial feature mapping

mapFeature.m generates polynomial features, expanding the 2 original features into many more:

%% Function to generate polynomial features
%% Takes X1, X2 and returns a new feature array with more features,
%% comprising X1, X2, X1.^2, X2.^2, X1.*X2, X1.*X2.^2, etc.
function out = mapFeature(X1, X2)
degree = 6;
out = ones(size(X1(:,1)));               % bias column
for i = 1:degree
  for j = 0:i
    out(:, end+1) = (X1.^(i-j)) .* (X2.^j);   % all monomials of total degree i
  end
end
endfunction
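
With degree = 6 the two original features are expanded into 1 + 2 + 3 + ... + 7 = 28 columns (including the constant bias column). A quick check with made-up scalar inputs:

% mapFeature returns a 1 x 28 row vector for scalar inputs
out = mapFeature(0.5, -0.5);
size(out)    % -> 1  28
out(1:3)     % -> 1  0.5  -0.5   (bias, X1, X2)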

Cost function and gradient (without regularization)

costFunction.m computes the cost function and the gradient (partial derivatives) without the regularization (penalty) term. It can be handed to the optimization function fminunc, which solves the minimization problem directly (note that no learning rate α has to be chosen).
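
The quantities it computes are the cross-entropy cost and its gradient:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

where h_\theta(x) = g(\theta^T x). The code below evaluates both in vectorized form: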

%% Logistic Regression Cost Function
function [J, gradient] = costFunction(X, y, theta)
m = length(y);
J = 0;
gradient = zeros(size(theta));

J = -1 * sum(y .* log(sigmoid(X * theta)) + (1 - y) .* log(1 - sigmoid(X * theta))) / m;
gradient = (X' * (sigmoid(X * theta) - y)) / m;
endfunction
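
As a sanity check, with the initial \theta = 0 every hypothesis value is g(0) = 0.5, so J = -\log(0.5) \approx 0.693; this matches the expected initial cost printed by the driver scripts further below.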

Cost function and gradient (with regularization)

costFunctionReg.m computes the cost function and the gradient (partial derivatives) including the regularization (penalty) term, and can likewise be handed to fminunc (again, no learning rate α is needed).
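
With the penalty added (the intercept term is not penalized), the cost and gradient become:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \qquad (j \ge 1)

The intercept component of the gradient carries no \frac{\lambda}{m}\theta_j term, which is why the code zeroes out the first entry of theta_1: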

%% Regularized Logistic Regression Cost
function [J, gradient] = costFunctionReg (X, y, theta, lambda)
m = length(y);
J = 0;
gradient = zeros(size(theta));
theta_1 = [0; theta(2:end)]; % theta(1) is not regularized, so its entry is set to zero
J = -1 * sum(y .* log(sigmoid(X * theta)) + (1 - y) .* log(1 - sigmoid(X * theta))) / m + lambda/(2 * m) * theta_1' * theta_1;
gradient = (X' * (sigmoid(X * theta) - y)) / m + lambda / m * theta_1;
endfunction

Applying regularization to the normal equation method gives the closed-form solution:

\theta = (X^T X + \lambda L)^{-1} X^T y

  • \lambda \cdot L: the regularization term
  • L: the (n+1)-dimensional identity matrix whose first entry (row 1, column 1) is 0

Plotting the decision boundary

plotDecisionBoundary.m draws the classification boundary. In the code below, the if branch draws the linear decision boundary of the logistic regression model, while the else branch draws the polynomial decision boundary:

%% Function to plot classifier's decision boundary
function plotDecisionBoundary(X, y, theta)
plotData(X(:, 2:3), y);
hold on;
if size(X, 2) <= 3
  plot_x = [min(X(:, 2)) - 2, max(X(:, 2)) + 2];
  plot_y = (theta(1) + theta(2) .* plot_x) * (-1 ./ theta(3));
  plot(plot_x, plot_y);
  legend('Admitted', 'Not admitted', 'Decision Boundary');
  axis([30, 100, 30, 100]);
else
  % Here is the grid range
  u = linspace(-1, 1.5, 50);
  v = linspace(-1, 1.5, 50);
  z = zeros(length(u), length(v));
  for i = 1:length(u)
    for j = 1:length(v)
      z(i,j) = mapFeature(u(i), v(j)) * theta;
    end
  end
  z = z';
  contour(u, v, z, [0,0], 'LineWidth', 2);
  hold on;
  title('lambda = 1')
  % Labels and Legend
  xlabel('Microchip Test 1')
  ylabel('Microchip Test 2')
  legend('y = 1', 'y = 0', 'Decision boundary')
  hold off
  pause;
  figure;

  surf(u, v, z);
  xlabel('u')
  ylabel('v')
  zlabel('z');
end
endfunction
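
In the linear (if) branch the boundary is the set of points where h_\theta(x) = 0.5, i.e. \theta^T x = 0. With two features (1-based indexing, as in the code) that is \theta_1 + \theta_2 x_1 + \theta_3 x_2 = 0, so x_2 = -(\theta_1 + \theta_2 x_1)/\theta_3, which is exactly what plot_y computes. In the polynomial (else) branch, \theta^T \cdot \mathrm{mapFeature}(u, v) is evaluated on a grid and its zero-level contour is drawn as the boundary.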

Visualizing the linear fit (without regularization)

Combining all of the functions above to visualize the data; the following is the linear fit in logistic regression.

Load the data set ex2data1.txt (it can be downloaded from https://github.com/peedeep/Coursera/blob/master/ex2/ex2data1.txt).

Plot the positive and negative examples:

%% Machine Learning Online Class - Exercise 2: Logistic Regression
clear; close all; clc
%% ==================== 1.Plotting ====================
data = load('ex2data1.txt');
X = data(:, [1, 2]);
y = data(:, 3);
plotData(X, y)
xlabel('Exam 1 score')
ylabel('Exam 2 score')
legend('Admitted', 'Not admitted');
hold off;
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

(Figure: scatter plot of Exam 1 score vs. Exam 2 score, showing admitted and not-admitted examples.)

Compute the cost function and gradient:

%% ============ 2.Compute Cost and Gradient ============
[m, n] = size(X);
X = [ones(m, 1) X];
initial_theta = zeros(n + 1, 1);
[J, gradient] = costFunction(X, y, initial_theta);
fprintf('Cost at initial theta (zeros): %f\n', J);
fprintf('Expected J (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', gradient);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');
[J, gradient] = costFunction(X, y, [-24; 0.2; 0.2]);
fprintf('\nCost at test theta: %f\n', J);
fprintf('Expected J (approx): 0.218\n');
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', gradient);
fprintf('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

Use the optimization function fminunc to solve the minimization problem and obtain the optimal theta:

%% ============ 3.Optimizing using fminunc ============
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J, exitFlag] = fminunc(@(t)costFunction(X, y, t), initial_theta, options);
% Print theta to screen
fprintf('exitFlag: %f\n', exitFlag);
fprintf('J at theta found by fminunc: %f\n', J);
fprintf('Expected J (approx): 0.203\n');
fprintf('theta: \n');
fprintf(' %f \n', theta);
fprintf('Expected theta (approx):\n');
fprintf(' -25.161\n 0.206\n 0.201\n');

Plot the decision boundary:

plotDecisionBoundary(X, y, theta);
hold on;
xlabel('Exam 1 score')
ylabel('Exam 2 score')
legend('Admitted', 'Not admitted')
hold off;
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
(Figure: exam-score training data with the fitted linear decision boundary.)

Predict from input features and compute the training-set accuracy:

%% ============ 4.Predict and Accuracies ============
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission probability of %f\n'], prob);

fprintf('Expected value: 0.775 +/- 0.002\n\n');
p = predict(X, theta);
fprintf('p: %d\n', p);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (approx): 89.0\n');
fprintf('\n')
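
A rough check with the rounded \theta printed earlier: g(-25.161 + 0.206 \cdot 45 + 0.201 \cdot 85) = g(1.19) \approx 0.77, consistent with the expected value 0.775 (the small gap comes from rounding \theta).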

Visualizing the polynomial fit (with regularization)

The data set ex2data2.txt is used (it can be downloaded from https://github.com/iheber/Coursera/blob/master/ex2/ex2data2.txt).

Plot the positive and negative examples:

%% Machine Learning Online Class - Exercise 2: Logistic Regression
clear; close all; clc;
data = load('ex2data2.txt');
X = data(:, [1, 2]);
y = data(:, 3);
plotData(X, y);
hold on;
xlabel('Microchip Test 1');
ylabel('Microchip Test 2');
legend('y = 1', 'y = 0');
hold off;

(Figure: scatter plot of Microchip Test 1 vs. Microchip Test 2, showing the two classes.)

Compute the regularized cost function and gradient:

%% =========== 1.Regularized Logistic Regression ============
X = mapFeature(X(:,1), X(:,2));
initial_theta = zeros(size(X, 2), 1);
lambda = 1;
[J, gradient] = costFunctionReg(X, y, initial_theta, lambda);
fprintf('J at initial theta (zeros): %f\n', J);
fprintf('Expected J (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', gradient(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

Plot the decision boundary:

%% ============= 2.Regularization and Accuracies =============
initial_theta = zeros(size(X, 2), 1);
lambda = 1;
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J, exitFlag] = fminunc(@(t)costFunctionReg(X, y, t, lambda), initial_theta, options);
plotDecisionBoundary(X, y, theta);
hold on;
title(sprintf('lambda = %g', lambda))
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0', 'Decision boundary')
hold off;
p = predict(X, theta);
fprintf('Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');

  • With regularization parameter lambda = 1, the training-set accuracy is 83.1%.
  • With lambda = 0 there is effectively no penalty term, so overfitting is not prevented; the training-set accuracy is 86.4%.
  • With lambda = 100 the penalty is too strong and the model underfits; the training-set accuracy drops to 60%.

(Figures: decision-boundary plots of the training data for lambda = 1, 0 and 100.)
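
These three cases can be reproduced by re-running the optimization with different regularization strengths. A minimal sketch, assuming X, y and the functions defined above are already in the workspace:

% Sketch: retrain and report the training accuracy for several lambda values
for lambda = [0 1 100]
  initial_theta = zeros(size(X, 2), 1);
  options = optimset('GradObj', 'on', 'MaxIter', 400);
  theta = fminunc(@(t)costFunctionReg(X, y, t, lambda), initial_theta, options);
  plotDecisionBoundary(X, y, theta);   % one boundary plot per lambda
  p = predict(X, theta);
  fprintf('lambda = %g, train accuracy = %.1f%%\n', lambda, mean(double(p == y)) * 100);
end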

