Build the Deep-Learning ResNet in 1 Hour with Google TensorFlow, Winner of Multiple Image-Competition Titles

Before you read:

To keep this article compact and easy to follow, it involves as little mathematics as possible, lowering the barrier to entry while walking you through building the classic ResNet-34 model and putting it into training.

Environment: Python 3.6

TensorFlow-gpu 1.5.0

PyCharm

Dataset: MNIST

1. Architecture Analysis

I won't belabor ResNet's origins; most readers already know it to some degree. What makes this model, which has swept the major image-recognition competitions, so different?

(Figure: VGG-19, plain-34, and ResNet-34 architectures compared. Image source: Google)

Speaking of convolutional models, LeNet, Inception, and VGG are the classics of image-recognition networks, and the figure above shows the classic comparison of VGG-19 against plain-34 and ResNet-34.

In terms of computation, VGG-19's three fully connected layers are clearly more expensive than plain-34 and ResNet-34, while plain-34 and ResNet-34 have the same number of parameters.
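To make that comparison concrete, here is a back-of-the-envelope weight count for VGG-19's three fully connected layers, assuming the standard VGG-19 shapes (a 7×7×512 final feature map, 4096 → 4096 → 1000 outputs) and ignoring biases:

```python
# Rough weight counts for VGG-19's three fully connected layers
# (standard shapes assumed; biases ignored).
fc1 = 7 * 7 * 512 * 4096   # flattened conv output -> first FC layer
fc2 = 4096 * 4096          # second FC layer
fc3 = 4096 * 1000          # classification layer
total_fc = fc1 + fc2 + fc3
print(total_fc)  # 123633664 — over 123 million weights in the FC layers alone
```

That is the bulk of VGG-19's parameters, which is exactly what the global pooling + single FC design of ResNet avoids.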

(Figure: training curves of plain-18/plain-34 versus ResNet-18/ResNet-34. Image source: Google)

In terms of training fit, the paper compares plain-18 and plain-34 against ResNet-18 and ResNet-34. It is easy to see that the plain networks gain no clear accuracy as layers are added, whereas ResNet not only improves in training accuracy with depth, but also beats the plain network of the same depth.

From earlier study we know that as deep networks add layers they easily run into "degradation" and "vanishing gradients", as well as overfitting of the training data. In ResNet, the authors offer a solution: add an identity mapping (since readers' backgrounds differ, we won't go into detail here; those interested can read the original ResNet paper).
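The idea can be sketched in a few lines of NumPy: the block computes a residual F(x) and adds x back, so a unit whose weights have learned nothing still passes its input through unchanged (`residual_block` and `zero_residual` here are illustrative names, not part of this article's code):

```python
import numpy as np

# Minimal sketch of a residual connection: the block learns a residual F(x)
# and adds the identity x back, so even if F(x) collapses to zero the layer
# still behaves as an identity mapping.
def residual_block(x, F):
    return F(x) + x  # identity shortcut

x = np.array([1.0, 2.0, 3.0])
zero_residual = lambda v: np.zeros_like(v)  # a layer that learned nothing
y = residual_block(x, zero_residual)
print(y)  # identical to x: a deeper stack can never do worse than identity
```

This is why adding residual layers does not cause the degradation seen in plain networks: the worst case is the identity, not a corrupted signal.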

(Figure: a residual building block with its identity shortcut. Image source: Google)

The figure above sketches a residual unit. For a residual block to be effective it needs two or more layers, and the input x must have the same dimensions as the output F(x).

(Figure: the bottleneck residual block used in ResNets deeper than 50 layers. Image source: Google)

For ResNet models deeper than 50 layers, to further cut computation while preserving accuracy, the authors optimized the residual block, replacing the two internal 3×3 layers with a 1×1 → 3×3 → 1×1 stack. The first 1×1 convolution reduces the depth, cutting the residual block's computation along the channel dimension; the 3×3 layer extracts image features just as before; and the final 1×1 layer restores the depth.
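The saving is easy to verify with the 256-channel example usually cited for the bottleneck design. Counting weights only (no biases), compare two plain 3×3 layers at depth 256 against a 256 → 64 → 64 → 256 bottleneck:

```python
# Weight counts (biases ignored) for a basic vs. bottleneck residual block
# at 256 channels.
basic = 2 * (3 * 3 * 256 * 256)      # two 3x3 conv layers at depth 256
bottleneck = (1 * 1 * 256 * 64       # 1x1 conv: reduce depth 256 -> 64
              + 3 * 3 * 64 * 64      # 3x3 conv: extract features at depth 64
              + 1 * 1 * 64 * 256)    # 1x1 conv: restore depth 64 -> 256
print(basic, bottleneck)  # 1179648 vs 69632 — roughly 17x fewer weights
```

The 3×3 work happens at depth 64 instead of 256, which is where almost all of the saving comes from.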

A question then arises: after passing through the 3×3 convolution, how can the output dimensions still match the input? The paper offers three options:

1. Zero-pad the missing dimensions.

2. Use the identity mapping when input and output dimensions match, and a linear projection when they differ.

3. Use a linear projection for every block.

In this article we mainly use zero padding.
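Zero padding can be sketched with NumPy's `pad`; the hypothetical `pad_channels` helper below mirrors the `tf.pad` call that appears later in `res_layer2d`:

```python
import numpy as np

# Sketch of option 1: when the shortcut's depth is smaller than the block's
# output depth, pad the missing channels with zeros on both sides.
def pad_channels(x, target_depth):
    in_depth = x.shape[-1]
    lo = (target_depth - in_depth) // 2
    hi = target_depth - in_depth - lo
    # pad only the last (channel) axis of the NHWC tensor
    return np.pad(x, [(0, 0), (0, 0), (0, 0), (lo, hi)])

x = np.ones((1, 4, 4, 64))    # NHWC tensor, depth 64
y = pad_channels(x, 128)      # shortcut now matches depth 128
print(y.shape)  # (1, 4, 4, 128)
```

The padded channels carry no information and no parameters, which is the appeal of option 1 for a small model like this one.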

That covers the basic theory. Next, let's implement it in TensorFlow.

2. Code Implementation (ResNet-34)

Parameter settings (DATA_set.py)

NUM_LABELS = 10  # number of label classes (model output width)

# Convolution parameters
CONV_SIZE = 3
CONV_DEEP = 64

# Learning / optimization parameters
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.03
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 8000
MOVING_AVERAGE_DECAY = 0.99

# Image information
IMAGE_SIZE = 28
IMAGE_COLOR_DEPH = 1

# Model save location
MODEL_SAVE_PATH = "MNIST_model/"
MODEL_NAME = "mnist_model"

# Log path
LOG_PATH = "log"
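A note on how LEARNING_RATE_BASE and LEARNING_RATE_DECAY interact with BATCH_SIZE: the training script below applies staircase exponential decay, dropping the rate once per epoch. The formula `tf.train.exponential_decay` uses with `staircase=True` can be checked directly (MNIST's training split has 55,000 images, so with BATCH_SIZE = 100 one epoch is 550 steps):

```python
# Staircase exponential decay, as used by tf.train.exponential_decay:
# lr = base * decay ** (global_step // decay_steps)
def staircase_lr(base, decay, decay_steps, global_step):
    return base * decay ** (global_step // decay_steps)

print(staircase_lr(0.03, 0.99, 550, 0))     # 0.03   (first epoch)
print(staircase_lr(0.03, 0.99, 550, 550))   # 0.0297 (one decay applied)
```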

Defining the forward pass (ResNet_infernece.py)

import tensorflow as tf
import tensorflow.contrib.slim as slim
import DATA_set  # import by module name ("import DATA_set.py" is invalid Python)

# Two-layer residual unit
def res_layer2d(input_tensor,
                kshape=[5, 5],
                deph=64,
                conv_stride=1,
                padding='SAME'):
    data = input_tensor
    # First conv layer inside the unit
    data = slim.conv2d(data,
                       num_outputs=deph,
                       kernel_size=kshape,
                       stride=conv_stride,
                       padding=padding)
    # Second conv layer inside the unit (no activation here:
    # the ReLU is applied after the shortcut addition)
    data = slim.conv2d(data,
                       num_outputs=deph,
                       kernel_size=kshape,
                       stride=conv_stride,
                       padding=padding,
                       activation_fn=None)
    input_deep = input_tensor.get_shape().as_list()[3]
    # When the input depth differs from the output depth,
    # zero-pad the input's channel dimension
    if input_deep != deph:
        input_tensor = tf.pad(input_tensor,
                              [[0, 0], [0, 0], [0, 0],
                               [abs(deph - input_deep) // 2,
                                abs(deph - input_deep) // 2]])
    data = tf.add(data, input_tensor)
    data = tf.nn.relu(data)
    return data

# As the model grows deeper, downsample spatially to cut computation.
# Here a 1x1 conv with stride 2 is used; a stride-2 max pool would
# downsample just as well.
def get_half(input_tensor, deph):
    data = input_tensor
    data = slim.conv2d(data, deph // 2, 1, stride=2)
    return data
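The halving follows the output-size rule for 'SAME'-padded stride-2 ops, out = ceil(in / stride), which matters below when the 7-pixel feature map is downsampled. A quick check:

```python
import math

# Output spatial size of a stride-2 conv or pool with 'SAME' padding.
def same_out_size(in_size, stride=2):
    return math.ceil(in_size / stride)

print(same_out_size(14))  # 7
print(same_out_size(7))   # 4  (odd sizes round up under SAME padding)
```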

# Stack identical residual units
def res_block(input_tensor,
              kshape, deph, layer=0,
              half=False, name=None):
    data = input_tensor
    with tf.variable_scope(name):
        if half:
            data = get_half(data, deph // 2)
        for i in range(layer // 2):  # each res_layer2d holds two conv layers
            data = res_layer2d(input_tensor=data, deph=deph, kshape=kshape)
        return data
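A quick sanity check of the layer bookkeeping behind the scope names used below: each `res_block` call with `layer=N` contributes N 3×3 conv layers (N//2 two-layer units). Counting the weighted layers that give ResNet-34 its name (the stride-2 1×1 shortcut convs are traditionally not counted):

```python
# Weighted-layer count for the ResNet-34 built in inference() below.
initial_conv = 1               # the 7x7 "layer1-initconv"
block_convs = 6 + 8 + 12 + 6   # the four res_block calls (layer=6, 8, 12, 6)
final_fc = 1                   # "layer34-fc"
print(initial_conv + block_convs + final_fc)  # 34
```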

# Forward pass of the full model
def inference(input_tensor, train=False, regularizer=None):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d], stride=1, padding='SAME'):
        with tf.variable_scope("layer1-initconv"):
            data = slim.conv2d(input_tensor, DATA_set.CONV_DEEP, [7, 7])
            data = slim.max_pool2d(data, [2, 2], stride=2)
        with tf.variable_scope("resnet_layer"):
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP, layer=6, half=False,
                             name="layer4-9-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 2, layer=8, half=True,
                             name="layer10-15-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 4, layer=12, half=True,
                             name="layer16-27-conv")
            data = res_block(input_tensor=data,
                             kshape=[DATA_set.CONV_SIZE, DATA_set.CONV_SIZE],
                             deph=DATA_set.CONV_DEEP * 8, layer=6, half=True,
                             name="layer28-33-conv")
            data = slim.avg_pool2d(data, [2, 2], stride=2)
            # Get the output dimensions, used as the FC layer's input size
            data_shape = data.get_shape().as_list()
            nodes = data_shape[1] * data_shape[2] * data_shape[3]
            reshaped = tf.reshape(data, [data_shape[0], nodes])
        # Final fully connected layer
        with tf.variable_scope('layer34-fc'):
            fc_weights = tf.get_variable("weight", [nodes, DATA_set.NUM_LABELS],
                                         initializer=tf.truncated_normal_initializer(stddev=0.1))
            if regularizer is not None:
                tf.add_to_collection('losses', regularizer(fc_weights))
            fc_biases = tf.get_variable("bias", [DATA_set.NUM_LABELS],
                                        initializer=tf.constant_initializer(0.1))
            fc = tf.nn.relu(tf.matmul(reshaped, fc_weights) + fc_biases)
            if train:
                fc = tf.nn.dropout(fc, 0.5)
            return fc
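Tracing the feature-map size through the network shows what `nodes` works out to for a 28×28 MNIST input, applying ceil(size / stride) at each stride-2 downsampling step:

```python
import math

# Trace the spatial size of the feature map through the model above.
size = 28                      # input image, unchanged by the stride-1 7x7 conv
size = math.ceil(size / 2)     # initial 2x2 max pool -> 14
# block 1 (half=False) keeps 14; blocks 2-4 each halve via get_half
for _ in range(3):
    size = math.ceil(size / 2)  # 14 -> 7 -> 4 -> 2
size = math.ceil(size / 2)     # final 2x2 average pool -> 1
nodes = size * size * 64 * 8   # depth after block 4 is CONV_DEEP * 8 = 512
print(nodes)  # 512 — the FC layer sees a 512-dimensional vector
```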

Model training (MNIST_train.py)

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import ResNet_infernece
import DATA_set
import os
import numpy as np

# Define the loss function, learning rate, and moving-average op
# (standard material, not elaborated on here)
def train_op_data(mnist, labels, output,
                  moving_average_decay, learning_rate_base,
                  batch_size, learning_rate_decay, global_step):
    variable_averages = tf.train.ExponentialMovingAverage(moving_average_decay,
                                                          global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=output, labels=tf.argmax(labels, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
    learning_rate = tf.train.exponential_decay(learning_rate_base,
                                               global_step,
                                               mnist.train.num_examples / batch_size,
                                               learning_rate_decay, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')
    return train_op, loss
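The moving average maintained above follows shadow = decay × shadow + (1 − decay) × variable. A toy sketch of how a shadow value tracks a variable (ignoring the num_updates-adjusted decay TF applies when a step counter is supplied, as it is here):

```python
# One step of the exponential moving average kept by
# tf.train.ExponentialMovingAverage.
def ema_update(shadow, value, decay=0.99):
    return decay * shadow + (1 - decay) * value

shadow = 0.0
for step in range(3):
    shadow = ema_update(shadow, 1.0)  # variable stuck at 1.0
print(shadow)  # creeps toward 1.0: 0.01, then 0.0199, then 0.029701
```

With decay = 0.99 the shadow variables move slowly, which is what makes them useful as a smoothed copy of the weights at evaluation time.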

def train(mnist):
    # Define the data entry points and the labels
    input = tf.placeholder(tf.float32,
                           [DATA_set.BATCH_SIZE, DATA_set.IMAGE_SIZE,
                            DATA_set.IMAGE_SIZE, DATA_set.IMAGE_COLOR_DEPH],
                           name='input')
    y_ = tf.placeholder(tf.float32,
                        [None, DATA_set.NUM_LABELS],
                        name='y-input')
    regularizer = tf.contrib.layers.l2_regularizer(DATA_set.REGULARIZATION_RATE)
    # Get the model output
    y = ResNet_infernece.inference(input, False, regularizer)
    # Global step counter
    global_step = tf.Variable(0, trainable=False)
    # Build the optimization ops
    train_op, loss = train_op_data(mnist, y_, y,
                                   DATA_set.MOVING_AVERAGE_DECAY,
                                   DATA_set.LEARNING_RATE_BASE,
                                   DATA_set.BATCH_SIZE,
                                   DATA_set.LEARNING_RATE_DECAY,
                                   global_step)
    # Initialize the TensorFlow persistence class and train the model
    saver = tf.train.Saver()
    write = tf.summary.FileWriter(DATA_set.LOG_PATH, tf.get_default_graph())
    write.close()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(DATA_set.TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(DATA_set.BATCH_SIZE)
            reshaped_xs = np.reshape(xs, (DATA_set.BATCH_SIZE,
                                          DATA_set.IMAGE_SIZE,
                                          DATA_set.IMAGE_SIZE,
                                          DATA_set.IMAGE_COLOR_DEPH))
            _, loss_value, step = sess.run([train_op, loss, global_step],
                                           feed_dict={input: reshaped_xs, y_: ys})
            print("After %d training step(s), loss on training batch is %g."
                  % (step, loss_value))
            if i % 100 == 0:
                saver.save(sess, os.path.join(DATA_set.MODEL_SAVE_PATH,
                                              DATA_set.MODEL_NAME),
                           global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__': tf.app.run()

Sample console output:

After 1 training step(s), loss on training batch is 2.30636.

After 2 training step(s), loss on training batch is 2.30597.

After 3 training step(s), loss on training batch is 2.30568.

After 4 training step(s), loss on training batch is 2.30372.

After 5 training step(s), loss on training batch is 2.30359.

...

After 4070 training step(s), loss on training batch is 0.0895806.

After 4071 training step(s), loss on training batch is 0.00559415.

After 4072 training step(s), loss on training batch is 0.0293886.

After 4073 training step(s), loss on training batch is 0.0106446.

After 4074 training step(s), loss on training batch is 0.0337671.

After 4075 training step(s), loss on training batch is 0.0660308.

After 4076 training step(s), loss on training batch is 0.00239097.

After 4077 training step(s), loss on training batch is 0.00584769.

After 4078 training step(s), loss on training batch is 0.0306475.

After 4079 training step(s), loss on training batch is 0.0532619.

...

For the source code, or to talk with the author, see the linked resources, or follow our Toutiao channel and leave us a message.


