Inception v1 (GoogLeNet): https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf
Inception v2 (BN): https://arxiv.org/pdf/1502.03167.pdf
TensorFlow code: https://github.com/conan7882/GoogLeNet-Inception
1. Inception v1 (GoogLeNet) Motivation
Before 2014, the standard CNN architecture stacked several convolutional layers, max pooling layers, and one or more fully connected layers. The usual way to improve performance was to increase the network's depth and width, where width means the number of neurons per layer (for a convolutional layer, the number of kernels). This brings several limitations:
- Huge memory and compute consumption
- A larger parameter count, which makes overfitting more likely
- Vanishing and exploding gradients
The Inception structure:
Figure 1(a) shows the original Inception module, which can be understood as follows:
- 1. Kernels of different sizes mean receptive fields of different sizes, and concatenating the branch outputs fuses features at multiple scales.
- 2. Kernel sizes of 1, 3 and 5 are chosen mainly for easy alignment: with stride=1 and pad=0, 1, 2 respectively, every branch produces feature maps of the same spatial dimensions, so the outputs can be concatenated directly.
- 3. The paper notes that pooling has proven effective in many networks, so a pooling branch is embedded in the Inception module as well.
- 4. Deeper in the network, features become more abstract and each feature's receptive field grows, so the proportion of 3x3 and 5x5 convolutions should increase with depth. However, 5x5 convolutions are still computationally expensive, so the paper borrows from Network-in-Network (NIN) and uses 1x1 convolutions for dimensionality reduction.
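As a quick check of point 2 above, the following sketch (feature-map and channel sizes are illustrative, not taken from the paper) verifies that with stride 1 and 'same' padding, the 1x1, 3x3 and 5x5 branches keep the same spatial size and can be concatenated along the channel axis:

```python
import tensorflow as tf
from tensorflow.keras import layers

# A dummy feature map: batch=1, 28x28 spatial, 192 channels (illustrative sizes)
x = tf.random.normal((1, 28, 28, 192))

# Three branches, all with stride=1; 'same' padding corresponds to pad=0/1/2
b1 = layers.Conv2D(64, (1, 1), strides=1, padding='same')(x)
b3 = layers.Conv2D(128, (3, 3), strides=1, padding='same')(x)
b5 = layers.Conv2D(32, (5, 5), strides=1, padding='same')(x)

# Every branch keeps the 28x28 spatial size, so they concatenate on channels
out = layers.concatenate([b1, b3, b5], axis=-1)
print(out.shape)  # (1, 28, 28, 224) -> 64 + 128 + 32 channels
```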
Figure 1: Inception module schematic
Stacking such modules to increase depth does improve performance, but the computational cost (parameter count) remains a problem. To mitigate this, GoogLeNet borrows the Network-in-Network idea and uses 1x1 convolutions for dimensionality reduction (which also indirectly adds depth), thereby shrinking the parameter count (a quantitative comparison of the two structures is omitted here). A 1x1 convolution can be understood as learned pooling in depth. As shown in Figure 1(b), a 1x1 convolution is inserted before the 3x3 and 5x5 convolutions; after the 3x3 max pooling branch, the 1x1 convolution is placed afterwards rather than before, again to reduce the parameter count.
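The savings from the 1x1 reduction can be checked with simple arithmetic (the channel counts below are illustrative, not taken from the paper's tables):

```python
# Weights of a 5x5 convolution applied directly (no bias terms)
c_in, c_out = 192, 32          # example input/output channel counts
k = 5                          # 5x5 kernel
direct = k * k * c_in * c_out
print(direct)                  # 153600

# Same branch with a 1x1 bottleneck of 16 channels inserted first
c_mid = 16
reduced = 1 * 1 * c_in * c_mid + k * k * c_mid * c_out
print(reduced)                 # 15872, roughly a 10x reduction
```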
The authors point out the advantages of Inception:
- It significantly increases the number of units at each stage, i.e., the network's width, without an uncontrolled blow-up in computational complexity, because dimensionality reduction is applied before the larger convolutions.
- Visual information is processed and aggregated at multiple scales, so the next stage can extract features from different scales simultaneously.
Some notes on Figure 2:
1. GoogLeNet clearly adopts a modular structure that is easy to extend and modify; the network is essentially a stack of Inception modules.
2. The network ends with average pooling instead of fully connected layers, an idea from NIN that was shown to improve top-1 accuracy by 0.6%. A fully connected layer is still appended at the end, mainly to make fine-tuning convenient.
3. Although the fully connected layers are removed, the network still uses dropout.
4. To mitigate vanishing gradients, the network adds two auxiliary softmax classifiers that inject gradients into earlier layers. At test time these two extra branches are removed. Besides combating vanishing gradients, they also classify from intermediate-layer outputs, acting as a form of model ensembling.
Note: for a relatively deep network, the ability to propagate gradients back through all layers effectively is a concern. The strong performance of shallower networks on this task suggests that the features produced by the middle layers should already be highly discriminative. Adding auxiliary classifiers to these intermediate layers is expected to encourage discrimination in the lower stages, combat vanishing gradients, and provide regularization. The classifiers take the form of small convolutional networks placed on top of the outputs of the Inception (4a) and (4d) modules. During training, their losses are added to the network's total loss with a discount weight (0.3 per auxiliary classifier). At inference these auxiliary networks are discarded. Later control experiments showed that their effect is relatively small (about 0.5%) and that only one of them is needed to achieve the same result.
The exact structure of the auxiliary classifier networks is as follows:
1) An average pooling layer with 5x5 filter size and stride 3, giving a 4x4x512 output for stage (4a) and 4x4x528 for stage (4d).
2) A 1x1 convolution with 128 filters for dimensionality reduction, with rectified linear activation.
3) A fully connected layer with 1024 units and rectified linear activation.
4) A dropout layer that drops 70% of its outputs.
5) A linear layer with softmax loss as the classifier (predicting the same 1000 classes as the main classifier, but removed at inference).
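The five steps above can be sketched as a small tf.keras model. This is a non-authoritative sketch: the layer sizes follow the list, while the 14x14x512 input shape is an assumption matching the Inception (4a) output:

```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential

aux_classifier = Sequential([
    layers.AveragePooling2D((5, 5), strides=3),     # 14x14x512 -> 4x4x512
    layers.Conv2D(128, (1, 1), activation='relu'),  # 1x1 reduction to 128 filters
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),          # fully connected, 1024 units
    layers.Dropout(0.7),                            # drop 70% of outputs
    layers.Dense(1000, activation='softmax'),       # same 1000 classes as the main head
])
aux_classifier.build(input_shape=(None, 14, 14, 512))
print(aux_classifier.output_shape)  # (None, 1000)
# During training its loss is added to the total loss with weight 0.3;
# at inference the whole branch is discarded.
```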
Figure 2: Inception v1 network architecture
Figure 3: Inception v1 network parameters
2. Inception v2 (BN-Inception)
Google's own papers disagree about which paper defines Inception v2; this blog series follows the second interpretation below:
- In 《Rethinking the Inception Architecture for Computer Vision》, the structural improvements on Inception v1 are called Inception v2, and Inception v2 plus BN is called Inception v3.
- In 《Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning》, the network of 《Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift》 (Inception v1 with minor modifications plus BN) is treated as Inception v2, and 《Rethinking the Inception Architecture for Computer Vision》 as Inception v3.
The main changes relative to Inception v1:
- Batch Normalization is used. Its benefits: each layer's inputs are approximately standardized, which helps prevent vanishing and exploding gradients and speeds up training; it reduces the influence of parameter changes in earlier layers on the input distribution of later layers, letting each layer train more independently; and it has a mild regularization effect.
- Each 5x5 convolution is replaced by two stacked 3x3 convolutions.
- The number of Inception 3 modules is increased from two to three.
- Within the Inception modules, some use max pooling and some use average pooling.
- There is no explicit pooling layer between two groups of Inception modules; stride-2 convolutions are used instead, e.g., before the concatenation in layers 3c and 4e.
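The factorization of one 5x5 convolution into two stacked 3x3 convolutions keeps the 5x5 receptive field while cutting parameters, which a quick count illustrates (the channel count is illustrative and assumes equal input and output channels):

```python
c = 64                          # assumed input/output channel count
five = 5 * 5 * c * c            # one 5x5 conv: 25*c^2 weights
double3 = 2 * (3 * 3 * c * c)   # two stacked 3x3 convs: 18*c^2 weights
print(five, double3)            # 102400 73728
print(double3 / five)           # 0.72 -> about 28% fewer parameters
```

The two stacked 3x3 convolutions also add an extra nonlinearity between them, which the single 5x5 convolution lacks.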
Figure 4: Inception v2 (BN) network parameters
Figure 5: Inception v2 (BN) network architecture
3. Inception v2 (BN) code implementation
Implemented with TensorFlow 2.0 and tf.keras.
"""
Inception v2 models for tensorflow 2.0, tf.keras.
# Reference
- [Batch Normalization: Accelerating Deep Network Training by
Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf))"""import os
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import Sequential, layersos.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'class BasicCon2D(keras.Model):
"""This is the basic con2d operation, i.e.conv+bn+relu"""
def __init__(self, filter_nums, **kwargs):
super(BasicCon2D, self).__init__()
self.conv = layers.Conv2D(filter_nums, use_bias=False, **kwargs)
self.bn = layers.BatchNormalization()
self.relu = layers.Activation('relu')def call(self, inputs, training=None):
out = self.conv(inputs)
out = self.bn(out)
out = self.relu(out)
return outclass InceptionA(keras.Model):
"""This inception has no downsampling operation and stride=1"""
def __init__(self, n1x1, n3x3_reduce, n3x3, n5x5_reduce, n5x5, pool_proj, pool_way='average'):
super(InceptionA, self).__init__()
# 1×1convolution branch
self.branch1x1 = BasicCon2D(n1x1, kernel_size=(1, 1))
self.branch3x3_1 = BasicCon2D(n3x3_reduce, kernel_size=(1, 1))
self.branch3x3_2 = BasicCon2D(n3x3, kernel_size=(3, 3), padding='same')
self.branch5x5_1 = BasicCon2D(n5x5_reduce, kernel_size=(1, 1))
self.branch5x5_2 = BasicCon2D(n5x5, kernel_size=(3, 3), padding='same')
self.branch5x5_3 = BasicCon2D(n5x5, kernel_size=(3, 3), padding='same')
if pool_way == 'average':
self.branch_pool_1 = layers.AveragePooling2D((3, 3), strides=1, padding='same')
self.branch_pool_2 = BasicCon2D(pool_proj, kernel_size=(1, 1))
elif pool_way == 'max':
self.branch_pool_1 = layers.MaxPool2D((3, 3), strides=1, padding='same')
self.branch_pool_2 = BasicCon2D(pool_proj, kernel_size=(1, 1))
else:
raise ValueError('The pool_way belongs to "average" and "max"')def call(self, inputs, training=None):
# 1x1 convolution branch
branch1x1 = self.branch1x1(inputs)
# 3x3 convolution branch
branch3x3 = self.branch3x3_1(inputs)
branch3x3 = self.branch3x3_2(branch3x3)
# 5x5 convolution branch
branch5x5 = self.branch5x5_1(inputs)
branch5x5 = self.branch5x5_2(branch5x5)
branch5x5 = self.branch5x5_3(branch5x5)
# pool convolution branch
branch_pool = self.branch_pool_1(inputs)
branch_pool = self.branch_pool_2(branch_pool)out = layers.concatenate([branch1x1, branch3x3, branch5x5, branch_pool], axis=-1)
return outclass InceptionB(keras.Model):
"""This inception has downsampling operation and stride=2"""
def __init__(self, n3x3_reduce, n3x3, n5x5_reduce, n5x5):
super(InceptionB, self).__init__()
self.branch3x3_1 = BasicCon2D(n3x3_reduce, kernel_size=(1, 1))
self.branch3x3_2 = BasicCon2D(n3x3, kernel_size=(3, 3), strides=2, padding='same')
self.branch5x5_1 = BasicCon2D(n5x5_reduce, kernel_size=(1, 1))
self.branch5x5_2 = BasicCon2D(n5x5, kernel_size=(3, 3), padding='same')
self.branch5x5_3 = BasicCon2D(n5x5, kernel_size=(3, 3), strides=2, padding='same')
self.branch_pool = layers.MaxPool2D((3, 3), strides=2, padding='same')def call(self, inputs, training=None):
# 3x3 convolution branch
branch3x3 = self.branch3x3_1(inputs)
branch3x3 = self.branch3x3_2(branch3x3)
# 5x5 convolution branch
branch5x5 = self.branch5x5_1(inputs)
branch5x5 = self.branch5x5_2(branch5x5)
branch5x5 = self.branch5x5_3(branch5x5)
# pool branch
branch_pool = self.branch_pool(inputs)out = layers.concatenate([branch3x3, branch5x5, branch_pool], axis=-1)
return outclass Inception2(keras.Model):
"""Applying the InceptionV2 network"""
def __init__(self, num_classes):
super(Inception2, self).__init__()
self.conv1 = BasicCon2D(64, kernel_size=(7, 7), strides=2, padding='same')
self.max_pool1 = layers.MaxPool2D((3, 3), strides=2, padding='same')
self.conv2_1 = BasicCon2D(64, kernel_size=(1, 1))
self.conv2_2 = BasicCon2D(192, kernel_size=(3, 3), strides=1, padding='same')
self.max_pool2 = layers.MaxPool2D((3, 3), strides=2, padding='same')
# inception 3
self.inception3a = InceptionA(64, 64, 64, 64, 96, 32)
self.inception3b = InceptionA(64, 64, 96, 64, 96, 64)
self.inception3c = InceptionB(128, 160, 64, 96)
# inception 4
self.inception4a = InceptionA(224, 64, 96, 96, 128, 128)
self.inception4b = InceptionA(192, 96, 128, 96, 128, 128)
self.inception4c = InceptionA(160, 128, 160, 128, 160, 96)
self.inception4d = InceptionA(96, 128, 192, 160, 192, 96)
self.inception4e = InceptionB(128, 192, 192, 256)
# inception 5
self.inception5a = InceptionA(352, 192, 320, 160, 224, 128)
self.inception5b = InceptionA(352, 192, 320, 192, 224, 128, pool_way='max')
# global average pooling
self.avg_pool = layers.GlobalAveragePooling2D()
self.fc = layers.Dense(num_classes)def call(self, inputs, training=None):
out = self.conv1(inputs)
out = self.max_pool1(out)
out = self.conv2_1(out)
out = self.conv2_2(out)
out = self.max_pool2(out)
out = self.inception3a(out)
out = self.inception3b(out)
out = self.inception3c(out)
out = self.inception4a(out)
out = self.inception4b(out)
out = self.inception4c(out)
out = self.inception4d(out)
out = self.inception4e(out)
out = self.inception5a(out)
out = self.inception5b(out)
out = self.avg_pool(out)
out = self.fc(out)
return outif __name__ == '__main__':
model = Inception2(10)
model.build(input_shape=(None, 224, 224, 3))
model.summary()
print(model.predict(tf.ones((10, 224, 224, 3))).shape)
References
1. The network architecture of convolutional neural networks: GoogLeNet
2. Inception Network Overview