PyTorch | Notes on understanding torch.nn.AdaptiveAvgPool2d(), the adaptive average pooling layer

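Introduction to AdaptiveAvgPool2d()
torch.nn.AdaptiveAvgPool2d() takes a single argument, output_size, which gives the height and width of the output feature map, either as a tuple (H, W) or as a single integer for a square output. The number of channels is the same before and after.
VGG uses torch.nn.AdaptiveAvgPool2d((7, 7)) at the boundary between the convolutional layers and the fully connected layers.
Consider the following source code: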

class AdaptiveAvgPool2d(_AdaptiveAvgPoolNd):
    """Applies a 2D adaptive average pooling over an input signal composed of several input planes.

    The output is of size H x W, for any input size.
    The number of output features is equal to the number of input planes.

    Args:
        output_size: the target output size of the image of the form H x W.
                     Can be a tuple (H, W) or a single H for a square image H x H.
                     H and W can be either a ``int``, or ``None`` which means the size will
                     be the same as that of the input.

    Examples:
        >>> # target output size of 5x7
        >>> m = nn.AdaptiveAvgPool2d((5,7))
        >>> input = torch.randn(1, 64, 8, 9)
        >>> output = m(input)
        >>> # target output size of 7x7 (square)
        >>> m = nn.AdaptiveAvgPool2d(7)
        >>> input = torch.randn(1, 64, 10, 9)
        >>> output = m(input)
        >>> # target output size of 10x7
        >>> m = nn.AdaptiveMaxPool2d((None, 7))
        >>> input = torch.randn(1, 64, 10, 9)
        >>> output = m(input)
    """

    @weak_script_method
    def forward(self, input):
        return F.adaptive_avg_pool2d(input, self.output_size)

AdaptiveAvgPool2d()
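Applies 2D adaptive average pooling over an input signal composed of several input planes.
For any input size, the output has spatial size H x W; the number of output features equals the number of input planes (that is, the channel count).
Here output_size is the target output size of the image, in the form H x W.
AdaptiveAvgPool2d((H, W)) produces an output of height H and width W.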
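
To make the "adaptive" part concrete, here is a minimal pure-Python sketch of how the pooling windows along one dimension can be derived from the input and output sizes. The helper name adaptive_windows is hypothetical, and the floor/ceil boundary rule reflects my understanding of how PyTorch chooses its windows, so treat it as an illustration rather than the library's source.

```python
def adaptive_windows(in_size, out_size):
    """Return the half-open [start, end) input window for each output index.

    Assumed rule: start = floor(i * in / out), end = ceil((i + 1) * in / out).
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                     # floor
        end = ((i + 1) * in_size + out_size - 1) // out_size  # ceil
        windows.append((start, end))
    return windows


rows = adaptive_windows(8, 5)  # height 8 -> 5
cols = adaptive_windows(9, 7)  # width 9 -> 7
print(rows)
print(cols)
```

The windows may overlap and may have different widths, which is why the layer needs only the target size and no explicit kernel_size or stride.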

# target output size of 5x7
import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)
output = m(input)
output.shape  # result: torch.Size([1, 64, 5, 7])

If only a single value is given, AdaptiveAvgPool2d(H) is equivalent to AdaptiveAvgPool2d((H, H)): the output has height and width both equal to H.

# target output size of 7x7 (square)
import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d(7)
input = torch.randn(1, 64, 10, 9)
output = m(input)
output.shape  # result: torch.Size([1, 64, 7, 7])

If H or W is None, that dimension keeps the same size as in the input.

# target output size of 10x7
import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d((None, 7))  # None keeps the input height (10)
input = torch.randn(1, 64, 10, 9)
output = m(input)
output.shape  # result: torch.Size([1, 64, 10, 7])

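Of course, the output dimensions H and W can also be larger than the input dimensions, but this usually works poorly: pooling cannot create new information, so the existing values are only repeated or re-averaged across more output cells.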

# target output size of 80x60
import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d((80, 60))
input = torch.randn(1, 64, 10, 9)
output = m(input)
output.shape  # result: torch.Size([1, 64, 80, 60])
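
A tiny example can show what happens when the output is larger than the input. This is a sketch on a hand-made 2x2 input (the values are arbitrary); in this particular case each output cell averages a single input value, so the result is just the input with every value repeated.

```python
import torch
import torch.nn as nn

# One image, one channel, 2x2 spatial size: shape (N, C, H, W).
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]]).reshape(1, 1, 2, 2)

# Ask for a 4x4 output, i.e. larger than the 2x2 input.
up = nn.AdaptiveAvgPool2d((4, 4))(x)
print(up[0, 0])
```

Each input value ends up as a 2x2 block in the output, so the larger map carries no additional detail.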

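My own views
When should AdaptiveAvgPool2d() be used? In my view, when building a model, AdaptiveAvgPool2d() usually sits where the convolutional layers meet the fully connected layers, so that the size of the input to the Linear layer is fixed. The figure below shows how AdaptiveAvgPool2d() is used in VGG.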

[Figure: AdaptiveAvgPool2d() in VGG]
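
The placement described above can be sketched with a toy model. TinyVGGLike is a hypothetical, much smaller stand-in and not the real VGG; the channel counts and layer sizes are assumptions chosen only to keep the example fast. The point is that the Linear layer always receives 32 * 7 * 7 features, whatever the input resolution.

```python
import torch
import torch.nn as nn


class TinyVGGLike(nn.Module):
    """Hypothetical toy model; only the conv -> adaptive pool -> linear layout mirrors VGG."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Fixes the spatial size to 7x7 regardless of the input resolution.
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # (N, 32 * 7 * 7)
        return self.classifier(x)


model = TinyVGGLike().eval()
with torch.no_grad():
    out_small = model(torch.randn(2, 3, 64, 64))
    out_large = model(torch.randn(2, 3, 224, 160))
print(out_small.shape, out_large.shape)
```

Without the adaptive pooling layer, the two inputs would produce feature maps of different sizes and the second call would fail at the Linear layer.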

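How should the arguments of AdaptiveAvgPool2d() be chosen? The choice of H and W depends on the initial image size (height and width) and the number of pooling layers, that is, on the height and width of the feature map after the stacked convolution and pooling operations. In my experiments, results were better when H and W were smaller than the height and width of the feature map entering the layer.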
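For example, when training on CIFAR-10, the input images are 32×32×3 (height × width × channels). After three convolution layers (64 channels each), each followed by pooling (default 2×2, so every pooling step halves the height and width), the feature map becomes 4×4×64. Before feeding it into the fully connected layers, it is best to pass it through AdaptiveAvgPool2d() so that the fully connected layer receives an input of a known, fixed size (otherwise, changing the number of pooling layers also changes the height and width).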
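
The size arithmetic in this example can be checked with a few lines of plain Python. This is a sketch under the assumption that the convolutions preserve the spatial size (for instance 3×3 kernels with padding 1) and that only the 2×2 pooling steps shrink it; flattened_size is a hypothetical helper.

```python
def flattened_size(image_size, num_pools, channels):
    """Spatial side and flattened feature count after num_pools 2x2 poolings."""
    side = image_size
    for _ in range(num_pools):
        side //= 2  # each 2x2 pooling halves height and width
    return side, channels * side * side


results = {pools: flattened_size(32, pools, 64) for pools in (2, 3, 4)}
for pools, (side, flat) in results.items():
    print(f"{pools} pooling layers: {side}x{side}x64 -> {flat} features")
```

Changing the number of pooling layers changes the flattened size by a factor of four each time, which is exactly what a fixed adaptive pooling output avoids.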
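If we still used AdaptiveAvgPool2d((7, 7)) here, the results would not be very good (7 > 4, so the layer has to enlarge the feature map and the model has difficulty reproducing the earlier features). AdaptiveAvgPool2d((4, 4)) (feature count unchanged), AdaptiveAvgPool2d((2, 2)), and AdaptiveAvgPool2d((1, 1)) work comparatively better; the exact value has to be found by tuning.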
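
As a rough illustration of these options, the sketch below applies each candidate output size to a random 4×4×64 feature map (a stand-in for the CIFAR-10 features, not real data) and records how many features the following Linear layer would receive. It says nothing about accuracy, which still has to be measured by training.

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 64, 4, 4)  # stand-in for the 4x4x64 feature map

flat_sizes = {}
for size in (7, 4, 2, 1):
    pooled = nn.AdaptiveAvgPool2d((size, size))(feat)
    flat_sizes[size] = torch.flatten(pooled, 1).shape[1]
print(flat_sizes)

# (4, 4) on a 4x4 input leaves the features unchanged.
same = nn.AdaptiveAvgPool2d((4, 4))(feat)
# (1, 1) reduces each channel to its mean (global average pooling).
global_avg = nn.AdaptiveAvgPool2d((1, 1))(feat)
```

With (7, 7) the Linear layer receives 3136 inputs built from only 1024 underlying values, while (4, 4), (2, 2), and (1, 1) give 1024, 256, and 64 inputs respectively.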