Deep Learning | Region proposal functions in Faster R-CNN (proposal_layer, anchor_target_layer, proposal_target_layer), TensorFlow

Code: https://github.com/endernewton/tf-faster-rcnn
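1. First, the RPN head runs on the feature map produced by VGG16 (512 channels): a 3×3 convolution (rpn_conv/3x3), followed by two 1×1 convolutions with 9×2 and 9×4 output channels. These kernels start from a random initialization, so at first rpn_cls_score and rpn_bbox_pred carry no real meaning [1]. They become meaningful only as the corresponding losses are back-propagated and the kernels are updated. rpn_cls_score classifies each anchor as foreground or background. rpn_bbox_pred predicts the offsets (deltas) between an anchor and its ground-truth box. A softmax normalizes rpn_cls_score into foreground/background probabilities that sum to 1 for each anchor. Every position on the feature map has 9 anchors, and each anchor has a background and a foreground probability, so rpn_cls_prob has shape (1, ?, ?, 18). rpn_cls_pred takes the argmax over each anchor's two scores to decide whether it is foreground or background. Softmax preserves order, so this gives the same result as comparing the two probabilities.
The RPN and the RCNN detection head share the backbone convolutional layers that produce this feature map.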

```python
# The base CNN (VGG16, ZF, etc.) is initialized from ImageNet pre-training; the other layers
# are initialized from a Gaussian with mean 0 and standard deviation 0.01 [6]
def _region_proposal(self, net_conv, is_training, initializer):
  rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training,
                    weights_initializer=initializer, scope="rpn_conv/3x3")
  self._act_summaries.append(rpn)
  rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                              weights_initializer=initializer,
                              padding='VALID', activation_fn=None, scope='rpn_cls_score')
  # change it so that the score has 2 as its channel size
  rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
  rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
  # decide whether each anchor is foreground or background
  rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
  rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
  # offsets (deltas) that bbox_transform_inv_tf applies to the anchors in proposal_layer
  rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                              weights_initializer=initializer,
                              padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
```
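This NumPy sketch shows why the last 9 channels of rpn_cls_prob are the foreground probabilities. It mimics what, as far as I can tell, `_reshape_layer` and `_softmax_layer` do: NHWC to NCHW, reshape to 2 channels, softmax, then reshape back. The names `reshape_layer` and `softmax_last_axis` are hypothetical, and the random scores stand in for the real convolution output.

```python
import numpy as np

def reshape_layer(x, num_dim):
    """NumPy mimic of _reshape_layer for a batch of one image (NHWC in, NHWC out)."""
    _, _, w, _ = x.shape
    to_caffe = x.transpose(0, 3, 1, 2)               # NHWC -> NCHW
    reshaped = to_caffe.reshape(1, num_dim, -1, w)   # force num_dim channels
    return reshaped.transpose(0, 2, 3, 1)            # NCHW -> NHWC

def softmax_last_axis(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_anchors, H, W = 9, 4, 5
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, num_anchors * 2))   # random stand-in for the conv output

score_reshape = reshape_layer(rpn_cls_score, 2)                 # (1, 9*H, W, 2): (bg, fg) pairs
prob_reshape = softmax_last_axis(score_reshape)                 # bg + fg = 1 for every anchor
rpn_cls_pred = np.argmax(score_reshape.reshape(-1, 2), axis=1)  # 0 = background, 1 = foreground
rpn_cls_prob = reshape_layer(prob_reshape, num_anchors * 2)     # back to (1, H, W, 18)

bg, fg = rpn_cls_prob[..., :num_anchors], rpn_cls_prob[..., num_anchors:]
print(rpn_cls_prob.shape, np.allclose(bg + fg, 1.0))           # (1, 4, 5, 18) True
```

Channel `a` holds anchor `a`'s background score and channel `a + 9` holds its foreground score. The reshape to 2 channels lines these two up so that the softmax runs over each (background, foreground) pair.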

2. In proposal_layer, take the foreground probabilities, i.e. the last 9 channels of rpn_cls_prob:

```python
# Get the scores and bounding boxes; the last 9 channels are the foreground probabilities
scores = rpn_cls_prob[:, :, :, num_anchors:]
```

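bbox_transform_inv_tf applies the predicted deltas to the anchors, shifting each center and scaling each width and height. clip_boxes_tf then clips the resulting proposals to the image boundary.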

```python
# Refine the anchors into proposals
proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
proposals = clip_boxes_tf(proposals, im_info[:2])
```
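Here is a NumPy sketch of these two steps. It follows the formulas of bbox_transform_inv_tf and clip_boxes_tf as I read them in the repository, including the +1 pixel width convention. The lowercase function names and the toy anchor are illustrative assumptions.

```python
import numpy as np

def bbox_transform_inv(boxes, deltas):
    """Apply (dx, dy, dw, dh) deltas to [x1, y1, x2, y2] boxes (+1 pixel width convention)."""
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    pred_ctr_x = dx * widths + ctr_x      # shift the center by a fraction of the width
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths          # scale width and height in log space
    pred_h = np.exp(dh) * heights

    return np.stack([pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], axis=1)

def clip_boxes(boxes, im_shape):
    """Clip boxes to [0, width - 1] x [0, height - 1]; im_shape = (height, width)."""
    h, w = im_shape
    out = boxes.copy()
    out[:, 0::2] = np.clip(out[:, 0::2], 0, w - 1)   # x1, x2
    out[:, 1::2] = np.clip(out[:, 1::2], 0, h - 1)   # y1, y2
    return out

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
deltas = np.array([[0.5, 0.0, np.log(2.0), 0.0]])    # shift right by half a width, double the width
proposals = clip_boxes(bbox_transform_inv(anchors, deltas), (20, 25))
print(proposals)
```

Before clipping, the refined box is [0, 0, 32, 16]. Clipping to a 20×25 image pulls x2 back to 24.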

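Next, non-maximum suppression (NMS) [2] keeps at most post_nms_topN boxes (2000 during training), considering them in order of score from highest to lowest. The scores here are the foreground probabilities taken from rpn_cls_prob above, not the raw rpn_cls_score. NMS keeps the highest-scoring remaining box and deletes every other box whose IoU with it exceeds nms_thresh. It then repeats with the next remaining box until none are left or post_nms_topN boxes have been kept.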

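```python
# A 600x1000 image gives about 600*1000/(16*16) feature-map positions; with 9 anchors per
# position that is roughly 20000 anchors, from which NMS keeps up to 2000 (post_nms_topN)
# Non-maximal suppression
indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topN,
                                       iou_threshold=nms_thresh)
```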
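The anchor count in the comment can be checked with a simplified sketch. It tiles 9 anchors over a stride-16 grid, assuming 3 aspect ratios (0.5, 1, 2) × 3 scales (8, 16, 32) and a feature-map size rounded up. `simple_base_anchors` and `all_anchors` are hypothetical helpers. They skip the integer rounding of the repository's own anchor generation, so individual coordinates differ slightly; only the count matters here.

```python
import math
import numpy as np

def simple_base_anchors(stride=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """9 anchors centered on (0, 0); simplified, without the repository's rounding."""
    anchors = []
    for r in ratios:                      # r = height / width
        for s in scales:
            area = float(stride * s) ** 2
            w = math.sqrt(area / r)
            h = w * r                     # keeps w * h == area
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def all_anchors(im_h, im_w, stride=16):
    """Place the 9 base anchors at every feature-map position (feature size rounded up)."""
    feat_h, feat_w = math.ceil(im_h / stride), math.ceil(im_w / stride)
    xs = np.arange(feat_w) * stride + stride / 2      # anchor centers in image pixels
    ys = np.arange(feat_h) * stride + stride / 2
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    return (shifts[:, None, :] + simple_base_anchors(stride)[None, :, :]).reshape(-1, 4)

anchors = all_anchors(600, 1000)
print(anchors.shape)   # (21546, 4): 38 * 63 positions x 9 anchors
```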

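Take the image of a person riding a horse. The person and the horse are different objects, and their boxes overlap with an IoU below 0.3, so NMS keeps both. NMS in the RPN ignores what is inside a box. Whenever a box's IoU with a higher-scoring kept box exceeds nms_thresh (0.7), the box is deleted, even if the two boxes cover different objects. proposal_layer returns the remaining boxes (rois, regions of interest, up to 2000) together with their scores.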
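This plain NumPy sketch shows the greedy NMS that tf.image.non_max_suppression performs. IoU uses continuous coordinates, and a box is dropped when its IoU with an already kept box exceeds the threshold. The helper names and the person/horse coordinates are made up to mirror the example above.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (M, 4) array (continuous coordinates)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    order = np.argsort(-scores)                  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]      # drop boxes that overlap the kept one too much
    return np.array(keep, dtype=np.int64)

boxes = np.array([[100, 50, 200, 300],    # person
                  [80, 150, 380, 400],    # horse
                  [105, 55, 205, 305]],   # near-duplicate of the person box
                 dtype=np.float64)
scores = np.array([0.9, 0.85, 0.8])
print(greedy_nms(boxes, scores, max_output_size=2000, iou_threshold=0.7))  # [0 1]
```

The person and horse boxes have an IoU of about 0.18, so both survive. The near-duplicate overlaps the person box with an IoU of about 0.87 and is removed.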
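3. In anchor_target_layer, only anchors that lie entirely inside the image are kept. bbox_overlaps then compares every anchor with every gt box and produces an [N, K] IoU matrix, overlaps (N anchors, K gt boxes). From this matrix we take two sets of indices. For each anchor, argmax over axis 1 gives the gt box it overlaps most. For each gt box, argmax over axis 0 gives the anchor that overlaps it most.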

```python
# overlaps between the anchors and the gt boxes
# overlaps (ex, gt): an [N, K] IoU matrix, N inside anchors by K gt boxes
overlaps = bbox_overlaps(
  np.ascontiguousarray(anchors, dtype=np.float64),   # np.float no longer exists in NumPy >= 1.24
  np.ascontiguousarray(gt_boxes, dtype=np.float64))
argmax_overlaps = overlaps.argmax(axis=1)  # index of the best-matching gt box for each anchor
max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # each anchor's highest IoU with any gt box
gt_argmax_overlaps = overlaps.argmax(axis=0)  # index of the best-matching anchor for each gt box
```
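The repository computes this matrix with a Cython `bbox_overlaps`. Below is my own vectorized NumPy sketch of the same IoU computation, assuming the same +1 pixel convention. `bbox_overlaps_np` and the toy boxes are hypothetical.

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """(N, 4) x (K, 4) -> (N, K) IoU matrix using the +1 pixel width convention."""
    ix1 = np.maximum(boxes[:, None, 0], query_boxes[None, :, 0])
    iy1 = np.maximum(boxes[:, None, 1], query_boxes[None, :, 1])
    ix2 = np.minimum(boxes[:, None, 2], query_boxes[None, :, 2])
    iy2 = np.minimum(boxes[:, None, 3], query_boxes[None, :, 3])
    inter = np.clip(ix2 - ix1 + 1, 0, None) * np.clip(iy2 - iy1 + 1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_q = (query_boxes[:, 2] - query_boxes[:, 0] + 1) * (query_boxes[:, 3] - query_boxes[:, 1] + 1)
    return inter / (area_b[:, None] + area_q[None, :] - inter)

anchors = np.array([[0, 0, 9, 9], [5, 5, 14, 14], [20, 20, 29, 29]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 9, 9, 1], [20, 20, 29, 29, 2]], dtype=np.float64)  # x1, y1, x2, y2, class

overlaps = bbox_overlaps_np(anchors, gt_boxes[:, :4])
argmax_overlaps = overlaps.argmax(axis=1)                       # best gt box for each anchor
max_overlaps = overlaps[np.arange(len(anchors)), argmax_overlaps]
gt_argmax_overlaps = overlaps.argmax(axis=0)                    # best anchor for each gt box
print(argmax_overlaps, gt_argmax_overlaps)                      # [0 0 1] [0 2]
```

K counts gt boxes, not classes. The class label sits in the fifth column of gt_boxes and plays no part in the IoU.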

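Two kinds of anchors get rpn_label 1 (foreground): the anchor with the highest IoU for each gt box, and any anchor whose IoU with some gt box is at least RPN_POSITIVE_OVERLAP (0.7). Anchors whose highest IoU is below RPN_NEGATIVE_OVERLAP (0.3) get rpn_label 0 (background). All other anchors keep the label -1 and are ignored. These labels ignore the object category. The cross-entropy loss compares them with the RPN logits rpn_cls_score, not with the scores returned by proposal_layer. With one-hot labels, this amounts to multiplying each label by the log-probability of its class and summing [3].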

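```python
# these labels are used later in the cross-entropy loss; labels starts as all -1 (ignored)
if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
  # assign bg labels first so that positive labels can clobber them
  labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
# fg label: for each gt, anchor with highest overlap
labels[gt_argmax_overlaps] = 1
# fg label: above threshold IOU
labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
  # assign bg labels last so that negative labels can clobber positives
  labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
```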
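This self-contained sketch applies the same labeling rules to a small array. The function name `label_anchors` is hypothetical. The thresholds 0.7/0.3 are the values quoted above, and `clobber_positives` plays the role of cfg.TRAIN.RPN_CLOBBER_POSITIVES.

```python
import numpy as np

def label_anchors(max_overlaps, gt_argmax_overlaps, pos_thresh=0.7, neg_thresh=0.3,
                  clobber_positives=False):
    """Return 1 (fg), 0 (bg) or -1 (ignored) for every inside anchor."""
    labels = np.full(max_overlaps.shape, -1, dtype=np.int64)   # start as "ignored"
    if not clobber_positives:
        labels[max_overlaps < neg_thresh] = 0   # bg first, so positives can overwrite it
    labels[gt_argmax_overlaps] = 1              # best anchor for each gt box
    labels[max_overlaps >= pos_thresh] = 1      # any anchor with IoU >= pos_thresh
    if clobber_positives:
        labels[max_overlaps < neg_thresh] = 0   # bg last, so negatives overwrite positives
    return labels

max_overlaps = np.array([0.85, 0.10, 0.50, 0.45, 0.20])
gt_argmax_overlaps = np.array([3])   # anchor 3 is some gt box's best match despite IoU 0.45
print(label_anchors(max_overlaps, gt_argmax_overlaps))   # [ 1  0 -1  1  0]
```

Anchor 2 (IoU 0.5) falls between the two thresholds and is not any gt box's best match, so it is ignored. Anchor 3 becomes foreground only through the "best anchor per gt" rule.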

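```python
# rpn_cls_score: (M, 2) logits and rpn_label: (M,) labels, after dropping anchors labeled -1
rpn_cross_entropy = tf.reduce_mean(
  tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))
```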
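As a NumPy sketch of what this loss computes: drop the ignored anchors, take a log-softmax over the two logits, and average −log p(true class). The function name and the toy logits are hypothetical. The result should match tf.nn.sparse_softmax_cross_entropy_with_logits followed by tf.reduce_mean.

```python
import numpy as np

def rpn_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over anchors whose label is not -1.

    logits: (N, 2) background/foreground scores; labels: (N,) in {-1, 0, 1}."""
    keep = labels != -1                              # ignored anchors do not contribute
    logits, labels = logits[keep], labels[keep]
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # a one-hot label selects -log p(true class) for each anchor
    return float(-log_probs[np.arange(labels.size), labels].mean())

logits = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 5.0]])
labels = np.array([1, 0, -1])                        # the third anchor is ignored
print(rpn_cross_entropy(logits, labels))
```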

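The RPN samples RPN_BATCHSIZE (256) anchors per image, at most half of them (128) foreground. If there are fewer positives, negatives fill the remaining slots: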

```python
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # 0.5 * 256 = 128
```
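This NumPy sketch shows the subsampling as I understand the repository does it. Surplus positives are randomly switched to -1, then surplus negatives, so that at most 256 anchors keep a 0/1 label. `subsample_labels` and its defaults are hypothetical names standing in for the cfg values.

```python
import numpy as np

def subsample_labels(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Keep at most fg_fraction * batch_size positives; negatives fill the rest of the batch.

    Anchors that are dropped are set to -1 (ignored)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()
    num_fg = int(fg_fraction * batch_size)             # 0.5 * 256 = 128
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        labels[rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)] = -1
    num_bg = batch_size - np.sum(labels == 1)          # negatives fill the remaining slots
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        labels[rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)] = -1
    return labels

many_fg = np.array([1] * 300 + [0] * 1000 + [-1] * 50)
few_fg = np.array([1] * 50 + [0] * 1000)
out1 = subsample_labels(many_fg, rng=np.random.default_rng(0))
out2 = subsample_labels(few_fg, rng=np.random.default_rng(1))
print((out1 == 1).sum(), (out1 == 0).sum())   # 128 128
print((out2 == 1).sum(), (out2 == 0).sum())   # 50 206
```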

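For each anchor, take the gt box it overlaps most; this is the anchor's matched gt box, not its class. bbox_transform then computes the shift and scale that map the anchor onto that box, giving rpn_bbox_targets. The regression loss is the smooth L1 loss between these targets and rpn_bbox_pred.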

```python
# regression targets: shift and scale from each anchor to its best-matching gt box
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
```
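As far as I can tell, `_compute_targets` calls `bbox_transform`. Below is a NumPy sketch of that encoding with the same +1 pixel convention as the decoding in proposal_layer. The toy anchor and gt box are made up.

```python
import numpy as np

def bbox_transform(ex_rois, gt_rois):
    """Encode each gt box relative to its anchor as (dx, dy, dw, dh)."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h
    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h
    dx = (gt_cx - ex_cx) / ex_w          # center shift in units of anchor width
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)             # log-space scale factors
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
gt_boxes = np.array([[8.0, 0.0, 23.0, 31.0, 1.0]])   # x1, y1, x2, y2, class
argmax_overlaps = np.array([0])                       # anchor 0 is matched to gt box 0
targets = bbox_transform(anchors, gt_boxes[argmax_overlaps, :4])
print(targets)   # dx = 0.5, dy = 0.5, dw = 0, dh = log(2) (about 0.693)
```

The index `gt_boxes[argmax_overlaps]` is what pairs each anchor with its own matched box. The class column is dropped before encoding.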

```python
rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                    rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])
```

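bbox_inside_weights is 1 for foreground anchors and 0 otherwise, so only foreground anchors contribute to the regression. bbox_outside_weights balance the loss. By default, every sampled (foreground or background) anchor gets weight 1 / (number of sampled anchors).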
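This NumPy sketch combines the two weight arrays with the smooth L1 loss as I understand `_smooth_l1_loss` computes it. The quadratic part applies when |x| < 1/σ²; results are summed over the non-batch axes and averaged over the batch. The helper names and the numbers are illustrative. `rpn_bbox_weights` assumes the default uniform 1/num_examples outside weighting described above.

```python
import numpy as np

def rpn_bbox_weights(labels):
    """Inside weights: 1 for fg anchors. Outside weights: 1/num_sampled for fg and bg anchors."""
    inside = np.zeros((labels.size, 4))
    inside[labels == 1] = 1.0
    num_examples = max(int(np.sum(labels >= 0)), 1)   # guard against division by zero
    outside = np.zeros((labels.size, 4))
    outside[labels >= 0] = 1.0 / num_examples
    return inside, outside

def smooth_l1_loss(pred, targets, inside_w, outside_w, sigma=1.0, axis=(1,)):
    """0.5 * sigma^2 * x^2 if |x| < 1/sigma^2, else |x| - 0.5/sigma^2."""
    sigma_2 = sigma ** 2
    diff = inside_w * (pred - targets)                # non-fg entries become 0
    abs_diff = np.abs(diff)
    per_elem = np.where(abs_diff < 1.0 / sigma_2,
                        0.5 * sigma_2 * diff ** 2,
                        abs_diff - 0.5 / sigma_2)
    return float(np.mean(np.sum(outside_w * per_elem, axis=axis)))

labels = np.array([1, 0, -1])                         # fg, bg, ignored
inside_w, outside_w = rpn_bbox_weights(labels)
pred = np.array([[0.5, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0]])
targets = np.zeros_like(pred)
# add a batch axis and sum over everything else, as dim=[1, 2, 3] does for the RPN
loss = smooth_l1_loss(pred[None], targets[None], inside_w[None], outside_w[None], axis=(1, 2))
print(loss)   # 0.0625
```

Only the foreground row contributes: 0.5 × 0.5² = 0.125, scaled by its outside weight 1/2. The large errors on the background and ignored rows are zeroed by the inside weights.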
4. Once rpn_labels have been computed, proposal_target_layer selects 128 rois out of the (up to) 2000 rois produced by proposal_layer.

```python
rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # 128 rois per image
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)  # 0.25 * 128 = at most 32 foreground rois per image
```

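_sample_rois samples a set of rois made up of foreground and background boxes. Suppose an image has 3 ground-truth boxes. gt_boxes then has shape (3, 5): four coordinates plus the class label. gt_assignment stores, for each roi, the index (0, 1 or 2) of its best-matching gt box, and the roi's class is read from the fifth column of that box [5].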

```python
labels = gt_boxes[gt_assignment, 4]  # class of each roi, taken from its best-matching gt box
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]  # foreground: IoU >= 0.5
# Guard against the case when an image has fewer than fg_rois_per_image
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                   (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]  # background: 0.1 <= IoU < 0.5
```
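The following is a simplified NumPy sketch of the sampling that follows these index selections. It takes up to 32 foreground rois and fills the remaining slots of the 128 with background rois, which get class 0. `sample_rois_sketch` is hypothetical. Its thresholds are the values in the comments above. Unlike the repository, it assumes at least one foreground and one background candidate exist.

```python
import numpy as np

def sample_rois_sketch(max_overlaps, gt_assignment, gt_boxes, rois_per_image=128,
                       fg_fraction=0.25, fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.1,
                       rng=None):
    """Simplified sampling; assumes at least one fg and one bg candidate exist."""
    rng = np.random.default_rng() if rng is None else rng
    labels = gt_boxes[gt_assignment, 4]                      # class of each roi's best gt box
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_thresh_hi) & (max_overlaps >= bg_thresh_lo))[0]
    num_fg = int(min(np.round(fg_fraction * rois_per_image), fg_inds.size))  # at most 32
    num_bg = rois_per_image - num_fg                         # backgrounds fill the rest
    fg_keep = rng.choice(fg_inds, size=num_fg, replace=False)
    bg_keep = rng.choice(bg_inds, size=num_bg, replace=bg_inds.size < num_bg)
    keep = np.concatenate([fg_keep, bg_keep])
    sampled_labels = labels[keep]                            # fancy indexing returns a copy
    sampled_labels[num_fg:] = 0                              # background rois get class 0
    return keep, sampled_labels

max_overlaps = np.concatenate([np.full(10, 0.8), np.full(500, 0.3), np.full(20, 0.05)])
gt_assignment = np.zeros(530, dtype=np.int64)
gt_boxes = np.array([[0.0, 0.0, 50.0, 50.0, 7.0]])          # one gt box of class 7
keep, sampled = sample_rois_sketch(max_overlaps, gt_assignment, gt_boxes,
                                   rng=np.random.default_rng(0))
print(len(keep), int((sampled == 7).sum()), int((sampled == 0).sum()))   # 128 10 118
```

With only 10 rois above 0.5 IoU, all 10 are kept as foreground and 118 background rois fill the batch. The 20 rois with IoU below 0.1 are never sampled.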

_get_bbox_regression_labels sets regression targets only for foreground boxes.

```python
import numpy as np
from model.config import cfg  # tf-faster-rcnn's config module


# Regress only the boxes that belong to a foreground class
def _get_bbox_regression_labels(bbox_target_data, num_classes):
  """Bounding-box regression targets (bbox_target_data) are stored in a
  compact form N x (class, tx, ty, tw, th)

  This function expands those targets into the 4-of-4*K representation used
  by the network (i.e. only one class has non-zero targets; a one-hot-style
  layout [4]).

  Returns:
      bbox_target (ndarray): N x 4K blob of regression targets
      bbox_inside_weights (ndarray): N x 4K blob of loss weights
  """
  clss = bbox_target_data[:, 0]
  bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
  bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
  inds = np.where(clss > 0)[0]  # background rois have class 0 and are skipped
  for ind in inds:
    cls = clss[ind]
    start = int(4 * cls)  # each class owns 4 consecutive columns
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
  return bbox_targets, bbox_inside_weights
```
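To make the 4-of-4K layout concrete, here is a self-contained, vectorized sketch of the same expansion. `expand_bbox_targets` is hypothetical. Its `inside_weight` argument replaces cfg.TRAIN.BBOX_INSIDE_WEIGHTS and assumes that setting is (1.0, 1.0, 1.0, 1.0).

```python
import numpy as np

def expand_bbox_targets(bbox_target_data, num_classes, inside_weight=(1.0, 1.0, 1.0, 1.0)):
    """Vectorized sketch: N x (class, tx, ty, tw, th) -> N x 4K targets and inside weights."""
    n = bbox_target_data.shape[0]
    clss = bbox_target_data[:, 0].astype(np.int64)
    bbox_targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    bbox_inside_weights = np.zeros_like(bbox_targets)
    fg = np.where(clss > 0)[0]                        # background rows (class 0) stay all zero
    cols = 4 * clss[fg][:, None] + np.arange(4)       # the 4 columns owned by each row's class
    bbox_targets[fg[:, None], cols] = bbox_target_data[fg, 1:]
    bbox_inside_weights[fg[:, None], cols] = inside_weight
    return bbox_targets, bbox_inside_weights

data = np.array([[2, 0.1, 0.2, 0.3, 0.4],    # a class-2 roi
                 [0, 1.0, 1.0, 1.0, 1.0]])   # a background roi
targets, weights = expand_bbox_targets(data, num_classes=3)
print(targets.shape)    # (2, 12)
print(targets[0])       # only columns 8..11 (class 2) are non-zero
```

With 3 classes, each row has 12 columns, and a class-2 roi fills only columns 8 to 11. The inside weights mask the loss to those same columns.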

In PyCharm, Ctrl+Shift+F (Find in Path) quickly shows where a function's return value is used, and Ctrl+left-click jumps to a function's definition.
References:
[1]. https://blog.csdn.net/weixin_40489988/article/details/106181969
[2]. https://www.cnblogs.com/makefile/p/nms.html
[3]. https://blog.csdn.net/ZJRN1027/article/details/80199248
[4]. https://www.cnblogs.com/shuaishuaidefeizhu/p/11269257.html
[5]. https://blog.csdn.net/Mr_health/article/details/84952190
[6]. https://zhuanlan.zhihu.com/p/54443471