Deep Learning | The region proposal functions in Faster R-CNN (proposal_layer, anchor_target_layer, proposal_target_layer)

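Code: https://github.com/endernewton/tf-faster-rcnn
1. First, freshly initialized convolution kernels (9*2 and 9*4 output channels) are applied to the feature map produced by VGG16 (512 channels). This yields rpn_cls_score and rpn_bbox_pred, which carry no real meaning at first [1]; the kernels are then updated by backpropagating the corresponding losses. rpn_cls_score scores each box as foreground or background, and rpn_bbox_pred predicts the offset (delta) between each anchor box and its ground truth box. A softmax normalizes rpn_cls_score into foreground/background probabilities that sum to 1 for each box. Every point on the feature map has 9 boxes, each with a background and a foreground probability, so rpn_cls_prob has shape (1, ?, ?, 18). rpn_cls_pred applies argmax to the two scores of each box to decide whether it is foreground or background.
The RPN and R-CNN share the convolutional layers that produce this feature map.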

    # The base CNN (VGG16, ZF, etc.) uses ImageNet-pretrained parameters; the other
    # layers are initialized from a Gaussian with mean 0 and standard deviation 0.01 [6]
    def _region_proposal(self, net_conv, is_training, initializer):
        rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training,
                          weights_initializer=initializer, scope="rpn_conv/3x3")
        self._act_summaries.append(rpn)
        rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                    weights_initializer=initializer, padding='VALID',
                                    activation_fn=None, scope='rpn_cls_score')
        # change it so that the score has 2 as its channel size
        rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
        rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
        # decide whether each box is foreground or background
        rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
        rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
        # the offsets (deltas) consumed by bbox_transform_inv_tf in proposal_layer
        rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                                    weights_initializer=initializer, padding='VALID',
                                    activation_fn=None, scope='rpn_bbox_pred')
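To make the reshape, softmax, reshape sequence concrete, here is a minimal NumPy sketch. The helpers reshape_layer and softmax_last are hypothetical stand-ins written for this illustration, not the repository's TensorFlow code, and the channel layout (first 9 channels background, last 9 foreground) is an assumption taken from the slicing used later in proposal_layer.

```python
import numpy as np

def reshape_layer(x, num_dim):
    """Hypothetical NumPy stand-in: regroup the channels of an NHWC tensor."""
    n, h, w, c = x.shape
    to_nchw = x.transpose(0, 3, 1, 2)               # (n, c, h, w)
    regrouped = to_nchw.reshape(n, num_dim, -1, w)  # (n, num_dim, c*h/num_dim, w)
    return regrouped.transpose(0, 2, 3, 1)          # back to channels-last

def softmax_last(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_anchors = 9
H, W = 4, 5                                         # toy feature-map size
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, num_anchors * 2))

score_2 = reshape_layer(rpn_cls_score, 2)           # (1, 9*H, W, 2)
prob_2 = softmax_last(score_2)                      # bg/fg probabilities per box
rpn_cls_prob = reshape_layer(prob_2, num_anchors * 2)  # (1, H, W, 18)

# Under the assumed layout, channel k and channel k + 9 belong to the same box
pair_sum = rpn_cls_prob[..., :num_anchors] + rpn_cls_prob[..., num_anchors:]
print(rpn_cls_prob.shape, np.allclose(pair_sum, 1.0))
```

The point of the two reshapes is only to let a plain 2-way softmax see the background and foreground score of one box side by side.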

2. In proposal_layer, the probabilities of the foreground boxes are taken out (the last 9 channels).
    # Get the scores and bounding boxes; the last 9 channels are the foreground boxes
    scores = rpn_cls_prob[:, :, :, num_anchors:]

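bbox_transform_inv adjusts each anchor slightly (shifting its center and scaling its width and height), and clip_boxes ensures that the resulting boxes lie inside the image.
    # adjust the anchors to obtain the proposals
    proposals = bbox_transform_inv_tf(anchors, rpn_bbox_pred)
    proposals = clip_boxes_tf(proposals, im_info[:2])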
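The shift-and-scale adjustment can be sketched in NumPy as follows. This assumes the usual Faster R-CNN (dx, dy, dw, dh) parameterization and an inclusive-pixel width convention (x2 - x1 + 1); decode_boxes and clip_to_image are hypothetical names for this illustration, not the repository's functions.

```python
import numpy as np

def decode_boxes(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) anchors."""
    widths = anchors[:, 2] - anchors[:, 0] + 1.0    # assumed +1 pixel convention
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    pred_ctr_x = deltas[:, 0] * widths + ctr_x      # shift the center
    pred_ctr_y = deltas[:, 1] * heights + ctr_y
    pred_w = np.exp(deltas[:, 2]) * widths          # scale width and height
    pred_h = np.exp(deltas[:, 3]) * heights

    return np.stack([pred_ctr_x - 0.5 * pred_w,
                     pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w,
                     pred_ctr_y + 0.5 * pred_h], axis=1)

def clip_to_image(boxes, im_shape):
    """Clamp box corners to an image of shape (height, width)."""
    height, width = im_shape
    clipped = boxes.copy()
    clipped[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width - 1)
    clipped[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height - 1)
    return clipped

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
deltas = np.array([[0.5, 0.0, np.log(2.0), 0.0]])   # move right, double the width
proposals = decode_boxes(anchors, deltas)
clipped = clip_to_image(proposals, (20, 24))
print(proposals, clipped)
```

The exponential on dw and dh keeps the predicted width and height positive whatever value the network outputs.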

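Non-maximum suppression (non_max_suppression) [2] then selects up to post_nms_topN boxes in descending order of score (the foreground probability derived from rpn_cls_score, which has no real meaning early in training): every box whose overlap with the current highest-scoring box exceeds nms_thresh is deleted, leaving only the highest-scoring box.
    # There are 600*1000/(16*16) feature points in total and 9 anchors at each of them,
    # so roughly 20000 anchors; NMS selects 2000 of them
    # Non-maximal suppression
    indices = tf.image.non_max_suppression(proposals, scores,
                                           max_output_size=post_nms_topN,
                                           iou_threshold=nms_thresh)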

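Take the picture of a person riding a horse: the person and the horse are different objects and the IoU of their boxes is below 0.3, so both boxes survive. If two boxes have an IoU above 0.7, the lower-scoring one is deleted. The layer returns the remaining boxes (RoIs, regions of interest; 2000 of them) and their scores.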
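A greedy NMS loop of the kind described above can be sketched in NumPy. This is an illustration only, not what tf.image.non_max_suppression does internally; the function names and the box coordinates (a person box, a horse box, and a near-duplicate of the person box) are made up for the example.

```python
import numpy as np

def iou_one_to_many(box, others):
    """IoU between one (x1, y1, x2, y2) box and an array of boxes."""
    iw = np.clip(np.minimum(box[2], others[:, 2]) - np.maximum(box[0], others[:, 0]), 0, None)
    ih = np.clip(np.minimum(box[3], others[:, 3]) - np.maximum(box[1], others[:, 1]), 0, None)
    inter = iw * ih
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    """Keep the best box, drop everything overlapping it too much, repeat."""
    order = np.argsort(-scores)                 # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]     # survivors go to the next round
    return keep

boxes = np.array([[40.0, 10.0, 80.0, 110.0],    # person (hypothetical coordinates)
                  [0.0, 60.0, 150.0, 160.0],    # horse
                  [42.0, 12.0, 82.0, 112.0]])   # near-duplicate of the person box
scores = np.array([0.9, 0.8, 0.85])
keep = greedy_nms(boxes, scores, max_output_size=2000, iou_threshold=0.7)
print(keep)
```

The person and horse boxes overlap only slightly (IoU of about 0.12), so both are kept, while the near-duplicate (IoU of about 0.87 with the person box) is suppressed.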
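3. anchor_target_layer considers only the anchors that lie completely inside the image. bbox_overlaps then compares anchors and gt_boxes pairwise and produces an [N, K] matrix, overlaps, holding their IoU. The indices of the maxima are then taken along each dimension: once per anchor and once per ground-truth box.
    # overlaps between the anchors and the gt boxes
    # overlaps (ex, gt): pairwise [N, K] matrix for N anchors (roughly 20000 before
    # the inside-image filter) and K ground-truth boxes
    # (np.float in the repository; that alias was removed in NumPy 1.24, np.float64 is equivalent)
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    # index of the ground-truth box with the highest IoU for each anchor
    argmax_overlaps = overlaps.argmax(axis=1)
    # highest IoU of each anchor over all ground-truth boxes
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]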
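As a rough picture of what the [N, K] matrix contains, here is a pure-NumPy sketch. The repository's bbox_overlaps is a compiled routine; pairwise_iou below is a hypothetical stand-in that, as a simplifying assumption, measures widths as x2 - x1 without any +1 pixel offset.

```python
import numpy as np

def pairwise_iou(anchors, gt_boxes):
    """Return an [N, K] IoU matrix for N anchors and K ground-truth boxes."""
    ax1, ay1, ax2, ay2 = [anchors[:, i][:, None] for i in range(4)]   # (N, 1)
    gx1, gy1, gx2, gy2 = [gt_boxes[:, i][None, :] for i in range(4)]  # (1, K)
    iw = np.clip(np.minimum(ax2, gx2) - np.maximum(ax1, gx1), 0, None)
    ih = np.clip(np.minimum(ay2, gy2) - np.maximum(ay1, gy1), 0, None)
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    return inter / (area_a + area_g - inter)

anchors = np.array([[0.0, 0.0, 10.0, 10.0],
                    [0.0, 0.0, 5.0, 10.0],
                    [20.0, 20.0, 30.0, 30.0]])
# Each ground-truth row is (x1, y1, x2, y2, class); only the first 4 columns are used
gt_boxes = np.array([[0.0, 0.0, 10.0, 10.0, 1.0],
                     [20.0, 20.0, 30.0, 40.0, 2.0]])

overlaps = pairwise_iou(anchors, gt_boxes)        # shape (3, 2)
argmax_overlaps = overlaps.argmax(axis=1)         # best gt box for each anchor
max_overlaps = overlaps[np.arange(len(anchors)), argmax_overlaps]
gt_argmax_overlaps = overlaps.argmax(axis=0)      # best anchor for each gt box
print(overlaps, argmax_overlaps, gt_argmax_overlaps)
```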

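Anchors that have the highest overlap with some ground-truth box, and anchors whose overlap is at least RPN_POSITIVE_OVERLAP (0.7), get rpn_label 1 (foreground); anchors whose overlap is below RPN_NEGATIVE_OVERLAP (0.3) get rpn_label 0 (background). The class of the object inside the box is ignored. The cross-entropy is then computed between these rpn_labels and rpn_cls_score [3].
    # these labels are used when computing the cross-entropy
    # fg label: above threshold IOU
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

    if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
        # assign bg labels last so that negative labels can clobber positives
        labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0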
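The labelling rule can be sketched end to end on a toy overlaps matrix. This is a simplified illustration under the assumption that unlabelled anchors start at -1 (ignored) and that background labels are assigned first, so positives win any conflict; label_anchors is a hypothetical helper, not the repository function.

```python
import numpy as np

def label_anchors(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) for each anchor."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.int64)
    max_overlaps = overlaps.max(axis=1)          # best IoU of each anchor
    gt_max_overlaps = overlaps.max(axis=0)       # best IoU reached for each gt box

    labels[max_overlaps < neg_thresh] = 0        # background first
    # every anchor that attains the best IoU of some gt box is foreground
    best_for_some_gt = np.where(overlaps == gt_max_overlaps)[0]
    labels[best_for_some_gt] = 1
    labels[max_overlaps >= pos_thresh] = 1       # high-IoU anchors are foreground
    return labels

overlaps = np.array([[0.80, 0.10],   # IoU >= 0.7: foreground
                     [0.20, 0.50],   # best anchor for gt box 1: foreground
                     [0.10, 0.20],   # IoU < 0.3: background
                     [0.40, 0.45]])  # in between: ignored
labels = label_anchors(overlaps)
print(labels)
```

The second row shows why the "highest overlap" rule exists: without it, a ground-truth box that no anchor covers with IoU of 0.7 or more would contribute no positive example at all.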

    rpn_cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score,
                                                       labels=rpn_label))
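For intuition, the same loss can be written out in NumPy. The sketch below assumes that anchors labelled -1 are dropped before the loss is computed and that the scores have already been reshaped to one (background, foreground) pair per anchor; rpn_cross_entropy_np is a hypothetical name for this illustration.

```python
import numpy as np

def rpn_cross_entropy_np(scores, labels):
    """Mean sparse softmax cross-entropy over anchors whose label is not -1."""
    keep = labels != -1                           # drop ignored anchors
    s = scores[keep]
    y = labels[keep].astype(int)
    s = s - s.max(axis=1, keepdims=True)          # shift for numerical stability
    log_prob = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(y)), y].mean())

scores = np.zeros((3, 2))                         # untrained network: equal scores
labels = np.array([1, 0, -1])
loss = rpn_cross_entropy_np(scores, labels)
print(loss)                                       # log(2) for equal scores
```

With equal scores every anchor is assigned probability 0.5, so the loss equals log(2), a useful sanity value for the first training iterations.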

At most 128 of the anchors in a batch are foreground.
    num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # 0.5 * 256 = 128
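How this cap shapes a batch can be sketched as follows, assuming that surplus foreground and background anchors are switched to -1 (ignored) at random; subsample_labels is a hypothetical helper and the batch size and fraction are the values quoted above.

```python
import numpy as np

def subsample_labels(labels, batch_size=256, fg_fraction=0.5, rng=None):
    """Randomly disable surplus anchors so that at most batch_size remain."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()

    num_fg = int(fg_fraction * batch_size)        # 0.5 * 256 = 128
    fg_inds = np.where(labels == 1)[0]
    if len(fg_inds) > num_fg:
        disable = rng.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
        labels[disable] = -1

    num_bg = batch_size - int(np.sum(labels == 1))  # background fills the rest
    bg_inds = np.where(labels == 0)[0]
    if len(bg_inds) > num_bg:
        disable = rng.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
        labels[disable] = -1
    return labels

raw = np.concatenate([np.ones(200), np.zeros(1000), -np.ones(50)])
sampled = subsample_labels(raw, rng=np.random.default_rng(0))
print(int((sampled == 1).sum()), int((sampled == 0).sum()))
```

When fewer than 128 foreground anchors exist, the background share grows so that the batch still holds 256 anchors.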

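For each anchor, the ground-truth box with the largest overlap (that is, the object the anchor is assigned to) is passed to bbox_transform to compute rpn_bbox_targets, the amount of shifting and scaling the anchor needs. The smooth_L1_loss between these targets and rpn_bbox_pred is then computed.
    # compute the shift and scale targets of each anchor relative to its assigned gt box
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])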
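The regression targets are the inverse of the anchor adjustment used in proposal_layer. A minimal NumPy sketch, assuming the usual (dx, dy, dw, dh) parameterization and an inclusive-pixel width convention (x2 - x1 + 1); encode_targets is a hypothetical name for this illustration.

```python
import numpy as np

def encode_targets(anchors, gt_boxes):
    """Compute (dx, dy, dw, dh) that would move each anchor onto its gt box."""
    ex_w = anchors[:, 2] - anchors[:, 0] + 1.0    # assumed +1 pixel convention
    ex_h = anchors[:, 3] - anchors[:, 1] + 1.0
    ex_cx = anchors[:, 0] + 0.5 * ex_w
    ex_cy = anchors[:, 1] + 0.5 * ex_h

    gt_w = gt_boxes[:, 2] - gt_boxes[:, 0] + 1.0
    gt_h = gt_boxes[:, 3] - gt_boxes[:, 1] + 1.0
    gt_cx = gt_boxes[:, 0] + 0.5 * gt_w
    gt_cy = gt_boxes[:, 1] + 0.5 * gt_h

    dx = (gt_cx - ex_cx) / ex_w                   # center shift, in anchor widths
    dy = (gt_cy - ex_cy) / ex_h
    dw = np.log(gt_w / ex_w)                      # log of the scale factor
    dh = np.log(gt_h / ex_h)
    return np.stack([dx, dy, dw, dh], axis=1)

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])      # 16 x 16 anchor
gt = np.array([[8.0, 0.0, 39.0, 15.0]])           # 32 x 16 box, shifted right
targets = encode_targets(anchors, gt)
print(targets)
```

Dividing the shift by the anchor size and taking the log of the scale ratio makes the targets independent of the absolute size of the anchor.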

    rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets,
                                        rpn_bbox_inside_weights, rpn_bbox_outside_weights,
                                        sigma=sigma_rpn, dim=[1, 2, 3])
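A NumPy sketch of a smooth L1 loss with inside and outside weights is given below. It assumes the common sigma-parameterized form (quadratic below 1/sigma^2, linear above) and simply sums over all elements, so it may differ from the repository's reduction over the dim axes; smooth_l1_np is a hypothetical name.

```python
import numpy as np

def smooth_l1_np(pred, targets, inside_w, outside_w, sigma=1.0):
    """Sum of weighted smooth L1 terms (assumed sigma-parameterized form)."""
    sigma_2 = sigma ** 2
    diff = inside_w * (pred - targets)            # inside weights mask the elements
    abs_diff = np.abs(diff)
    quadratic = abs_diff < 1.0 / sigma_2
    per_elem = np.where(quadratic,
                        0.5 * sigma_2 * diff ** 2,      # small errors: quadratic
                        abs_diff - 0.5 / sigma_2)       # large errors: linear
    return float(np.sum(outside_w * per_elem))    # outside weights rescale the terms

pred = np.array([[0.5, 2.0, 3.0, 0.0]])
targets = np.zeros((1, 4))
inside_w = np.array([[1.0, 1.0, 0.0, 1.0]])       # third element is masked out
outside_w = np.ones((1, 4))
loss = smooth_l1_np(pred, targets, inside_w, outside_w, sigma=1.0)
print(loss)
```

With sigma = 1 the two active errors contribute 0.5 * 0.5^2 = 0.125 and 2.0 - 0.5 = 1.5, while the masked element contributes nothing.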

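bbox_inside_weights restricts the regression to foreground boxes, and bbox_outside_weights are the weights that balance the loss terms.
4. After rpn_label has been updated in the previous step, proposal_target_layer selects 128 RoIs from the 2000 RoIs.
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # 128 RoIs per image
    fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)  # 32 foreground RoIs per image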

sample_rois generates a set of RoIs made up of foreground and background boxes. With 3 ground-truth boxes, for example, gt_boxes has shape (3, 5); here the class of each box (0, 1, 2, and so on) matters [5].
    labels = gt_boxes[gt_assignment, 4]  # the class each RoI belongs to

    # Select foreground RoIs as those with >= FG_THRESH overlap
    fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]  # foreground: overlap >= 0.5
    # Guard against the case when an image has fewer than fg_rois_per_image
    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]  # background: 0.1 <= overlap < 0.5
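The selection of 32 foreground and 96 background RoIs can be sketched as below. This is a simplified illustration that samples without replacement and may therefore return fewer than 128 RoIs when candidates run short; sample_roi_indices is a hypothetical helper and the thresholds are the values quoted in the comments above.

```python
import numpy as np

def sample_roi_indices(max_overlaps, rois_per_image=128, fg_fraction=0.25,
                       fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.1, rng=None):
    """Pick foreground and background RoI indices from per-RoI max IoU values."""
    rng = np.random.default_rng() if rng is None else rng
    fg_rois_per_image = int(round(fg_fraction * rois_per_image))   # 0.25 * 128 = 32

    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_thresh_hi) &
                       (max_overlaps >= bg_thresh_lo))[0]

    num_fg = min(fg_rois_per_image, fg_inds.size)
    num_bg = min(rois_per_image - num_fg, bg_inds.size)
    fg_keep = rng.choice(fg_inds, size=num_fg, replace=False)
    bg_keep = rng.choice(bg_inds, size=num_bg, replace=False)
    return fg_keep, bg_keep

# 50 RoIs with IoU 0.8, 300 with IoU 0.3, 100 with IoU 0.05 (below BG_THRESH_LO)
max_overlaps = np.concatenate([np.full(50, 0.8), np.full(300, 0.3), np.full(100, 0.05)])
fg_keep, bg_keep = sample_roi_indices(max_overlaps, rng=np.random.default_rng(0))
print(fg_keep.size, bg_keep.size)
```

RoIs whose best IoU falls below the lower background threshold are never sampled, neither as foreground nor as background.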

get_bbox_regression_labels builds regression targets only for foreground boxes.
    # regress only the bounding boxes that belong to the foreground
    def _get_bbox_regression_labels(bbox_target_data, num_classes):
        """Bounding-box regression targets (bbox_target_data) are stored in a
        compact form N x (class, tx, ty, tw, th)

        This function expands those targets into the 4-of-4*K representation used
        by the network (i.e. only one class has non-zero targets).
        This resembles a one-hot encoding [4].

        Returns:
            bbox_target (ndarray): N x 4K blob of regression targets
            bbox_inside_weights (ndarray): N x 4K blob of loss weights
        """
        clss = bbox_target_data[:, 0]
        bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
        bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
        inds = np.where(clss > 0)[0]  # background labels are all 0
        for ind in inds:
            cls = clss[ind]
            start = int(4 * cls)
            end = start + 4
            bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
            bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
        return bbox_targets, bbox_inside_weights
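A small worked example shows what the expansion does to one background row and one foreground row. The sketch is self-contained: expand_targets is a hypothetical re-implementation for illustration, and the inside weights of (1, 1, 1, 1) are an assumed value standing in for cfg.TRAIN.BBOX_INSIDE_WEIGHTS.

```python
import numpy as np

def expand_targets(bbox_target_data, num_classes, inside_weights=(1.0, 1.0, 1.0, 1.0)):
    """Expand N x (class, tx, ty, tw, th) into N x 4K targets and weights."""
    classes = bbox_target_data[:, 0].astype(int)
    targets = np.zeros((classes.size, 4 * num_classes), dtype=np.float32)
    weights = np.zeros_like(targets)
    for row in np.where(classes > 0)[0]:          # class 0 is background: skipped
        start = 4 * classes[row]
        targets[row, start:start + 4] = bbox_target_data[row, 1:]
        weights[row, start:start + 4] = inside_weights   # assumed (1, 1, 1, 1)
    return targets, weights

data = np.array([[0.0, 0.9, 0.9, 0.9, 0.9],       # background RoI: targets ignored
                 [2.0, 0.1, 0.2, 0.3, 0.4]])      # foreground RoI of class 2
targets, weights = expand_targets(data, num_classes=3)
print(targets)
print(weights)
```

Only columns 8 to 11 of the second row are filled, so the regression loss for that RoI touches only the four outputs that belong to class 2, and the background row contributes nothing.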


In PyCharm, Ctrl+Shift+F quickly finds where a function's return value is used, and Ctrl+left click jumps straight to the function.
References:
[1]. https://blog.csdn.net/weixin_40489988/article/details/106181969
[2]. https://www.cnblogs.com/makefile/p/nms.html
[3]. https://blog.csdn.net/ZJRN1027/article/details/80199248
[4]. https://www.cnblogs.com/shuaishuaidefeizhu/p/11269257.html
[5]. https://blog.csdn.net/Mr_health/article/details/84952190
[6]. https://zhuanlan.zhihu.com/p/54443471
