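Machine Learning | Object Detection: Cowboy Outfits Detection (Part 2): Tackling Class Imbalance with YOLOv5 as the Baseline. pytorch | deep learning | python | object detection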

Contents

Preface
I. Re-splitting the validation set
II. Data resampling
1. Update image weights
?
3. Focal loss
4. Changing the model

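Preface
This post uses YOLOv5 as the baseline and focuses on fixing the class imbalance problem in order to raise the score. If you are not familiar with the setup, read the previous post first:
Object Detection: Cowboy Outfits Detection (Part 1): Reading the COCO Dataset, Converting It to YOLO Format, Training with YOLOv5 as the Baseline, and Submitting Results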

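I. Re-splitting the validation set
Last time I used a rather crude approach and put 10 images of every class into the validation set, which is clearly problematic. This time I sample in proportion to the number of labels in each class. Put simply, a rare class such as belt sends 30% of its samples to the validation set, while a common class such as sunglasses sends only 1%.
Compared with the previous split, the implementation changes only a few lines of code.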

    import os
    import shutil

    import cv2 as cv
    import numpy as np
    from tqdm import tqdm
    from torchvision.datasets.coco import CocoDetection


    # Create the folder if it does not exist yet
    def check_path(save_path):
        if not os.path.exists(save_path):
            os.mkdir(save_path)
            print("making {} file".format(save_path))


    # Convert a bounding box from COCO format (x_min, y_min, w, h in pixels)
    # to YOLO format (x_center, y_center, w, h, normalized by the image size)
    def cc2yolo_bbox(img_width, img_height, bbox):
        dw = 1. / img_width
        dh = 1. / img_height
        x = bbox[0] + bbox[2] / 2.0
        y = bbox[1] + bbox[3] / 2.0
        w = bbox[2]
        h = bbox[3]

        x = x * dw
        w = w * dw
        y = y * dh
        h = h * dh
        return (x, y, w, h)


    # Copy the image and write its YOLO label file
    def save_image_label(img_path, image_path_copy_to, label_path_copy_to, label_text):
        # Copy the image to its new location
        shutil.copyfile(img_path, image_path_copy_to)
        # One line per object: "class x y w h"
        lines = [" ".join([str(cls), str(box[0]), str(box[1]), str(box[2]), str(box[3])])
                 for cls, box in label_text]
        with open(label_path_copy_to, "w") as f:
            f.write("\n".join(lines))


    # The two paths below point to the COCO dataset; change them as needed!
    coco_image_path = r'D:\CMP\2021Innovation\dataset\Cowboy\cowboyoutfits\images'
    coco_json_path = r'D:\CMP\2021Innovation\dataset\Cowboy\cowboyoutfits\train.json'

    # Build the output directory structure
    save_path_root = './yolo_datasets'
    save_path_train = os.path.join(save_path_root, 'train')
    save_path_train_images = os.path.join(save_path_train, 'images')
    save_path_train_labels = os.path.join(save_path_train, 'labels')
    save_path_val = os.path.join(save_path_root, 'val')
    save_path_val_images = os.path.join(save_path_val, 'images')
    save_path_val_labels = os.path.join(save_path_val, 'labels')
    save_path = [save_path_root, save_path_train, save_path_train_images, save_path_train_labels,
                 save_path_val, save_path_val_images, save_path_val_labels]

    # Create the output directories
    for path in save_path:
        check_path(path)

    # Used later for the statistics and for splitting the dataset
    categories = {87: 'belt', 1034: 'sunglasses', 131: 'boot', 318: 'cowboy_hat', 588: 'jacket'}
    categories_name = ['belt', 'sunglasses', 'boot', 'cowboy_hat', 'jacket']
    categories_map = {'belt': 0, 'sunglasses': 1, 'boot': 2, 'cowboy_hat': 3, 'jacket': 4}  # class mapping
    # Remaining validation quota per class; the class counts were collected beforehand
    categories_static = {'belt': round(25 * 0.3), 'sunglasses': round(2330 * 0.01),
                         'boot': round(449 * 0.1), 'cowboy_hat': round(595 * 0.1),
                         'jacket': round(2195 * 0.01)}

    # Load the data
    train_data = CocoDetection(coco_image_path, coco_json_path)

    # Build the id -> file name dictionary
    id_filenames = {img['id']: img['file_name'] for img in train_data.coco.dataset['images']}

    # Walk through the data and split it
    for image, label in tqdm(train_data):
        # Skip images without annotations (label[0] below would fail on them)
        if not label:
            continue
        # Convert the PIL image to an OpenCV (BGR) array
        image = cv.cvtColor(np.array(image), cv.COLOR_RGB2BGR)
        # Look up the image name by id
        image_name = id_filenames[label[0]['image_id']]
        # Strip the .jpg suffix
        image_name = image_name[:-4]
        # Path of the source image
        img_path = os.path.join(coco_image_path, image_name + '.jpg')
        # Image width and height
        img_width, img_height = image.shape[1], image.shape[0]
        # Collected [class index, yolo bbox] pairs
        info = []
        for i in range(len(label)):
            # Read the bbox and convert it to YOLO format
            bbox = label[i]['bbox']
            # For display only
            # x1, y1, x2, y2 = int(bbox[0]), int(bbox[1]), int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3])
            bbox = cc2yolo_bbox(img_width, img_height, bbox)
            # Class name of this label
            class_name = categories[label[i]['category_id']]
            info.append([categories_map[class_name], bbox])
            # cv.rectangle(image, (x1, y1), (x2, y2), color=(255, 0, 0), thickness=2)
            # cv.putText(image, class_name, (x1, y1), cv.FONT_HERSHEY_SIMPLEX, 0.5, color=(0, 0, 255), thickness=2)
        # cv.imshow('image', image)
        # cv.waitKey()

        class_counter = [0, 0, 0, 0, 0]
        label_text = []
        flag = 1
        for text in info:
            class_counter[text[0]] += 1
            label_text.append(text)
            # Any image containing a belt is treated as a belt image
            if text[0] == 0:
                class_index = 0
                flag = 0
        # Otherwise the image belongs to its most frequent class
        if flag:
            class_index = np.argmax(class_counter)

        # If this class still has validation quota left, the image goes to val
        if categories_static[categories_name[class_index]]:
            # Use up one unit of the quota
            categories_static[categories_name[class_index]] -= 1
            image_path_copy_to = os.path.join(save_path_val_images, image_name + '.jpg')
            label_path_copy_to = os.path.join(save_path_val_labels, image_name + '.txt')
            save_image_label(img_path, image_path_copy_to, label_path_copy_to, label_text)
        # Otherwise it goes to the training set
        else:
            image_path_copy_to = os.path.join(save_path_train_images, image_name + '.jpg')
            label_path_copy_to = os.path.join(save_path_train_labels, image_name + '.txt')
            save_image_label(img_path, image_path_copy_to, label_path_copy_to, label_text)
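The per-class quotas in `categories_static` are typed out by hand in the script. As a minimal sketch, they could also be derived from two dictionaries. `val_quota` is a hypothetical helper that is not part of the original script; the counts and fractions are copied from it.

```python
# Hypothetical helper (not in the original script): derive the validation
# quota of each class from its sample count and its validation fraction.
def val_quota(class_counts, val_fraction):
    return {name: round(class_counts[name] * val_fraction[name])
            for name in class_counts}


# Counts and fractions copied from the split script
class_counts = {'belt': 25, 'sunglasses': 2330, 'boot': 449,
                'cowboy_hat': 595, 'jacket': 2195}
val_fraction = {'belt': 0.3, 'sunglasses': 0.01, 'boot': 0.1,
                'cowboy_hat': 0.1, 'jacket': 0.01}

quota = val_quota(class_counts, val_fraction)
print(quota)
print("validation images in total:", sum(quota.values()))
```

Keeping the fractions in one place makes it easier to try a different split, for example a larger share for belt.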

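II. Data resampling
Put simply, data resampling has two aspects. The idea was suggested by 沐神 (Mu Li), and I found that it works very well.
The first is to sample rare classes more often and common classes less often when drawing batches. This is intuitive: classes with many samples are comparatively easy to learn and do not need to be revisited over and over, while rare classes need to be seen many times.
The second is to give rare classes a larger weight and common classes a smaller weight when setting the learning weights.
The two are essentially equivalent and fairly easy to implement. YOLOv5 already has this built in.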
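The sense in which the two approaches are equivalent can be checked numerically. This is a minimal sketch with made-up losses and weights, not YOLOv5 code: drawing sample i with probability proportional to its weight gives the same expected loss as averaging all losses with those weights.

```python
import random

# Made-up per-sample losses and weights, for illustration only
losses = [2.0, 0.5, 0.1, 0.1]
weights = [4.0, 1.0, 0.5, 0.5]

total_w = sum(weights)
probs = [w / total_w for w in weights]

# (a) Loss re-weighting: weighted average of the losses over the whole set
weighted_loss = sum(w * l for w, l in zip(weights, losses)) / total_w

# (b) Re-sampling: expected loss when sample i is drawn with probability probs[i]
expected_sampled_loss = sum(p * l for p, l in zip(probs, losses))

# Monte Carlo check of (b) with random.choices
rng = random.Random(0)
draws = rng.choices(range(len(losses)), weights=weights, k=100_000)
empirical = sum(losses[i] for i in draws) / len(draws)

print(weighted_loss, expected_sampled_loss, empirical)
```

The match holds only in expectation. Within a single epoch, resampling changes which images the network actually sees, so the two can still behave differently in practice.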

1. Update image weights
I will explain the implementation mainly through the code.
The first step derives a learning weight for each class from the labels:

    import numpy as np
    import torch


    # This function counts the classes; the result is used to change
    # the loss weights or the sampling frequency
    def labels_to_class_weights(labels, nc=80):
        # Get class weights (inverse frequency) from training labels
        # If there are no labels, return an empty tensor
        if labels[0] is None:  # no labels loaded
            return torch.Tensor()

        # labels holds one array per image; concatenate them into a single (N, 5) array
        labels = np.concatenate(labels, 0)  # labels.shape = (866643, 5) for COCO
        classes = labels[:, 0].astype(int)  # labels = [class xywh]
        # Number of occurrences of each class
        weights = np.bincount(classes, minlength=nc)  # occurrences per class

        # Classes that never occur get a count of 1 to avoid division by zero
        weights[weights == 0] = 1  # replace empty bins with 1
        weights = 1 / weights  # inverse of the number of targets per class
        weights /= weights.sum()  # normalize
        return torch.from_numpy(weights)

After this step, the weight for belt (class 0) is set noticeably higher than the others.
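To see the effect of inverse-frequency weighting, here is a small pure-Python sketch of the same calculation. The counts are the per-class numbers that appear in the split script and serve only as illustrative inputs; the label counts YOLOv5 actually sees during training may differ. `inverse_frequency_weights` is a hypothetical name.

```python
# Hypothetical helper mirroring the inverse-frequency idea in plain Python
def inverse_frequency_weights(counts):
    # Treat empty classes as if they had one sample to avoid division by zero
    counts = [c if c > 0 else 1 for c in counts]
    inverse = [1.0 / c for c in counts]
    total = sum(inverse)
    return [v / total for v in inverse]


names = ['belt', 'sunglasses', 'boot', 'cowboy_hat', 'jacket']
counts = [25, 2330, 449, 595, 2195]  # illustrative, taken from the split script

weights = inverse_frequency_weights(counts)
for name, w in zip(names, weights):
    print(f"{name:12s}{w:.4f}")
```

With these inputs belt ends up with close to 90% of the total weight, which is why the raw weights can be too aggressive.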


The second step derives a sampling frequency for each image from the labels:

    import numpy as np


    def labels_to_image_weights(labels, nc=80, class_weights=np.ones(80)):
        # Number of objects of each class in every image
        class_counts = np.array([np.bincount(x[:, 0].astype(int), minlength=nc) for x in labels])
        # Weight each image by the class weights of the objects it contains
        image_weights = (class_weights.reshape(1, nc) * class_counts).sum(1)
        return image_weights


As you can see, images that contain rare classes get a higher sampling frequency.
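A small worked example of the image-weight calculation, written in plain Python so the arithmetic is easy to follow. The class weights and the three images are made up for illustration, and `image_weights` is a hypothetical stand-in for the NumPy version.

```python
# Hypothetical plain-Python stand-in: an image's weight is the sum of the
# class weights of all objects it contains.
def image_weights(per_image_classes, class_weights):
    return [sum(class_weights[c] for c in classes) for classes in per_image_classes]


# Illustrative class weights: belt (class 0) is rare, so its weight is large
class_weights = [0.89, 0.01, 0.05, 0.04, 0.01]

# Three made-up images, each described by the class ids of its objects
per_image_classes = [
    [0, 4],     # a belt and a jacket
    [1, 1, 4],  # two sunglasses and a jacket
    [2, 3],     # a boot and a cowboy hat
]

iw = image_weights(per_image_classes, class_weights)
print(iw)
```

Because the weights are summed per object, an image with many objects also gets a somewhat higher weight than an image with a single object of the same class.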
Putting it together

    # These lines rely on model, dataset, nc and device from the surrounding training script
    # Compute class_weights from the label statistics
    model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc  # attach class weights

    maps = np.zeros(nc)  # mAP per class
    # Recompute cw from the mAP values
    cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights
    # Recompute iw from the new cw
    iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights
    dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx
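The `(1 - maps) ** 2` factor is easier to read with numbers. In the excerpt `maps` is all zeros, so the factor is 1. The sketch below assumes `maps` is later filled with per-class validation mAP, which the excerpt does not show. The weights and mAP values are made up, and `rescale_by_map` is a hypothetical name.

```python
# Hypothetical helper reproducing cw = class_weights * (1 - maps) ** 2 / nc
def rescale_by_map(class_weights, maps):
    nc = len(class_weights)
    return [w * (1.0 - m) ** 2 / nc for w, m in zip(class_weights, maps)]


nc = 5
base = [0.893, 0.0096, 0.0497, 0.0375, 0.0102]  # illustrative normalized weights
class_weights = [w * nc for w in base]          # scaled by nc, as in the excerpt

# With all-zero maps the result is just the normalized weights again
cw_start = rescale_by_map(class_weights, [0.0] * nc)

# Assumption: later on, maps holds per-class mAP (made-up values here)
maps = [0.1, 0.4, 0.4, 0.5, 0.4]
cw_later = rescale_by_map(class_weights, maps)

print(cw_start)
print(cw_later)
```

Classes that are already detected well get their weight reduced, so sampling shifts further toward the classes that still perform poorly.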

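The sampling frequency is implemented mainly through random.choices; the principle is shown below. There is one problem, though. If a class has very few samples, as belt does, its weight becomes so large that nearly every sampled image is a belt image, which also hurts the results. So the computed iw can be flattened to keep the overall spread from getting too large. My own approach is to set an upper and a lower bound and then renormalize. (If you know a better method, I would be glad to hear it.)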

    import random

    import numpy as np

    iw = [0.91542, 0.00639, 0.04197, 0.02941, 0.00681]
    b = [1, 2, 3, 4, 5]
    c = random.choices(b, weights=iw, k=5)
    print(c)
    # Example output (random): [1, 1, 1, 1, 1]

    # Keep the weights from being too unbalanced: clip, then renormalize
    iw = np.clip(iw, 0.2, 0.8)
    total = np.sum(iw)
    iw = [x / total for x in iw]
    c = random.choices(b, weights=iw, k=5)
    print(c)
    # Example output (random): [1, 2, 1, 5, 1]
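The clip-and-renormalize step can be wrapped in a small function to make its effect on the probabilities explicit. This is a sketch in plain Python; `flatten_weights` is a hypothetical name, and the bounds 0.2 and 0.8 are the ones used in the example above.

```python
# Hypothetical helper: clip every weight into [low, high], then renormalize
def flatten_weights(weights, low=0.2, high=0.8):
    clipped = [min(max(w, low), high) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]


iw = [0.91542, 0.00639, 0.04197, 0.02941, 0.00681]
flat = flatten_weights(iw)
print([round(w, 3) for w in flat])
```

With these bounds the first item's sampling probability drops from roughly 0.92 to 0.5, and each of the other four rises to 0.125, so the rare class is still favored without dominating every draw.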

Doing this in yolov5 is also very simple, because the author has already implemented it in the latest version.
We only need to set the image-weights option in train.py to True.


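With the same yolov5s model trained for 50 epochs, the results are clearly better than without this optimization, and the score went up by 15 points.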

    Epoch   gpu_mem       box       obj        cls  labels  img_size
    49/49     4.52G   0.02143   0.01552  0.0005698      44       640: 100%|██████████| 182/182 [09:25<00:00, 3.11s/it]

    Class       Images  Labels      P      R  mAP@.5  mAP@.5:.95: 100%|██████████| 5/5 [00:09<00:00, 1.90s/it]
    all            158     289  0.793  0.559   0.599       0.358
    belt           158       9      1      0   0.128       0.072
    sunglasses     158      37  0.536  0.811   0.728       0.436
    boot           158     100  0.842   0.48   0.646       0.367
    cowboy_hat     158      97  0.918  0.809    0.85       0.484
    jacket         158      46  0.667  0.696   0.646        0.43

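3. Focal loss
I will not go into focal loss in depth here, because I am not sure my own understanding of it is correct. Focal loss is meant to deal with class imbalance, but for object detection this is how its authors describe it:
The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000).
In other words, in object detection focal loss mainly targets the imbalance between foreground and background samples: among the anchor boxes there are far too many background ones and too few positives. That is the problem it solves, so it is of little real use for this competition.
I tested it myself: using focal loss did not give good results and actually made them worse.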
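To make the foreground/background point concrete, here is a minimal sketch of the binary focal loss formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) in plain Python. It is not the YOLOv5 implementation. The defaults gamma = 2 and alpha = 0.25 are the values reported in the focal loss paper, and the two example predictions are made up.

```python
import math


def binary_cross_entropy(p, y):
    # p: predicted probability of the positive class, y: label (0 or 1)
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Cross-entropy scaled by (1 - p_t) ** gamma, so confident correct
    # predictions contribute very little to the total loss
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background anchor (confidently predicted as background)
easy_ce = binary_cross_entropy(0.01, 0)
easy_fl = binary_focal_loss(0.01, 0)

# A hard positive anchor (object present, but predicted with low confidence)
hard_ce = binary_cross_entropy(0.1, 1)
hard_fl = binary_focal_loss(0.1, 1)

print("easy background: CE=%.6f FL=%.8f" % (easy_ce, easy_fl))
print("hard positive:   CE=%.6f FL=%.8f" % (hard_ce, hard_fl))
```

The easy background example is suppressed by several orders of magnitude while the hard positive keeps most of its loss. The down-weighting depends on how easy an example is, not on how rare its class is, which fits the observation that it does little for imbalance between object classes.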
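4. Changing the model
My baseline uses yolov5s. Switching to the m or l model should improve the results considerably, so give it a try if you have time.
If I think of or come across other good methods later, I will come back and run more experiments.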