Training SSD on your own data in Ubuntu: create_list.sh pitfalls

(1) Walking through create_list.sh
(2) Things to watch out for with create_data.sh
(3) Fixing the SSD annotation files
For a detailed explanation of create_list.sh, see: https://blog.csdn.net/qq_21368481/article/details/82350331

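If your images come from screenshots, remember to convert them to the proper image format when building the dataset (the script below expects .jpg files); images in the wrong format take up considerably more memory.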

I added some comments to create_list.sh; the annotated version below can be used as a reference:

#!/bin/bash

root_dir=$HOME/caffe-ssd/data/CarDataSet   # root directory of the dataset
sub_dir=ImageSets/Main
bash_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo $bash_dir   # directory that contains create_list.sh, here /home/caffe-ssd
for dataset in trainval test   # only trainval.txt and test.txt are read
do
  dst_file=$bash_dir/$dataset.txt   # e.g. /home/caffe-ssd/trainval.txt
  if [ -f $dst_file ]   # remove the output file if it already exists
  then
    rm -f $dst_file
  fi
  for name in car2019   # car2019 holds Annotations, ImageSets/Main and JPEGImages
  do
    #if [[ $dataset == "test" && $name == "VOC2012" ]]
    #then
    #  continue
    #fi
    echo "Create list for $name $dataset..."   # for car2019 trainval/test
    dataset_file=$root_dir/$name/$sub_dir/$dataset.txt

    img_file=$bash_dir/$dataset"_img.txt"   # temporary file with image paths
    cp $dataset_file $img_file
    sed -i "s/^/$name\/JPEGImages\//g" $img_file   # -i edits img_file in place
    sed -i "s/$/.jpg/g" $img_file

    label_file=$bash_dir/$dataset"_label.txt"   # temporary file with .xml paths
    cp $dataset_file $label_file
    sed -i "s/^/$name\/Annotations\//g" $label_file
    sed -i "s/$/.xml/g" $label_file

    # Join image path and xml path on each line and append to the list file,
    # which is saved in the caffe-ssd directory.
    paste -d' ' $img_file $label_file >> $dst_file

    rm -f $label_file
    rm -f $img_file
  done

  # Generate image name and size information.
  echo "hello test: $name $dataset..."   # the if below runs only for test
  if [ $dataset == "test" ]
  then
    #$HOME/caffe-ssd/build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
    $bash_dir/build/tools/get_image_size $root_dir $dst_file $bash_dir/$dataset"_name_size.txt"
  fi

  echo "hello trainval: $name $dataset..."
  # Shuffle trainval file.
  if [ $dataset == "trainval" ]
  then
    rand_file=$dst_file.random
    cat $dst_file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' > $rand_file
    mv $rand_file $dst_file
  fi
done
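To make the effect of the cp/sed/paste steps easier to see, here is a minimal Python sketch of the same pairing logic. It is an illustration only, not part of SSD: the function name make_list_lines and the sample image IDs are hypothetical, and unlike sed it strips surrounding whitespace from every ID.

```python
def make_list_lines(names, dataset_name="car2019"):
    """Pair each image ID with its image path and annotation path."""
    lines = []
    for name in names:
        name = name.strip()  # drop newline characters and stray spaces
        if not name:
            continue  # skip blank lines
        img = f"{dataset_name}/JPEGImages/{name}.jpg"
        xml = f"{dataset_name}/Annotations/{name}.xml"
        lines.append(f"{img} {xml}")
    return lines


# Hypothetical image IDs, one per line as they would appear in trainval.txt
example_ids = ["000001", "000002"]
for line in make_list_lines(example_ids):
    print(line)
```

Each printed line has the form "car2019/JPEGImages/ID.jpg car2019/Annotations/ID.xml", which is the layout of the trainval.txt and test.txt files the script writes.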

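Our own txt files must be placed in data/CarDataSet/car2019/ImageSets/Main, namely test.txt, train.txt, trainval.txt and val.txt. Otherwise create_list.sh produces very strange output files. create_list.sh must be run from the caffe-ssd root directory (the script looks for build/tools/get_image_size relative to its own location). It generates three files: test_name_size.txt, test.txt and trainval.txt. The first records the pixel dimensions of each test image; the other two pair every test or trainval image path with its xml annotation path.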
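Before running create_list.sh it can help to confirm that the split files are really there. The following is a small sketch under stated assumptions: the helper name missing_split_files is made up for this example, and the path in the usage line is simply the directory named above expanded under the home directory.

```python
import os

# The four split files this post says should be in ImageSets/Main
REQUIRED_SPLITS = ("train.txt", "val.txt", "trainval.txt", "test.txt")


def missing_split_files(main_dir):
    """Return the names of required split files that are absent from main_dir."""
    return [
        fname
        for fname in REQUIRED_SPLITS
        if not os.path.isfile(os.path.join(main_dir, fname))
    ]


# Assumed location; adjust to your own dataset directory
main_dir = os.path.expanduser("~/caffe-ssd/data/CarDataSet/car2019/ImageSets/Main")
print("missing:", missing_split_files(main_dir))
```

An empty list means all four files are present.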
The files in Main can be generated with the following Python code:

import os
import random

trainval_percent = 0.66   # share of all samples used for trainval
train_percent = 0.5       # share of trainval used for train
# xmlfilepath = 'Annotations'
xmlfilepath = 'D:/trainCar/xml'
txtsavepath = 'D:/trainCar/txt'
# txtsavepath = 'ImageSets\Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

# ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftrainval = open(os.path.join(txtsavepath, 'trainval.txt'), 'w')
ftest = open(os.path.join(txtsavepath, 'test.txt'), 'w')
ftrain = open(os.path.join(txtsavepath, 'train.txt'), 'w')
fval = open(os.path.join(txtsavepath, 'val.txt'), 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'   # file name without the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
print("done")

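There is one major pitfall here: if you generated the txt files on Windows and then copied them to Ubuntu, you need to run the following command in the ImageSets/Main/ directory on Ubuntu to convert the files to Unix format. Otherwise test_name_size.txt, test.txt and trainval.txt may all be generated incorrectly.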

dos2unix *.txt

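This converts the Windows (CRLF) line endings into the Unix (LF) line endings that Linux tools expect. If dos2unix is not installed, install the package first (on Ubuntu: sudo apt-get install dos2unix).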
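The reason this matters: with CRLF line endings every line still ends in a carriage return, so sed appends ".jpg" after that character and the resulting path does not match any file. If dos2unix is unavailable, the same conversion can be sketched in Python. The function names crlf_to_lf and convert_file are made up for this example, and it assumes the files are small enough to read into memory.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at path with LF endings; return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    converted = crlf_to_lf(data)
    if converted != data:
        with open(path, "wb") as f:
            f.write(converted)
    return converted != data


print(crlf_to_lf(b"000001\r\n000002\r\n"))  # prints b'000001\n000002\n'
```

Calling convert_file on each txt file in ImageSets/Main would have roughly the same effect as dos2unix *.txt for these simple list files.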

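create_data.sh and fixing the xml files
Most importantly, create_data.sh must be run with the Python environment of the SSD version of Caffe (on Ubuntu). The export below only changes the environment of the current shell session. Without it you get the error AttributeError: 'module' object has no attribute 'LabelMap'.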

export PYTHONPATH=/home/pengshan/caffe-ssd/python

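Also, before running create_data.sh, move the trainval.txt, test.txt and test_name_size.txt files that create_list.sh generated in the caffe-ssd root directory into the folder where create_list.sh is kept for this dataset. Otherwise you get a "file not found" error.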

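However, my dataset was annotated by several people working together, so labeling errors crept in, such as inconsistent label names. My labels are "bus", "car" and "van", but mistakes such as "busa" or "car " (with a trailing space) can appear. The fix is to correct the label names in the generated xml files: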

import os
from xml.etree.ElementTree import parse


def changeLabel():
    path = "D:/photos_xml/bus/xml/"
    files = os.listdir(path)  # names of all entries in the folder
    for xmlFile in files:  # iterate over the folder
        newStr = os.path.join(path, xmlFile)  # full path of the entry
        if os.path.isdir(newStr):  # skip sub-folders, only open files
            continue
        print("Processing file: " + newStr)
        dom = parse(newStr)  # the key step: parse needs the full path
        root = dom.getroot()
        # root.find('path').text = newStr1
        # Check every object in the file, not only the first one
        for node in root.findall('object/name'):
            if node.text == "busa":
                node.text = "bus"  # fix the name under the object node
                print("Label after fix: " + node.text)
        dom.write(newStr, encoding="utf-8", xml_declaration=True)
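Before fixing labels one by one, it can be useful to list every label that is not in the expected set. The sketch below is only an illustration: the helper find_bad_labels and the inline sample annotation are made up for this example, and the allowed set is the "bus", "car", "van" list from this post.

```python
import xml.etree.ElementTree as ET

ALLOWED_LABELS = {"bus", "car", "van"}


def find_bad_labels(xml_text, allowed=ALLOWED_LABELS):
    """Return every object label in the annotation that is not allowed."""
    root = ET.fromstring(xml_text)
    bad = []
    for node in root.findall("object/name"):
        label = node.text or ""  # an empty name tag counts as a bad label
        if label not in allowed:
            bad.append(label)
    return bad


# Made-up annotation with one correct and two wrong labels
sample = """<annotation>
  <object><name>bus</name></object>
  <object><name>busa</name></object>
  <object><name>car </name></object>
</annotation>"""
print(find_bad_labels(sample))  # prints ['busa', 'car ']
```

The comparison is exact on purpose, so a label with a trailing space such as "car " is reported rather than silently accepted.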