
Introduction to the nuScenes Dataset and the nuScenes devkit
Contents

- 1.1. Introduction to the nuScenes dataset
- 1.2 Data collection
  - 1.2.1 Sensor layout
  - 1.2.2 Data format
- 1.2 Data annotation overview
- 1.3 devkit overview

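Background: my project requires building a dataset for other target types in the nuScenes format and training on it, so I am studying nuScenes and recording my notes here (to be extended). If anything is wrong, please leave a comment so we can learn together and discuss any questions.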
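1.1. Introduction to the nuScenes dataset
nuScenes is a large-scale autonomous driving dataset with 3D object annotations; its companion dataset, nuImages, provides image-level 2D annotations. The annotations are produced with Scale's labeling tool.
Compared with other datasets such as KITTI and ApolloScape, nuScenes adds radar (millimeter-wave radar) sensors. For a comparison of the sensor types, see "Lidar vs Radar vs Camera". Radar gives an autonomous driving system a fallback in adverse conditions where cameras and lidar fail, and it is cost-effective.
Main features of nuScenes:
1. A complete sensor suite: one lidar, five radars, six cameras, IMU, and GPS.
2. Ample data: 1000 scenes collected in different cities. Rich features: extra information such as visibility enriches the annotations and can be used for other tasks. A large volume of labeled objects: 1.1B lidar points, manually annotated with 32 classes.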
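1.2 Data collection
1.2.1 Sensor layout

Figure 1.1: Sensor positions on the vehicle

The sensor layout on the vehicle is shown in the figure above. Fusing and registering sensor data requires calibrating the sensors first: the intrinsic and extrinsic parameters of the cameras, and the extrinsic parameters of the lidar and radars.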

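Camera extrinsic calibration: a cube-shaped calibration target is placed in front of the camera and the lidar (see the nuScenes website for the procedure; for cube-based calibration, see the paper at https://www.researchgate.net/publication/327516843).
Camera intrinsic calibration: a flat board with a known pattern is used (the common planar checkerboard method).
Radar extrinsic calibration: the radar is mounted horizontally on the vehicle and data is collected while driving in an urban environment. Returns from moving objects are filtered out, and the yaw angle is then calibrated to minimize the compensated range rate of static objects.
Lidar extrinsic calibration: a laser liner is used to accurately measure the position of the lidar relative to the ego (vehicle) frame.
With these steps done, the coordinate transformation matrices between the lidar/radar and the cameras can be computed, and data collection can begin.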
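To make the last step concrete, here is a minimal sketch of how a lidar-to-camera transform could be composed from two extrinsic calibrations, each given as a translation plus a (w, x, y, z) quaternion relative to the ego frame. The helper names (`quat_to_rot`, `make_transform`) and all numeric values are made up for illustration; this is not the authors' calibration code.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def make_transform(translation, rotation):
    """4x4 homogeneous matrix mapping sensor-frame points into the ego frame."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(rotation)
    T[:3, 3] = translation
    return T

# Hypothetical extrinsics (not real nuScenes values).
lidar_to_ego = make_transform([1.0, 0.0, 1.8], [1.0, 0.0, 0.0, 0.0])
s = np.sqrt(0.5)
cam_to_ego = make_transform([1.5, 0.0, 1.5], [s, 0.0, 0.0, s])  # 90 degree yaw

# lidar -> ego -> camera
lidar_to_cam = np.linalg.inv(cam_to_ego) @ lidar_to_ego

point_lidar = np.array([2.0, 0.0, 0.0, 1.0])  # homogeneous point in the lidar frame
point_cam = lidar_to_cam @ point_lidar
```

Going through the ego frame is a natural choice here because every extrinsic is stored relative to the vehicle, so any sensor pair can be related by one inverse and one product.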

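1.2.2 Data format
nuScenes is annotated more comprehensively than most other datasets, so following the authors' format definitions is a good approach when building your own dataset.
Basic definitions in the dataset:
log: the log information of the collected data.
scene: a 20 s sequence of consecutive frames.
sample: an annotated keyframe of a scene at a given timestamp.
instance: an enumeration of all object instances observed.
sample_annotation: an annotated occurrence of an object instance in a sample.
token: every record in the dataset (objects, sensors, scenes, keyframes, and so on) is assigned a token, and each token is a unique identifier.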
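Because every record carries a unique token and refers to other records by token, the tables behave like a small relational database. The sketch below shows one way to index such tables in plain Python. The records, the short token strings, and the `build_index`/`get` helpers are invented for illustration and are not the devkit's implementation.

```python
def build_index(records):
    """Map token -> record for one table."""
    return {rec["token"]: rec for rec in records}

# Hypothetical miniature tables; real tokens are long opaque strings.
tables = {
    "scene": [
        {"token": "scene-0", "name": "scene-0001", "first_sample_token": "sample-0"},
    ],
    "sample": [
        {"token": "sample-0", "scene_token": "scene-0", "timestamp": 1532402927647951},
    ],
}
index = {name: build_index(records) for name, records in tables.items()}

def get(table, token):
    """Look up a record by table name and token."""
    return index[table][token]

scene = get("scene", "scene-0")
first_sample = get("sample", scene["first_sample_token"])  # follow a foreign key
```

If you define your own dataset in this format, keeping one dictionary per table makes every foreign-key hop a constant-time lookup.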

Figure 1.2: Dataset schema (annotated by me, so a bit messy)

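attribute: a property of an instance that can change while the category stays the same, for example whether an annotated vehicle is parked or moving, or whether a bicycle has a rider.

attribute {
   "token":        <str> -- Unique record identifier.
   "name":         <str> -- Attribute name.
   "description":  <str> -- Attribute description.
}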

calibrated_sensor: the extrinsic and intrinsic calibration of one sensor as mounted on the vehicle. All extrinsics are given relative to the ego (vehicle) coordinate frame.

calibrated_sensor {
   "token":             <str> -- Unique record identifier.
   "sensor_token":      <str> -- Foreign key pointing to the sensor type.
   "translation":       <float> [3] -- Coordinate system origin in meters: x, y, z.
   "rotation":          <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "camera_intrinsic":  <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
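As an illustration of what the 3x3 `camera_intrinsic` matrix is for, the sketch below projects a 3D point that is already expressed in the camera frame onto the image plane. It assumes a plain pinhole model with no lens distortion and a camera frame with z pointing forward; the matrix values and the `project` helper are made up.

```python
import numpy as np

# Hypothetical intrinsic matrix: focal lengths on the diagonal, principal point in the last column.
K = np.array([
    [1266.0, 0.0, 816.0],
    [0.0, 1266.0, 491.0],
    [0.0, 0.0, 1.0],
])

def project(point_cam, K):
    """Project a camera-frame point (x, y, z) to pixel coordinates (u, v).

    Returns None for points at or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u, v, w = K @ np.array([x, y, z], dtype=float)
    return u / w, v / w

pixel = project([1.0, -0.5, 10.0], K)
behind = project([0.0, 0.0, -1.0], K)
```

A lidar or radar point would first have to be moved into the camera frame with the extrinsic `translation` and `rotation` before this projection applies.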

category: the object category. A subcategory of a broader class is written by appending a period and the subcategory name, for example vehicle.car.

category {
   "token":        <str> -- Unique record identifier.
   "name":         <str> -- Category name. Subcategories indicated by period.
   "description":  <str> -- Category description.
   "index":        <int> -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
}

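ego_pose: the pose of the ego vehicle at a particular timestamp, given relative to the global coordinate system. It is produced by a lidar map-based localization algorithm (see the localization section of the nuScenes paper). The localization is 2-dimensional, in the x-y plane.

ego_pose {
   "token":        <str> -- Unique record identifier.
   "translation":  <float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
   "rotation":     <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "timestamp":    <int> -- Unix time stamp.
}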
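Since the localization is planar, the heading (yaw) is often the only part of the `rotation` quaternion one needs. The following sketch extracts it with the standard quaternion-to-yaw formula; the `ego_pose` record and the `yaw_from_quaternion` helper are invented for illustration.

```python
import math

def yaw_from_quaternion(q):
    """Yaw (rotation about z, in radians) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# Hypothetical ego_pose record: a 45 degree heading, z fixed at 0.
ego_pose = {
    "translation": [411.3, 1180.9, 0.0],
    "rotation": [math.cos(math.pi / 8), 0.0, 0.0, math.sin(math.pi / 8)],
}

x, y, _ = ego_pose["translation"]
yaw = yaw_from_quaternion(ego_pose["rotation"])
pose_2d = (x, y, yaw)  # planar pose: position plus heading
```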

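instance: an object instance, for example one particular vehicle. The table enumerates all object instances the authors observed. Instances are tracked continuously within a scene (the same car appearing throughout one clip is followed and annotated in each keyframe), but they are not tracked across scenes: instances in different scenes are unrelated.

instance {
   "token":                   <str> -- Unique record identifier.
   "category_token":          <str> -- Foreign key pointing to the object category.
   "nbr_annotations":         <int> -- Number of annotations of this instance within its scene.
   "first_annotation_token":  <str> -- Foreign key. Points to the first annotation of this instance.
   "last_annotation_token":   <str> -- Foreign key. Points to the last annotation of this instance.
}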
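The `first_annotation_token` field, together with the `next` field of each sample_annotation record, forms a linked list that describes one object's track through a scene. A minimal sketch of walking that list, using invented records and a hypothetical `iter_annotations` helper:

```python
# Hypothetical sample_annotation records, reduced to the linking fields.
annotations = {
    "ann-0": {"token": "ann-0", "prev": "", "next": "ann-1"},
    "ann-1": {"token": "ann-1", "prev": "ann-0", "next": "ann-2"},
    "ann-2": {"token": "ann-2", "prev": "ann-1", "next": ""},
}
instance = {
    "token": "inst-0",
    "nbr_annotations": 3,
    "first_annotation_token": "ann-0",
    "last_annotation_token": "ann-2",
}

def iter_annotations(instance, annotations):
    """Yield the annotations of one instance in time order."""
    token = instance["first_annotation_token"]
    while token:  # an empty string marks the end of the track
        ann = annotations[token]
        yield ann
        token = ann["next"]

track = [ann["token"] for ann in iter_annotations(instance, annotations)]
```

When building your own dataset in this format, checking that the walk ends at `last_annotation_token` and has `nbr_annotations` entries is a cheap consistency test.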

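lidarseg: the mapping between nuScenes-lidarseg annotations and the sample_data records of the lidar point clouds that belong to keyframes.

lidarseg {
   "token":              <str> -- Unique record identifier.
   "filename":           <str> -- Name of the .bin file with the lidar labels, stored in binary format as an array of uint8.
   "sample_data_token":  <str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with is_key_frame=True.
}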
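A label file of this kind, a flat binary array of uint8 with one label per lidar point, can be read with `numpy.fromfile`. The sketch below writes a tiny made-up label file to a temporary directory and reads it back; the file name and label values are invented, and a real file would be located via the `filename` field.

```python
import os
import tempfile

import numpy as np

# Made-up labels for five points; each value is a category index.
labels_out = np.array([0, 17, 17, 24, 30], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_lidarseg.bin")  # hypothetical file name
    labels_out.tofile(path)                           # raw bytes, no header
    labels = np.fromfile(path, dtype=np.uint8)        # one label per lidar point

# Number of points per class index (32 classes in total).
counts = np.bincount(labels, minlength=32)
```

The i-th label belongs to the i-th point of the corresponding point cloud, so the label array and the point array must have the same length.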

log: information about the log file from which the data was extracted.

log {
   "token":          <str> -- Unique record identifier.
   "logfile":        <str> -- Log file name.
   "vehicle":        <str> -- Vehicle name.
   "date_captured":  <str> -- Date (YYYY-MM-DD).
   "location":       <str> -- Area where log was captured, e.g. singapore-onenorth.
}

map: map data from a top-down view, stored as binary semantic masks.

map {
   "token":       <str> -- Unique record identifier.
   "log_tokens":  <str> [n] -- Foreign keys.
   "category":    <str> -- Map category, currently only semantic_prior for drivable surface and sidewalk.
   "filename":    <str> -- Relative path to the file with the map mask.
}

sample: an annotated keyframe, taken every 0.5 s (2 Hz). Its data is collected at approximately the same timestamp, as part of a single lidar sweep.

sample {
   "token":        <str> -- Unique record identifier.
   "timestamp":    <int> -- Unix time stamp.
   "scene_token":  <str> -- Foreign key pointing to the scene.
   "next":         <str> -- Foreign key. Sample that follows this in time. Empty if end of scene.
   "prev":         <str> -- Foreign key. Sample that precedes this in time. Empty if start of scene.
}

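sample_annotation: a 3D bounding box that describes the position and orientation of an object seen in a sample. All location data is given in the global coordinate system.

sample_annotation {
   "token":             <str> -- Unique record identifier.
   "sample_token":      <str> -- Foreign key. The sample this annotation belongs to.
   "instance_token":    <str> -- Foreign key. The instance this annotation refers to; one instance can have many annotations over time.
   "attribute_tokens":  <str> [n] -- Foreign keys. Attributes of the object in this annotation. Attributes change over time, so they belong here rather than in the instance table.
   "visibility_token":  <str> -- Foreign key. Visibility of the object, which also changes over time.
   "translation":       <float> [3] -- Center of the bounding box.
   "size":              <float> [3] -- Size of the bounding box.
   "rotation":          <float> [4] -- Orientation of the bounding box as a quaternion.
   "num_lidar_pts":     <int> -- Number of lidar points inside the box during one lidar sweep.
   "num_radar_pts":     <int> -- Number of radar points in this box. Points are counted during the radar sweep identified with this sample. This number is summed across all radar sensors without any invalid point filtering.
   "next":              <str> -- Foreign key. The next sample_annotation of the same object instance.
   "prev":              <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
}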
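To show how `translation`, `size` and `rotation` together define a box, the sketch below computes its eight corners in the global frame. It assumes that `size` is ordered width, length, height, with length along the box's local x axis and width along y (the ordering I understand the nuScenes schema to use; the text above does not state it). The `box_corners` helper and the numbers are made up.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def box_corners(translation, size, rotation):
    """Return the 8 corners (8x3 array) of a box in the global frame.

    Assumes size = (width, length, height): length along local x, width along local y.
    """
    w, l, h = size
    x = l / 2 * np.array([1, 1, 1, 1, -1, -1, -1, -1])
    y = w / 2 * np.array([1, -1, -1, 1, 1, -1, -1, 1])
    z = h / 2 * np.array([1, 1, -1, -1, 1, 1, -1, -1])
    local = np.vstack([x, y, z])                       # 3x8, centered at the origin
    return (quat_to_rot(rotation) @ local).T + np.asarray(translation, dtype=float)

# Hypothetical annotation: a 2 m wide, 4 m long, 1.5 m high box with no rotation.
corners = box_corners([10.0, 5.0, 1.0], [2.0, 4.0, 1.5], [1.0, 0.0, 0.0, 0.0])
```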

sample_data: data returned by a sensor, such as a lidar point cloud or an image. For sample_data with is_key_frame = True, the timestamp is very close to that of the sample it points to. For sample_data with is_key_frame = False, the record points to the sample closest to it in time.

sample_data {
   "token":                    <str> -- Unique record identifier.
   "sample_token":             <str> -- Foreign key. The sample this sample_data is associated with.
   "ego_pose_token":           <str> -- Foreign key.
   "calibrated_sensor_token":  <str> -- Foreign key.
   "filename":                 <str> -- Relative path to data-blob on disk.
   "fileformat":               <str> -- Data file format.
   # The following two fields apply only when the data is an image.
   "width":                    <int> -- If the sample data is an image, this is the image width in pixels.
   "height":                   <int> -- If the sample data is an image, this is the image height in pixels.
   "timestamp":                <int> -- Unix time stamp.
   "is_key_frame":             <bool> -- True if sample_data is part of key_frame, else False.
   "next":                     <str> -- Foreign key. Data from the same sensor that follows this in time. Empty if end of scene.
   "prev":                     <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
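Keyframes and intermediate sweeps from one sensor sit in the same `next`/`prev` chain and are told apart only by `is_key_frame`. As a sketch of how that flag can be used, the code below collects the non-keyframe records that follow a given keyframe. The records, timestamps and the `sweeps_until_next_keyframe` helper are invented for illustration.

```python
# Hypothetical sample_data records from one sensor, reduced to the relevant fields.
sample_data = {
    "sd-0": {"token": "sd-0", "is_key_frame": True, "timestamp": 0, "next": "sd-1"},
    "sd-1": {"token": "sd-1", "is_key_frame": False, "timestamp": 50_000, "next": "sd-2"},
    "sd-2": {"token": "sd-2", "is_key_frame": False, "timestamp": 100_000, "next": "sd-3"},
    "sd-3": {"token": "sd-3", "is_key_frame": True, "timestamp": 500_000, "next": ""},
}

def sweeps_until_next_keyframe(start_token, table):
    """Tokens of the non-keyframe records between a keyframe and the next one."""
    sweeps = []
    token = table[start_token]["next"]
    while token and not table[token]["is_key_frame"]:
        sweeps.append(token)
        token = table[token]["next"]
    return sweeps

between = sweeps_until_next_keyframe("sd-0", sample_data)
after_last = sweeps_until_next_keyframe("sd-3", sample_data)
```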

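scene: a 20 s sequence of consecutive frames extracted from a log. Several scenes can come from the same log. Instance identities are not preserved across scenes.

scene {
   "token":               <str> -- Unique record identifier.
   "name":                <str> -- Short string identifier.
   "description":         <str> -- Free-text description, for example that a vehicle is keeping right on a certain road.
   "log_token":           <str> -- Foreign key. Points to the log the scene was extracted from.
   "nbr_samples":         <int> -- Number of samples in the scene.
   "first_sample_token":  <str> -- Foreign key. Points to the first sample in scene.
   "last_sample_token":   <str> -- Foreign key. Points to the last sample in scene.
}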

sensor: a description of a sensor type.

sensor {
   "token":     <str> -- Unique record identifier.
   "channel":   <str> -- Sensor channel name.
   "modality":  <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
}

visibility: how visible an instance is.

visibility {
   "token":        <str> -- Unique record identifier.
   "level":        <str> -- Visibility level.
   "description":  <str> -- Description of visibility level.
}

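1.2 Data annotation overview
The authors label the data with the nuScenes annotator.
After collecting the data, the authors sample the recordings at 2 Hz and have the resulting keyframes annotated through Scale, which yields highly accurate annotations. Every object in the dataset carries a semantic category, and in every frame of every scene in which it appears it also gets a 3D bounding box and attribute annotations. Compared with 2D datasets, this allows an object's position and orientation to be inferred more accurately.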
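As a worked example of the 2 Hz sampling: a 20 s scene sampled every 0.5 s gives about 40 keyframes. The sketch below picks, for each 0.5 s target time, the recorded frame closest to it. The 20 Hz input rate, the timestamps and the `select_keyframes` helper are assumptions made for illustration and not the authors' actual pipeline.

```python
def select_keyframes(timestamps_us, rate_hz=2.0):
    """Pick the frames closest to a regular grid at rate_hz.

    timestamps_us: sorted frame timestamps in microseconds.
    """
    period_us = int(round(1_000_000 / rate_hz))
    keyframes = []
    target = timestamps_us[0]
    while target <= timestamps_us[-1]:
        nearest = min(timestamps_us, key=lambda t: abs(t - target))
        keyframes.append(nearest)
        target += period_us
    return keyframes

# Hypothetical recording: 20 s of frames at 20 Hz (one frame every 50 ms).
frames = [i * 50_000 for i in range(400)]
keyframes = select_keyframes(frames)
```

With these made-up inputs, `keyframes` holds 40 timestamps spaced exactly 0.5 s apart; with real, slightly irregular timestamps the spacing would only be approximately 0.5 s.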
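For the lidar point clouds, every lidar point has a semantic label: 23 foreground classes plus 9 background classes.
This is only an overview; for the detailed procedure, see the annotation instructions published by the authors. (To be updated.)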
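1.3 devkit overview
I am short on time and have not written this part yet; it will be added later. For now, here is a link to someone else's write-up (click here).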
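Finally, I am new to multi-sensor fusion. After several days of reading the literature I could not find a good translation of the material on this dataset, so I tried translating it and explaining it in my own words. My goal is to share what I have learned. This post certainly contains mistakes, so please point them out.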