news 2025/12/17 0:33:07

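1. Overview

3D LiDAR object detection identifies and localizes objects of interest in three-dimensional space. For autonomous driving systems and advanced spatial analysis, the continued evolution of detection methods is essential, and 3D LiDAR detection stands out because it gives environment perception accurate, directly measured depth information. This article walks through the process and training procedure for autonomous-driving object detection with the Keypoint Feature Pyramid Network (K-FPN) on the KITTI 360 Vision dataset, which combines RGB camera and 3D LiDAR data.

2. Object detection in 3D point clouds

The core of 3D object detection is identifying and localizing objects in three-dimensional space. Unlike 2D detection, which only considers height and width in the image plane, 3D detection also incorporates depth and therefore provides a complete spatial understanding. This matters for autonomous driving, robotics, and augmented reality, where interaction with the environment is inherently three-dimensional.

2.1 Human depth perception

The intuition behind 3D object detection comes from the way humans perceive depth. Human vision infers the third dimension from cues such as shading, perspective, and parallax. Similarly, 3D detection algorithms use geometry, shading, and the relative motion of points to discern depth.

2.2 Digital depth perception

Shuiwang Ji et al. first introduced the 3D-CNN concept in their paper "3D Convolutional Neural Networks for Human Action Recognition". Their model extracts features along both the spatial and temporal dimensions by performing 3D convolutions, capturing the motion information encoded in multiple adjacent frames. The model generates several channels of information from the input frames, and the final feature representation combines information from all channels.

A 3D environment is usually represented as a point cloud: a collection of vertices in a three-dimensional coordinate system, typically obtained from structured-light or LiDAR sensors. A key part of 3D detection is converting these point clouds into a format that can be processed to identify objects. This involves segmentation, which divides the point cloud into clusters that may represent objects, followed by classification of those clusters into known categories such as cars, pedestrians, or other objects of interest. The technical challenge is considerable because point cloud data is sparse and variable. Unlike pixels in a 2D image, points in 3D space are unevenly distributed, and their density changes with distance from the sensor. Algorithms such as PointNet and its successors, for example PointNet++, process point clouds directly and learn features that are permutation-invariant and robust to occlusion and clutter.

2.3 What is specific to object detection in 3D point clouds

Detecting objects in a 3D point cloud introduces several characteristics that do not exist in traditional 2D object detection:

- Depth estimation: one of the most significant characteristics, since it determines the distance between the object and the sensor. In a point cloud, depth is measured directly; in a 2D image it has to be inferred.
- Volume estimation: algorithms can exploit the volumetric nature of the data and consider the actual shape and size of an object. A 2D bounding box, by contrast, only approximates the area the object occupies in the image plane.
- 6DoF (six degrees of freedom) object pose: 3D detection algorithms not only localize an object but also determine its orientation in space, giving a full 6DoF pose estimate (three for position, three for rotation).
- Scale invariance: detection can be made invariant to object scale. This is especially important for LiDAR-based systems, because objects appear at different distances and therefore at different scales.
- Temporal continuity in dynamic environments: advanced 3D detection systems track how the point cloud changes over time, which lets them predict the trajectory and velocity of moving objects.

3. Paper review

3.1 VoxelNet

Yin Zhou and Oncel Tuzel proposed VoxelNet, an end-to-end learning approach for point-cloud-based 3D object detection. VoxelNet divides the point cloud into a structured 3D voxel grid and uses a voxel feature encoding layer to convert the points inside each voxel into a unified feature representation. That representation feeds a region proposal network (RPN) that produces the detections. On the KITTI car detection benchmark, VoxelNet significantly outperformed the LiDAR-based detection methods available at the time, and it also achieved promising results for pedestrians and cyclists, showing that it can learn representations for different object types.

3.2 BirdNet

Jorge Beltrán et al. introduced BirdNet, a 3D object detection framework based on LiDAR information. Their method first applies a cell encoding to the bird's-eye-view projection of the laser data, then estimates object position and orientation with a convolutional neural network adapted from image processing techniques. A final post-processing stage consolidates the oriented 3D detections. Validated on the KITTI dataset, the framework set a new standard in the field at publication and generalized across different LiDAR systems, confirming its robustness in real traffic conditions.

3.3 VirConvNet

Hai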
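Wu et al. proposed VirConvNet, a novel and efficient backbone designed to improve detection performance while keeping the computational load manageable. It is built on two components: StVD (stochastic voxel discard), which strategically drops redundant voxel computation, and NRConv (noise-resistant submanifold convolution), which encodes voxel features robustly by using both 2D and 3D data. The authors present three pipeline variants: VirConv-L for efficiency, VirConv-T for accuracy, and VirConv-S for semi-supervised learning. Their pipelines ranked at the top of the KITTI car 3D detection leaderboard, with VirConv-S leading and VirConv-L offering fast inference.

3.4 RTM3D

Peixuan Li et al. developed a monocular 3D detection framework that makes efficient and accurate single-shot predictions. Their method drops the reliance on traditional 2D bounding box constraints. Instead, it predicts nine keypoints of the 3D bounding box from a monocular image and uses geometric relationships to infer size, position, and orientation in 3D space. The approach stays robust under noisy keypoint estimates, and its compact architecture enables fast detection. Notably, the training scheme needs no external networks or additional supervision data. The authors present it as the first real-time system for 3D detection from monocular images, setting a new performance benchmark on KITTI.

4. Dataset visualization for 3D LiDAR object detection

4.1 KITTI 360 Vision dataset

The training process here uses the KITTI 360 Vision dataset. It is a fairly large dataset, so 3D LiDAR visualization is needed for exploratory data analysis (EDA). The visualizations from this experiment highlight the three-dimensional representation of the 3D LiDAR sensor data. It is equally important to visualize the 3D bounding boxes on the RGB camera stream, which is essential when developing advanced driver assistance systems (ADAS). To do this, first download the dataset and create the directory structure. The dataset-specific files are:

- Velodyne point clouds (29 GB): laser information
- Training labels of the object data set (5 MB)
- Camera calibration matrices of the object data set (16 MB)
- Left color images of the object data set (12 GB): used for visualization

Now arrange the files so that the directory structure looks like this:

```
kitti
├── demo
│   └── calib.txt
├── gt_database
├── gt_database_mm
├── ImageSets
│   ├── train.txt
│   ├── test.txt
│   └── val.txt
├── training
│   ├── image_2
│   ├── label_2
│   ├── calib
│   └── velodyne
├── testing
│   ├── image_2
│   ├── calib
│   └── velodyne
├── kitti_dbinfos_train.pkl
├── kitti_dbinfos_train_mm.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
└── kitti_infos_test.pkl
```

Take a moment to explore the methods of the KittiDataset class defined in kitti_dataset.py; the code is covered in the code walkthrough section of this article. KittiDataset is a custom dataset class for loading and manipulating data from the KITTI 360 Vision dataset. It is tailored to the operating modes train, val, and test, and it is configured through the configs parameter, which holds settings such as directory paths, input size, and number of classes. It is implemented in the kitti_dataset.py script in the data_process directory. The class methods and what they do are broken down below.

```python
# The excerpts in this section are methods of KittiDataset. They rely on these
# third-party imports plus project helpers (cnf, transformation, makeBEVMap, ...).
import math
import os

import cv2
import numpy as np
import torch


def __init__(self, configs, mode='train', lidar_aug=None, hflip_prob=None, num_samples=None):
    self.dataset_dir = configs.dataset_dir
    self.input_size = configs.input_size
    self.hm_size = configs.hm_size

    self.num_classes = configs.num_classes
    self.max_objects = configs.max_objects

    assert mode in ['train', 'val', 'test'], 'Invalid mode: {}'.format(mode)
    self.mode = mode
    self.is_test = (self.mode == 'test')
    sub_folder = 'testing' if self.is_test else 'training'

    self.lidar_aug = lidar_aug
    self.hflip_prob = hflip_prob

    self.image_dir = os.path.join(self.dataset_dir, sub_folder, 'image_2')
    self.lidar_dir = os.path.join(self.dataset_dir, sub_folder, 'velodyne')
    self.calib_dir = os.path.join(self.dataset_dir, sub_folder, 'calib')
    self.label_dir = os.path.join(self.dataset_dir, sub_folder, 'label_2')
    split_txt_path = os.path.join(self.dataset_dir, 'ImageSets', '{}.txt'.format(mode))
    self.sample_id_list = [int(x.strip()) for x in open(split_txt_path).readlines()]

    if num_samples is not None:
        self.sample_id_list = self.sample_id_list[:num_samples]
    self.num_samples = len(self.sample_id_list)
```

The initializer sets up the dataset by building the paths of the data directories (image, LiDAR, calibration, and label) and by creating the list of sample IDs for the selected mode. It optionally accepts LiDAR augmentation (lidar_aug) and a horizontal flip probability (hflip_prob) for data augmentation. If num_samples is given, the sample list is truncated to that length.

```python
def __len__(self):
    return len(self.sample_id_list)
```

This method returns the number of samples in the dataset, which lets PyTorch's DataLoader iterate over it correctly.

```python
def __getitem__(self, index):
    if self.is_test:
        return self.load_img_only(index)
    else:
        return self.load_img_with_targets(index)
```

This method retrieves a single data point. In test mode it calls load_img_only to fetch the input data without labels. In train or val mode it calls load_img_with_targets to fetch the input data together with the corresponding target labels.

```python
def load_img_only(self, index):
    """Load only image for the testing phase"""
    sample_id = int(self.sample_id_list[index])
    img_path, img_rgb = self.get_image(sample_id)
    lidarData = self.get_lidar(sample_id)
    lidarData = get_filtered_lidar(lidarData, cnf.boundary)
    bev_map = makeBEVMap(lidarData, cnf.boundary)
    bev_map = torch.from_numpy(bev_map)

    metadatas = {
        'img_path': img_path,
    }

    return metadatas, bev_map, img_rgb
```

This method is used in the test phase. It returns the metadata, the BEV map built from the filtered LiDAR points, and the RGB image; no labels are loaded because none are used during testing.

```python
def load_img_with_targets(self, index):
    """Load images and targets for the training and validation phase"""
    sample_id = int(self.sample_id_list[index])
    img_path = os.path.join(self.image_dir, '{:06d}.png'.format(sample_id))
    lidarData = self.get_lidar(sample_id)
    calib = self.get_calib(sample_id)
    labels, has_labels = self.get_label(sample_id)
    if has_labels:
        labels[:, 1:] = transformation.camera_to_lidar_box(labels[:, 1:], calib.V2C, calib.R0, calib.P2)

    if self.lidar_aug:
        lidarData, labels[:, 1:] = self.lidar_aug(lidarData, labels[:, 1:])

    lidarData, labels = get_filtered_lidar(lidarData, cnf.boundary, labels)

    bev_map = makeBEVMap(lidarData, cnf.boundary)
    bev_map = torch.from_numpy(bev_map)

    hflipped = False
    if np.random.random() < self.hflip_prob:
        hflipped = True
        # C, H, W
        bev_map = torch.flip(bev_map, [-1])

    targets = self.build_targets(labels, hflipped)

    metadatas = {
        'img_path': img_path,
        'hflipped': hflipped
    }

    return metadatas, bev_map, targets
```

This method loads the BEV input and the target labels for training or validation. It applies any configured LiDAR augmentation and flips the bird's-eye-view (BEV) map when required. It also builds the detection targets: heatmap, center offset, dimensions, and direction.

```python
def get_image(self, idx):
    img_path = os.path.join(self.image_dir, '{:06d}.png'.format(idx))
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)

    return img_path, img
```

This method builds the image file path, loads the image with OpenCV, and converts it from BGR to RGB.

```python
def get_calib(self, idx):
    calib_file = os.path.join(self.calib_dir, '{:06d}.txt'.format(idx))
    # assert os.path.isfile(calib_file)
    return Calibration(calib_file)
```

This method retrieves the calibration data for the given index, which is required to convert between the camera and LiDAR coordinate systems.

```python
def get_lidar(self, idx):
    lidar_file = os.path.join(self.lidar_dir, '{:06d}.bin'.format(idx))
    # assert os.path.isfile(lidar_file)
    return np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 4)
```

This method loads the raw LiDAR data from a binary file and reshapes it into an N x 4 NumPy array, where N is the number of points and the 4 columns are the x, y, z coordinates and the reflectance intensity.

```python
def get_label(self, idx):
    labels = []
    label_path = os.path.join(self.label_dir, '{:06d}.txt'.format(idx))
    for line in open(label_path, 'r'):
        line = line.rstrip()
        line_parts = line.split(' ')
        obj_name = line_parts[0]  # 'Car', 'Pedestrian', ...
        cat_id = int(cnf.CLASS_NAME_TO_ID[obj_name])
        if cat_id <= -99:  # ignore Tram and Misc
            continue
        truncated = int(float(line_parts[1]))  # truncated pixel ratio [0..1]
        occluded = int(line_parts[2])  # 0=visible, 1=partly occluded, 2=fully occluded, 3=unknown
        alpha = float(line_parts[3])  # object observation angle [-pi..pi]
        # xmin, ymin, xmax, ymax
        bbox = np.array([float(line_parts[4]), float(line_parts[5]), float(line_parts[6]), float(line_parts[7])])
        # height, width, length (h, w, l)
        h, w, l = float(line_parts[8]), float(line_parts[9]), float(line_parts[10])
        # location (x,y,z) in camera coord.
        x, y, z = float(line_parts[11]), float(line_parts[12]), float(line_parts[13])
        ry = float(line_parts[14])  # yaw angle (around Y-axis in camera coordinates) [-pi..pi]

        object_label = [cat_id, x, y, z, h, w, l, ry]
        labels.append(object_label)

    if len(labels) == 0:
        labels = np.zeros((1, 8), dtype=np.float32)
        has_labels = False
    else:
        labels = np.array(labels, dtype=np.float32)
        has_labels = True

    return labels, has_labels
```

This method reads the object labels from the label file, including attributes such as object type, dimensions, and orientation.

```python
def build_targets(self, labels, hflipped):
    minX = cnf.boundary['minX']
    maxX = cnf.boundary['maxX']
    minY = cnf.boundary['minY']
    maxY = cnf.boundary['maxY']
    minZ = cnf.boundary['minZ']
    maxZ = cnf.boundary['maxZ']

    num_objects = min(len(labels), self.max_objects)
    hm_l, hm_w = self.hm_size

    hm_main_center = np.zeros((self.num_classes, hm_l, hm_w), dtype=np.float32)
    cen_offset = np.zeros((self.max_objects, 2), dtype=np.float32)
    direction = np.zeros((self.max_objects, 2), dtype=np.float32)
    z_coor = np.zeros((self.max_objects, 1), dtype=np.float32)
    dimension = np.zeros((self.max_objects, 3), dtype=np.float32)

    indices_center = np.zeros((self.max_objects), dtype=np.int64)
    obj_mask = np.zeros((self.max_objects), dtype=np.uint8)

    for k in range(num_objects):
        cls_id, x, y, z, h, w, l, yaw = labels[k]
        cls_id = int(cls_id)
        # Invert yaw angle
        yaw = -yaw
        if not ((minX <= x <= maxX) and (minY <= y <= maxY) and (minZ <= z <= maxZ)):
            continue
        if (h <= 0) or (w <= 0) or (l <= 0):
            continue

        bbox_l = l / cnf.bound_size_x * hm_l
        bbox_w = w / cnf.bound_size_y * hm_w
        radius = compute_radius((math.ceil(bbox_l), math.ceil(bbox_w)))
        radius = max(0, int(radius))

        center_y = (x - minX) / cnf.bound_size_x * hm_l  # x --> y (invert to 2D image space)
        center_x = (y - minY) / cnf.bound_size_y * hm_w  # y --> x
        center = np.array([center_x, center_y], dtype=np.float32)

        if hflipped:
            center[0] = hm_w - center[0] - 1

        center_int = center.astype(np.int32)
        if cls_id < 0:
            ignore_ids = [_ for _ in range(self.num_classes)] if cls_id == -1 else [-cls_id - 2]
            # Consider to make mask ignore
            for cls_ig in ignore_ids:
                gen_hm_radius(hm_main_center[cls_ig], center_int, radius)
            hm_main_center[ignore_ids, center_int[1], center_int[0]] = 0.9999
            continue

        # Generate heatmaps for main center
        gen_hm_radius(hm_main_center[cls_id], center, radius)
        # Index of the center
        indices_center[k] = center_int[1] * hm_w + center_int[0]

        # targets for center offset
        cen_offset[k] = center - center_int

        # targets for dimension
        dimension[k, 0] = h
        dimension[k, 1] = w
        dimension[k, 2] = l

        # targets for direction
        direction[k, 0] = math.sin(float(yaw))  # im
        direction[k, 1] = math.cos(float(yaw))  # re
        # im --> -im
        if hflipped:
            direction[k, 0] = -direction[k, 0]

        # targets for depth
        z_coor[k] = z - minZ

        # Generate object masks
        obj_mask[k] = 1

    targets = {
        'hm_cen': hm_main_center,
        'cen_offset': cen_offset,
        'direction': direction,
        'z_coor': z_coor,
        'dim': dimension,
        'indices_center': indices_center,
        'obj_mask': obj_mask,
    }

    return targets
```

Based on the processed labels and the augmentation state, this method builds the target variables used to train the model: heatmaps of the object centers, offsets of the center points, object dimensions, direction vectors, and a mask that marks which object slots are in use.

```python
def draw_img_with_label(self, index):
    sample_id = int(self.sample_id_list[index])
    img_path, img_rgb = self.get_image(sample_id)
    lidarData = self.get_lidar(sample_id)
    calib = self.get_calib(sample_id)
    labels, has_labels = self.get_label(sample_id)
    if has_labels:
        labels[:, 1:] = transformation.camera_to_lidar_box(labels[:, 1:], calib.V2C, calib.R0, calib.P2)

    if self.lidar_aug:
        lidarData, labels[:, 1:] = self.lidar_aug(lidarData, labels[:, 1:])

    lidarData, labels = get_filtered_lidar(lidarData, cnf.boundary, labels)
    bev_map = makeBEVMap(lidarData, cnf.boundary)

    return bev_map, labels, img_rgb, img_path
```

Finally, this utility function prepares the BEV map and labels so that the labels can be overlaid for visualization, which is particularly useful for understanding the data and debugging the dataset class.

4.2 Analysis of the RGB POV camera view and the 3D BEV LiDAR point cloud

The KittiDataset class shown in the previous section produces visualizations in which the upper half of the image is the standard POV camera view of the road scene and the lower half is the corresponding bird's-eye view (BEV) built from the 3D LiDAR data. A closer look at each part:

- RGB POV camera view: objects in the street view are enclosed in 3D bounding boxes that show their spatial extent (length, width, and height) in three-dimensional space.
- 3D BEV LiDAR view: the bottom image is the BEV map built from the LiDAR points. In the BEV map the world is seen from above, with the LiDAR data projected onto a two-dimensional plane. This projection helps in understanding the spatial layout and the relationships between objects without the perspective distortion of a camera image.
- The red bounding boxes in the BEV correspond to the 3D bounding box annotations of the camera view, flattened onto the 2D plane. They show the position and orientation of each object relative to the vehicle, which usually sits at the center of the concentric arcs.
- The concentric arcs mark distance intervals from the 3D LiDAR sensor. They give a sense of scale and distance for the points and objects in the point cloud.

5. Keypoint Feature Pyramid Network architecture

The Keypoint Feature Pyramid Network (KFPN), described in detail by Peixuan Li et al. in the RTM3D paper, offers a sophisticated approach to 3D object detection, particularly in autonomous driving scenarios. Here the architecture processes the bird's-eye-view (BEV) map encoded from the 3D LiDAR point cloud and outputs detections with seven degrees of freedom (7-DOF).

5.1 Key techniques

- Backbone: ResNet-18 and DLA-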
34 backbones perform the initial image processing, with a downsampling factor of [value missing in source] for computational efficiency.
- Upsampling and feature concatenation: a series of bilinear interpolations and [kernel size missing in source] convolutions, with the corresponding low-level feature map concatenated at each upsampling stage to enrich the feature representation.
- Keypoint feature pyramid: a novel scheme for scale-invariant keypoint detection that resizes the feature map of every scale to the largest scale so that keypoints are analyzed consistently.
- Detection head: a combination of fundamental and optional components, including heatmaps for detecting the main center and the vertices of the 3D bounding box.
- Keypoint association: regresses local offsets to group keypoints and uses a multi-bin method for precise yaw estimation, which improves the accuracy of 3D LiDAR object detection.

5.2 Backbone

KFPN uses two different structures as its backbone: ResNet-18 and DLA-34. These backbones perform the initial processing of a single RGB input image, denoted [notation missing in source]. The image is downsampled by a factor of [value missing in source], in line with standard practice in image classification networks, where the maximum downsampling factor is ×32. The backbone is central to feature extraction and to reducing computational complexity.

5.3 Upsampling and feature concatenation

After the initial downsampling, the network applies a series of upsampling layers. The process consists of three bilinear interpolations combined with [kernel size missing in source] convolutional layers. Before each upsampling step, the network concatenates the corresponding low-level feature map and passes the result through a [kernel size missing in source] convolutional layer to reduce the channel dimension. After the three upsampling layers the output channels are 256, 128, and 64, respectively. This strategy yields a rich feature representation that covers both high-level and low-level detail of the input.

5.4 Keypoint feature pyramid

In a conventional feature pyramid network (FPN), multi-scale detection is common. For keypoint detection, however, keypoints do not vary much in size within an image, so KFPN takes a different route. It introduces a keypoint feature pyramid that detects scale-invariant keypoints in point space. Each scale's feature map is resized back to the largest scale, producing the feature maps [notation missing in source], and a softmax is then applied to derive an importance weight for each scale. The final scale-space score map [notation missing in source] is the linear weighted sum of these feature maps.

5.5 Detection head

The KFPN detection head consists of three fundamental components and six optional components, designed to improve 3D detection accuracy with minimal computational overhead. Inspired by CenterNet, the network uses one keypoint as the main center that connects all other features. The heatmap of this main center is defined as [definition missing in source], where [symbol missing in source] is the number of object categories. The network also outputs a heatmap of the nine perspective points projected from the vertices and center of the 3D bounding box, denoted [notation missing in source].

5.6 Keypoint association and other components

To associate the keypoints of an object, the network regresses local offsets from the main center, [notation missing in source]. This groups keypoints that belong to the same object. Further components, such as the center and vertex offsets, dimensions, and orientation of the 3D object, are included to add constraints and improve detection performance. Orientation is represented by the yaw angle [symbol missing in source], and the network regresses the local orientation with a multi-bin method.

6. Code walkthrough - KFPN

6.1 Training strategy

Training KFPN for 3D LiDAR object detection follows a strategy focused on balancing positive and negative samples. Focal loss is used to address the imbalance, a common way of stabilizing learning in object detection networks. The whole pipeline is implemented in the train.py script. The functions that make up the training pipeline are shown below.

```python
# Third-party imports used by the excerpts from train.py; project helpers such as
# create_model, Logger, and Compute_Loss come from the repository itself.
import os
import time

import torch
import torch.distributed as dist
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm


def main_worker(gpu_idx, configs):
    configs.gpu_idx = gpu_idx
    configs.device = torch.device('cpu' if configs.gpu_idx is None else 'cuda:{}'.format(configs.gpu_idx))

    if configs.distributed:
        if configs.dist_url == 'env://' and configs.rank == -1:
            configs.rank = int(os.environ['RANK'])
        if configs.multiprocessing_distributed:
            # For multiprocessing distributed training, rank needs to be the
            # global rank among all the processes
            configs.rank = configs.rank * configs.ngpus_per_node + gpu_idx

        dist.init_process_group(backend=configs.dist_backend, init_method=configs.dist_url,
                                world_size=configs.world_size, rank=configs.rank)
        configs.subdivisions = int(64 / configs.batch_size / configs.ngpus_per_node)
    else:
        configs.subdivisions = int(64 / configs.batch_size)

    configs.is_master_node = (not configs.distributed) or (
            configs.distributed and (configs.rank % configs.ngpus_per_node == 0))

    if configs.is_master_node:
        logger = Logger(configs.logs_dir, configs.saved_fn)
        logger.info('Created a new logger')
        logger.info('configs: {}'.format(configs))
        tb_writer = SummaryWriter(log_dir=os.path.join(configs.logs_dir, 'tensorboard'))
    else:
        logger = None
        tb_writer = None

    # model
    model = create_model(configs)

    # load weight from a checkpoint
    if configs.pretrained_path is not None:
        assert os.path.isfile(configs.pretrained_path), 'no checkpoint found at {}'.format(configs.pretrained_path)
        model.load_state_dict(torch.load(configs.pretrained_path, map_location='cpu'))
        if logger is not None:
            logger.info('loaded pretrained model at {}'.format(configs.pretrained_path))

    # resume weights of model from a checkpoint
    if configs.resume_path is not None:
        assert os.path.isfile(configs.resume_path), 'no checkpoint found at {}'.format(configs.resume_path)
        model.load_state_dict(torch.load(configs.resume_path, map_location='cpu'))
        if logger is not None:
            logger.info('resume training model from checkpoint {}'.format(configs.resume_path))

    # Data Parallel
    model = make_data_parallel(model, configs)

    # Make sure to create optimizer after moving the model to cuda
    optimizer = create_optimizer(configs, model)
    lr_scheduler = create_lr_scheduler(optimizer, configs)
    configs.step_lr_in_epoch = False if configs.lr_type in ['multi_step', 'cosin', 'one_cycle'] else True

    # resume optimizer, lr_scheduler from a checkpoint
    if configs.resume_path is not None:
        utils_path = configs.resume_path.replace('Model_', 'Utils_')
        assert os.path.isfile(utils_path), 'no checkpoint found at {}'.format(utils_path)
        utils_state_dict = torch.load(utils_path, map_location='cuda:{}'.format(configs.gpu_idx))
        optimizer.load_state_dict(utils_state_dict['optimizer'])
        lr_scheduler.load_state_dict(utils_state_dict['lr_scheduler'])
        configs.start_epoch = utils_state_dict['epoch'] + 1

    if configs.is_master_node:
        num_parameters = get_num_parameters(model)
        logger.info('number of trained parameters of the model: {}'.format(num_parameters))

    if logger is not None:
        logger.info('Loading dataset & getting dataloader...')
    # Create dataloader
    train_dataloader, train_sampler = create_train_dataloader(configs)
    if logger is not None:
        logger.info('number of batches in training set: {}'.format(len(train_dataloader)))

    if configs.evaluate:
        val_dataloader = create_val_dataloader(configs)
        val_loss = validate(val_dataloader, model, configs)
        print('val_loss: {:.4e}'.format(val_loss))
        return

    for epoch in range(configs.start_epoch, configs.num_epochs + 1):
        if logger is not None:
            logger.info('{}'.format('*' * 40))
            logger.info('{} {}/{} {}'.format('=' * 35, epoch, configs.num_epochs, '=' * 35))
            logger.info('{}'.format('*' * 40))
            logger.info('Epoch: [{}/{}]'.format(epoch, configs.num_epochs))

        if configs.distributed:
            train_sampler.set_epoch(epoch)
        # train for one epoch
        train_one_epoch(train_dataloader, model, optimizer, lr_scheduler, epoch, configs, logger, tb_writer)
        if (not configs.no_val) and (epoch % configs.checkpoint_freq == 0):
            val_dataloader = create_val_dataloader(configs)
            print('number of batches in val_dataloader: {}'.format(len(val_dataloader)))
            val_loss = validate(val_dataloader, model, configs)
            print('val_loss: {:.4e}'.format(val_loss))
            if tb_writer is not None:
                tb_writer.add_scalar('Val_loss', val_loss, epoch)

        # Save checkpoint
        if configs.is_master_node and ((epoch % configs.checkpoint_freq) == 0):
            model_state_dict, utils_state_dict = get_saved_state(model, optimizer, lr_scheduler, epoch, configs)
            save_checkpoint(configs.checkpoints_dir, configs.saved_fn, model_state_dict, utils_state_dict, epoch)

        if not configs.step_lr_in_epoch:
            lr_scheduler.step()
            if tb_writer is not None:
                tb_writer.add_scalar('LR', lr_scheduler.get_lr()[0], epoch)

    # The remainder of main_worker is truncated in the source.
```

Training strategy (continued)

```python
def train_one_epoch(train_dataloader, model, optimizer, lr_scheduler, epoch, configs, logger, tb_writer):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')

    progress = ProgressMeter(len(train_dataloader), [batch_time, data_time, losses],
                             prefix='Train - Epoch: [{}/{}]'.format(epoch, configs.num_epochs))

    criterion = Compute_Loss(device=configs.device)
    num_iters_per_epoch = len(train_dataloader)
    # switch to train mode
    model.train()
    start_time = time.time()
    for batch_idx, batch_data in enumerate(tqdm(train_dataloader)):
        data_time.update(time.time() - start_time)
        metadatas, imgs, targets = batch_data
        batch_size = imgs.size(0)
        global_step = num_iters_per_epoch * (epoch - 1) + batch_idx + 1
        for k in targets.keys():
            targets[k] = targets[k].to(configs.device, non_blocking=True)
        imgs = imgs.to(configs.device, non_blocking=True).float()
        outputs = model(imgs)
        total_loss, loss_stats = criterion(outputs, targets)
        # For torch.nn.DataParallel case
        if (not configs.distributed) and (configs.gpu_idx is None):
            total_loss = torch.mean(total_loss)

        # compute gradient and perform backpropagation
        total_loss.backward()
        if global_step % configs.subdivisions == 0:
            optimizer.step()
            # zero the parameter gradients
            optimizer.zero_grad()
            # Adjust learning rate
            if configs.step_lr_in_epoch:
                lr_scheduler.step()
                if tb_writer is not None:
                    tb_writer.add_scalar('LR', lr_scheduler.get_lr()[0], global_step)

        if configs.distributed:
            reduced_loss = reduce_tensor(total_loss.data, configs.world_size)
        else:
            reduced_loss = total_loss.data
        losses.update(to_python_float(reduced_loss), batch_size)
        # measure elapsed time
        # torch.cuda.synchronize()
        batch_time.update(time.time() - start_time)

        if tb_writer is not None:
            if (global_step % configs.tensorboard_freq) == 0:
                loss_stats['avg_loss'] = losses.avg
                tb_writer.add_scalars('Train', loss_stats, global_step)
        # Log message
        if logger is not None:
            if (global_step % configs.print_freq) == 0:
                logger.info(progress.get_message(batch_idx))

        start_time = time.time()
```

6.2 Validation

Validation is just as important as training. Its main purpose is to evaluate the model on the validation dataset, which this script does with the validate() function:

```python
def validate(val_dataloader, model, configs):
    losses = AverageMeter('Loss', ':.4e')
    criterion = Compute_Loss(device=configs.device)
    # switch to evaluation mode
    model.eval()
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(tqdm(val_dataloader)):
            metadatas, imgs, targets = batch_data
            batch_size = imgs.size(0)
            for k in targets.keys():
                targets[k] = targets[k].to(configs.device, non_blocking=True)
            imgs = imgs.to(configs.device, non_blocking=True).float()
            outputs = model(imgs)
            total_loss, loss_stats = criterion(outputs, targets)
            # For torch.nn.DataParallel case
            if (not configs.distributed) and (configs.gpu_idx is None):
                total_loss = torch.mean(total_loss)

            if configs.distributed:
                reduced_loss = reduce_tensor(total_loss.data, configs.world_size)
            else:
                reduced_loss = total_loss.data
            losses.update(to_python_float(reduced_loss), batch_size)

    return losses.avg
```

6.3 Parameters

- val_dataloader: the data loader that provides batches of validation data.
- model: the model being evaluated.
- configs: the configuration settings with the evaluation parameters, including device information.

6.4 How the function works

- Loss metric initialization: an AverageMeter object named losses is created to track the average loss over the validation dataset. Compute_Loss is initialized with the device given in the configuration; it computes the loss between the model's predictions and the ground truth.
- Model evaluation mode: model.eval() puts the model into evaluation mode. This changes the behavior of layers that act differently during training: dropout is disabled and batch normalization uses its stored running statistics instead of per-batch statistics, so the model behaves consistently and deterministically during validation.
- Gradient-free execution: the torch.no_grad() context manager disables gradient computation, which reduces memory consumption and speeds things up because gradients are not needed for evaluation.
- Returning the average loss: after the forward passes, the function returns the average loss over the whole validation dataset as computed by the losses AverageMeter.

6.5 Model inference

This section looks at the inference pipeline designed to process and analyze bird's-eye-view (BEV) maps for the 3D LiDAR object detection task.

```python
if __name__ == '__main__':
    configs = parse_demo_configs()

    # Try to download the dataset for demonstration
    server_url = 'https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data'
    download_url = '{}/{}/{}.zip'.format(server_url, configs.foldername[:-5], configs.foldername)
    download_and_unzip(configs.dataset_dir, download_url)

    model = create_model(configs)
    print('\n\n' + '-*=' * 30 + '\n\n')
    assert os.path.isfile(configs.pretrained_path), 'No file at {}'.format(configs.pretrained_path)
    model.load_state_dict(torch.load(configs.pretrained_path, map_location='cpu'))
    print('Loaded weights from {}\n'.format(configs.pretrained_path))

    configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx))
    model = model.to(device=configs.device)
    model.eval()

    out_cap = None
    demo_dataset = Demo_KittiDataset(configs)
    with torch.no_grad():
        for sample_idx in range(len(demo_dataset)):
            metadatas, front_bevmap, back_bevmap, img_rgb = demo_dataset.load_bevmap_front_vs_back(sample_idx)
            front_detections, front_bevmap, fps = do_detect(configs, model, front_bevmap, is_front=True)
            back_detections, back_bevmap, _ = do_detect(configs, model, back_bevmap, is_front=False)

            # Draw prediction in the image
            front_bevmap = (front_bevmap.permute(1, 2, 0).numpy() * 255).astype(np.uint8)
            front_bevmap = cv2.resize(front_bevmap, (cnf.BEV_WIDTH, cnf.BEV_HEIGHT))
            front_bevmap = draw_predictions(front_bevmap, front_detections, configs.num_classes)
            # Rotate the front_bevmap
            front_bevmap = cv2.rotate(front_bevmap, cv2.ROTATE_90_COUNTERCLOCKWISE)

            # Draw prediction in the image
            back_bevmap = (back_bevmap.permute(1, 2, 0).numpy() * 255).astype(np.uint8)
            back_bevmap = cv2.resize(back_bevmap, (cnf.BEV_WIDTH, cnf.BEV_HEIGHT))
            back_bevmap = draw_predictions(back_bevmap, back_detections, configs.num_classes)
            # Rotate the back_bevmap
            back_bevmap = cv2.rotate(back_bevmap, cv2.ROTATE_90_CLOCKWISE)

            # merge front and back bevmap
            full_bev = np.concatenate((back_bevmap, front_bevmap), axis=1)

            img_path = metadatas['img_path'][0]
            img_bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)
            calib = Calibration(configs.calib_path)
            kitti_dets = convert_det_to_real_values(front_detections)
            if len(kitti_dets) > 0:
                kitti_dets[:, 1:] = lidar_to_camera_box(kitti_dets[:, 1:], calib.V2C, calib.R0, calib.P2)
                img_bgr = show_rgb_image_with_boxes(img_bgr, kitti_dets, calib)
            img_bgr = cv2.resize(img_bgr, (cnf.BEV_WIDTH * 2, 375))

            out_img = np.concatenate((img_bgr, full_bev), axis=0)
            write_credit(out_img, (50, 410), text_author='Cre: github.com/maudzung', org_fps=(900, 410), fps=fps)

            if out_cap is None:
                out_cap_h, out_cap_w = out_img.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*'MJPG')
                out_path = os.path.join(configs.results_dir, '{}_both_2_sides.avi'.format(configs.foldername))
                print('Create video writer at {}'.format(out_path))
                out_cap = cv2.VideoWriter(out_path, fourcc, 30, (out_cap_w, out_cap_h))

            out_cap.write(out_img)

    if out_cap:
        out_cap.release()
```

6.6
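Notes

- Max pooling on the heatmap: during inference a 3×3 max-pooling operation is applied to the center heatmap. Only cells that equal the maximum of their neighborhood are kept, which retains the strongest local responses and suppresses neighboring noise.
- Prediction selection: only the top 50 predictions with a center confidence above 0.2 are kept, focusing on the most likely object centers.
- Heading angle: the yaw of each detection is computed with the arctangent of the ratio of the imaginary part to the real part of the predicted direction, which gives the orientation of the detected object.

6.7 Inference in detail

- Downloading the demo dataset: the script downloads a smaller sample of the KITTI Vision 360 dataset from the specified URL and unzips it into the configured dataset directory. This step provides the data needed for inference.
- Model and weight initialization: the model is created with create_model(configs) and the pretrained weights are loaded from configs.pretrained_path, so the model already carries the features it needs for accurate predictions. The model is then moved to the device given in the configuration (CPU or GPU).
- Inference loop: the pipeline iterates over Demo_KittiDataset, which presumably holds the BEV maps and related data for each sample. For each sample it loads the front and back BEV maps along with the metadata, then calls do_detect separately on the front and the back map. This function performs the actual object detection and returns the detections and the modified BEV map.
- BEV map adjustment: the front and back BEV maps are permuted, resized, and annotated with draw_predictions. They are then rotated into the correct orientation and concatenated into one full BEV view.
- Conversion and calibration: the RGB image is converted to BGR (the format OpenCV uses), the detections are converted to real-world values using the calibration data, and the camera image and the full BEV map are concatenated into the final output image. Credit text and frames-per-second (fps) information are drawn on the image.
- Writing results to video: if it has not been initialized yet, a VideoWriter object is created to write the output to a video file. Every processed frame is written to that file, producing a visualization of the detection process. At the end the video writer is released, which finalizes the file.

7. Experiments

7.1 Analysis of the results

The inference visualizations from the experiment lead to the following observations:

- BEV map: localized objects are detected in the top-down BEV 3D LiDAR depth map generated from the sensor, with the front and back views joined into one full map.
- Three-class 3D object detection: the predefined classes (car, pedestrian, and cyclist) are detected in the inference results. These classes are annotated in the KITTI 360 Vision dataset.
- Localization accuracy: detected objects are visualized with 3D bounding boxes in both modalities, the 2D RGB camera and the 3D LiDAR sensor, and the accuracy of box placement can be checked in both streams.
- Real-time performance: the inference pipeline was tested on the same deep learning machine that trained the model, equipped with an NVIDIA RTX 3080 Ti with 12 GB of VRAM. In this setup the model sustained 160-180 FPS during real-time inference.

7.2 Evaluation metrics

The previous sections gave a visual impression of the trained model, but its performance still has considerable room for improvement. The evaluation metrics below were logged with TensorBoard during training.

7.3 Learning rate

The learning rate (LR) plot shows a step-decay schedule that starts slightly below 0.001 and falls to about 0.0001 by step 300. The sharp drops at specific intervals are the scheduled large LR reductions. Between those drops the LR is flat, which lets the model stabilize. The plot reflects a balance between fast initial learning and later fine-tuning, in line with common LR scheduling practice.

7.4 Training loss

In this experiment the KFPN model was trained for 300 epochs in total. The training loss plot shows several downward trends, and the high initial loss corresponds to the early learning phase. As training progresses, all loss metrics (avg_loss, cen_offset_loss, and total_loss) keep decreasing, which shows the model is improving. The loss curves start to flatten at around 69k steps, indicating that the model is approaching convergence. The combined total_loss also decreases, reflecting the cumulative effect of optimizing the individual losses.

7.5 Validation loss

The validation loss plot, by contrast, rises steadily after an initial drop, which points to successful early learning followed by overfitting. The continued rise after step 50 indicates declining generalization. The fluctuations show variability in learning, and the validation loss finally settles at about 2.8695, above its minimum, confirming that performance on unseen data degrades over time.

8. Conclusion

This experiment with the Keypoint Feature Pyramid Network (KFPN) for 3D LiDAR object detection produced several key findings. The model localizes objects well in the BEV map, combining the front and back views of the 3D LiDAR depth map for full coverage. It identifies cars, pedestrians, and cyclists, the three key classes in the KITTI 360 Vision dataset, and places bounding boxes around them. In real-time inference tests on an NVIDIA RTX 3080 Ti it consistently reached 160-180 FPS, which underlines its potential for deployment in applications where fast processing is critical.

The training loss over the 300 epochs shows a successful learning phase, with all loss metrics improving steadily and approaching convergence. The validation loss contrasts with this: after an initial decrease it increases, suggesting overfitting after step 50. The gap between training and validation loss shows that the model learns the training data effectively but that its generalization to new data needs further work.

The study and its results are relevant to the development of advanced driver assistance systems (ADAS) and autonomous navigation systems. Accurate and fast object detection of this kind can help improve the safety and efficiency of autonomous driving technology. Going forward, addressing overfitting and ensuring generalization remain the priority, for example by exploring stronger regularization techniques or adaptive learning rate schedules to improve performance on unseen data.

Original article: https://learnopencv.com/3d-lidar-object-detection/#aioseo-code-walkthrough-kfpn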
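
The post-processing described in the inference notes (3×3 max pooling on the center heatmap, keeping the top 50 peaks above 0.2, and recovering yaw from the imaginary and real parts) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names `local_peaks` and `decode_yaw` are hypothetical, the heatmap is a nested list rather than a tensor, and `atan2` is used as the quadrant-aware form of the arctangent of im/re.

```python
import math


def local_peaks(heatmap, threshold=0.2, top_k=50):
    """Return (score, row, col) for cells that are the maximum of their 3x3 window.

    Hypothetical helper: mimics "3x3 max pooling, keep cells equal to the pooled
    value", followed by thresholding and top-k selection.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score <= threshold:
                continue
            # Collect the 3x3 neighbourhood, clipped at the heatmap border.
            window = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score >= max(window):
                peaks.append((score, r, c))
    peaks.sort(reverse=True)  # highest confidence first
    return peaks[:top_k]


def decode_yaw(im, re):
    """Recover a yaw angle in radians from the (sin, cos) direction pair."""
    return math.atan2(im, re)


demo_heatmap = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6],
]
demo_peaks = local_peaks(demo_heatmap)
```

In this toy heatmap the two 0.3 cells are above the threshold but sit next to the 0.9 peak, so they are suppressed, while the isolated 0.6 cell survives as a second detection.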

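
The center encoding that build_targets performs (mapping a LiDAR-frame box center to a heatmap cell, a flat index, a sub-cell offset, and a sin/cos direction pair) can be followed on a single object with the sketch below. It mirrors the arithmetic of the method shown earlier in plain Python; the function name `encode_center`, the boundary values, and the 100×100 heatmap size are assumptions chosen for illustration, not the project's configuration.

```python
import math


def encode_center(x, y, yaw, boundary, hm_size, hflipped=False):
    """Encode one box center the way build_targets does (hypothetical helper)."""
    hm_l, hm_w = hm_size
    bound_size_x = boundary["maxX"] - boundary["minX"]
    bound_size_y = boundary["maxY"] - boundary["minY"]

    # LiDAR x runs along the heatmap rows and LiDAR y along the columns.
    center_y = (x - boundary["minX"]) / bound_size_x * hm_l
    center_x = (y - boundary["minY"]) / bound_size_y * hm_w
    if hflipped:
        center_x = hm_w - center_x - 1

    cx_int, cy_int = int(center_x), int(center_y)

    yaw = -yaw  # the dataset code inverts the yaw angle
    im, re = math.sin(yaw), math.cos(yaw)
    if hflipped:
        im = -im

    return {
        "index": cy_int * hm_w + cx_int,  # flat index into the heatmap
        "offset": (center_x - cx_int, center_y - cy_int),
        "direction": (im, re),
    }


# Assumed region of interest: 50 m ahead, 25 m to each side.
demo_boundary = {"minX": 0.0, "maxX": 50.0, "minY": -25.0, "maxY": 25.0}
demo_target = encode_center(12.5, 0.0, 0.0, demo_boundary, hm_size=(100, 100))
```

With these assumed values, an object 12.5 m ahead on the centerline falls in row 25, column 50 of the heatmap, so its flat index is 25 * 100 + 50.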