
news 2025/12/30 23:51:01

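pytorch: A PyTorch implementation of YOLOv1

Note: this post is only a study record and a set of learning notes. Please treat it with caution, and point out any mistakes in the comments.

References:
- "Dive into Deep Learning, PyTorch edition: implementing YOLOv1 from scratch"
- "The loss function of the YOLO-V1 object detection model explained in detail"
- "3.1 YOLO series theory collection (YOLOv1~v3)"

Code repository: https://gitee.com/wtryb/yolov1-pytorch-implement
Model weights: https://pan.baidu.com/s/1ZSl-VwkjaRUPuD9CkA6sdg?pwd=blhj (extraction code: blhj)

The prediction process of YoloV1

The first figure in the Introduction of the original paper gives a rough overview of the YoloV1 detection system. Compared with the R-CNN family, the structure of YoloV1 is much simpler. The main idea of Yolo is to treat detection as a regression problem. Because of the fully connected layers, YoloV1 only accepts images of size (448x448), so an input image is first resized and then fed to the network, and the network's predictions go through non-maximum suppression to give the final result.

A second figure in the Introduction explains the prediction process. Although that figure shows two branches, both results come from a single network. The network divides the input image into (SxS) grid cells, predicts bounding boxes and class probabilities for every cell, and combines them into the final prediction. Dividing the input into grid cells in this way is somewhat similar to using anchor boxes.

1. The network divides the input into SxS grid cells. S is a hyperparameter; the paper sets it to 7, which splits the 448x448 input into 7x7 cells of (64x64) pixels each.
2. If the center of an object falls inside a grid cell, that cell is responsible for predicting the object. Each cell predicts B bounding boxes and C class probabilities. Both are hyperparameters: the paper uses B = 2, and there is one class probability per class. Each bounding box has 5 parameters, x, y, w, h, c, so the final network output has shape (batch, B*5+C, S, S).

The bounding-box parameters mean the following:
- $x, y$: the offset of the box center relative to the top-left corner of its grid cell.
- $w, h$: the size of the box relative to the whole image.
- $c$: the confidence of the box.

All five parameters take values in [0, 1]. The first four are easy to understand; the one that needs explanation is the confidence score $c$. I think the following two questions are the key ones.

How should the confidence score be understood? The paper states that the confidence score reflects how confident the network is that the box contains an object, and how accurate it thinks the predicted box is. The higher the value the better: a high value means the cell has found an object and has located it accurately.

How is the confidence score computed? The paper defines it as $Pr(Object) * IOU_{pred}^{truth}$, where

$$
Pr(Object) = \begin{cases} 1 & \text{an object is present} \\ 0 & \text{no object is present} \end{cases}
$$

and $IOU_{pred}^{truth}$ is the IOU between the ground-truth (GT) box and the predicted box.

Multiplying the two values means that if the cell contains an object, the confidence equals the IOU between the GT box and the predicted box, and otherwise it equals 0. YoloV1 handles region sampling and the split into positive and negative samples very coarsely, so in the implementation in this post the training target for the confidence is simply 0 or 1: the cell that contains the center of a GT box gets 1, and every other cell gets 0. At inference time, the closer a cell's confidence is to 1, the more likely it is that an object center lies in that cell and the more accurate the predicted box is expected to be. In short, the confidence measures the quality of a cell's prediction: the higher the value, the better the quality.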
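The confidence definition above can be made concrete with a small sketch. It is not taken from the author's repository: the function names `iou`, `confidence_target` and `output_channels` are hypothetical, and boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])
    if x_right <= x_left or y_bottom <= y_top:
        return 0.0  # the boxes do not overlap
    inter = (x_right - x_left) * (y_bottom - y_top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def confidence_target(has_object, pred_box, gt_box):
    """Pr(Object) * IOU: the IOU if the cell holds an object, else 0."""
    return iou(pred_box, gt_box) if has_object else 0.0


def output_channels(num_boxes=2, num_classes=20):
    """Channels per grid cell: B boxes with 5 values each plus C class scores."""
    return num_boxes * 5 + num_classes


if __name__ == "__main__":
    print(confidence_target(True, (0, 0, 10, 10), (5, 0, 15, 10)))
    print(confidence_target(False, (0, 0, 10, 10), (5, 0, 15, 10)))
    print(output_channels())
```

With B = 2 and the 20 Pascal VOC classes, `output_channels()` gives the 30 channels per cell used throughout the rest of the post.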
Network design of YoloV1

Inspired by GoogLeNet, the authors designed the Darknet backbone; its structure is shown in the architecture figure of the paper. As the Yolo series evolved, the backbone network evolved with it.

Loss function design of YoloV1

The paper uses sum-squared error because it is easy to optimize, but notes that it does not align well with the goal of maximizing average precision, because it weights localization error equally with classification error. The authors therefore made several modifications:

1. Weighting the losses of positive and negative samples. Positive and negative samples are the grid cells that do and do not contain an object. During training the negative samples far outnumber the positive ones, which makes the network hard to train and makes training unstable.
2. Using the square roots of width and height in the loss. Sum-squared error treats an error in a large box and an error in a small box as equally serious, whereas the same offset affects a small box much more. In the accompanying figure the black boxes are GT boxes and the red boxes are predictions; the small and the large red box are offset from their GT boxes by the same amount, and visually the offset clearly matters more for the small box.
3. Using the box with the largest IOU with the GT box as the predictor. Each Yolo cell produces several boxes, but only the one with the largest IOU with the GT box is used as the predictor. This makes the boxes specialize, so each one becomes more accurate at predicting particular sizes, aspect ratios, or classes.

The loss function given in the paper is:

$$
\begin{aligned}
&\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 \\
&+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

The parameters are:
- $\lambda_{coord} = 5$: the weight of the coordinate loss for positive samples.
- $\lambda_{noobj} = 0.5$: the weight of the confidence loss for negative samples.
- $\mathbb{1}_{ij}^{obj}$: 1 when the $j$-th box of the $i$-th cell is the predictor, 0 otherwise.
- $\mathbb{1}_{ij}^{noobj}$: 1 when the $j$-th box of the $i$-th cell is not a predictor, 0 otherwise.
- $\mathbb{1}_{i}^{obj}$: 1 when an object appears in the $i$-th cell, 0 otherwise.
- $S$: the number of grid cells per side, so the sums run over $S^2$ cells.
- $B$: the number of boxes each cell predicts.

Taken as a whole: positive samples contribute to the localization loss, the confidence loss, and the class loss, while negative samples contribute only to the confidence loss. To reduce the effect of the large number of negative samples, the losses of positive and negative samples are weighted.

Pros and cons of YoloV1

Pros:
1. Very fast
2. Simple structure

Cons:
1. Large localization error
2. Coarse region sampling mechanism

PyTorch implementation of YoloV1

1. Build the dataset. The Pascal VOC2007 dataset is used and is not introduced further here. The input size of YoloV1 is fixed at (448x448), so each image is resized after loading; a plain resize is enough and no other processing is needed. VOC2007 provides an annotation file for every image. The boxes and classes are read from the annotation file and encoded in the layout of the YoloV1 output.

```python
import os

import cv2
import torch
from torch.utils.data import Dataset
from torchvision import transforms

from yoloconfig import yolo_config
# point_to_center, scale_img_with_box and pascal_VOC are helpers from the
# author's repository; their imports are not shown in the post.


def yolo_encoder(boxes, labels, yolo_config):
    """Encode the GT boxes of one image into a (30, S, S) target tensor."""
    num_grid = yolo_config["num_grid"]
    target = torch.zeros(size=(30, num_grid, num_grid), dtype=torch.float)
    cell_size = yolo_config["input_size"] / num_grid

    for index, box in enumerate(boxes):
        x_c, y_c, w, h = point_to_center(box)
        # Index of the cell that contains the box center. Clamped so that a
        # center lying exactly on the right or bottom edge stays in the grid.
        x_i = min(int(x_c // cell_size), num_grid - 1)
        y_i = min(int(y_c // cell_size), num_grid - 1)
        # Offset of the center from the cell's top-left corner, in [0, 1].
        delta_x = float((x_c - x_i * cell_size) / cell_size)
        delta_y = float((y_c - y_i * cell_size) / cell_size)
        # Width and height relative to the whole image.
        w = float(w / yolo_config["input_size"])
        h = float(h / yolo_config["input_size"])

        # Each cell predicts two boxes, so the GT box is written to both
        # slots (channels 0-4 and 5-9). The last value of each slot is the
        # confidence score; the box is a GT box, so it is 1.
        for offset in (0, 5):
            target[offset + 0, x_i, y_i] = delta_x
            target[offset + 1, x_i, y_i] = delta_y
            target[offset + 2, x_i, y_i] = w
            target[offset + 3, x_i, y_i] = h
            target[offset + 4, x_i, y_i] = 1
        # Set the class of the box to 1 in the encoding (probability 1).
        target[labels[index] + 10, x_i, y_i] = 1
    return target


class YoloV1Dataset(Dataset):
    def __init__(self, path):
        self.path = path
        # Parse the samples from the dataset; this takes very little time.
        self.obj_dict_list = pascal_VOC.xml_parse_dict(path)

    def __getitem__(self, index):
        # Look up the image name by index and read the image.
        image_name = self.obj_dict_list[index]["image_name"]
        img = cv2.imread(os.path.join(self.path, "JPEGImages", image_name))
        # Convert the color channels from BGR to RGB.
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Boxes and their labels for this image.
        boxes_and_label_list = self.obj_dict_list[index]["boxes"]
        # Resize the image and scale the boxes with it.
        img, boxes = scale_img_with_box(img, [i[0:4] for i in boxes_and_label_list])
        labels = [i[4] for i in boxes_and_label_list]
        # Encode the label for the network.
        target = yolo_encoder(boxes, labels, yolo_config)
        # Convert the image to a tensor.
        img = transforms.ToTensor()(img)
        return img, target

    def __len__(self):
        # Number of images.
        return len(self.obj_dict_list)
```

2. Build the YoloV1 network, using Resnet34 instead of the Darknet backbone.

```python
import torch
from torch import nn
from torchvision.models import resnet34

from yoloconfig import yolo_config


class yoloV1Resnet(nn.Module):
    def __init__(self):
        super(yoloV1Resnet, self).__init__()
        # Use a pretrained backbone. Newer torchvision versions prefer the
        # `weights=` argument over `pretrained=True`.
        resnet = resnet34(pretrained=True)
        # Number of channels produced by the convolutional part.
        resnet_out_channels = resnet.fc.in_features
        # Build the feature extractor by dropping the average pooling layer
        # and the fully connected layer of resnet34.
        self.feature_extractor = nn.Sequential(*list(resnet.children())[:-2])
        # The last four convolutional layers of YOLOv1.
        self.Conv_layers = nn.Sequential(
            nn.Conv2d(resnet_out_channels, 1024, 3, padding=1),
            nn.BatchNorm2d(1024),  # BN added to speed up training; the paper's YOLOv1 has none
            nn.LeakyReLU(),
            nn.Conv2d(1024, 1024, 3, stride=2, padding=1),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(),
            nn.Conv2d(1024, 1024, 3, padding=1),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(),
            nn.Conv2d(1024, 1024, 3, padding=1),
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(),
        )
        # The last two fully connected layers of YOLOv1.
        self.Conn_layers = nn.Sequential(
            nn.Linear(7 * 7 * 1024, 4096),
            nn.LeakyReLU(),
            nn.Linear(4096, 7 * 7 * 30),
            # Sigmoid maps every output into (0, 1); negative or very large
            # values would make the later loss computation awkward.
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.Conv_layers(x)
        x = x.view(x.size()[0], -1)
        x = self.Conn_layers(x)
        # Remember to reshape the output to (batch, B*5+C, 7, 7).
        channels = 5 * yolo_config["num_boxes"] + yolo_config["num_class"]
        return x.reshape(-1, channels, 7, 7)


if __name__ == "__main__":
    x = torch.randn((1, 3, 448, 448))
    net = yoloV1Resnet()
    print(net)
    y = net(x)
    print(y.size())
```

3. Train the network.

```python
import torch
from torch.utils.data import DataLoader

from MyLib.nnTools.Trainer import Trainer
from network import yolo
from network import yololoss
from dataprocess import dataset


def train_model():
    # PATH = r"E:\Postgraduate_Learning\Python_Learning\DataSets\pascal voc2012\VOCtrainval_11-May-2012\VOCdevkit\VOC2012"
    PATH = r"E:\Postgraduate_Learning\Python_Learning\DataSets\pascal_voc2007\VOCdevkit\VOC2007"
    # Define the yolo network and resume from a saved checkpoint.
    yolo_net = yolo.yoloV1Resnet()
    yolo_net.load_state_dict(torch.load("models/_keyboardInterrupt_.pth"))
    # Freeze the backbone. requires_grad has to be set on the parameters;
    # setting it on a module object has no effect.
    for param in yolo_net.feature_extractor.parameters():
        param.requires_grad = False
    # Define the dataset and the data loader.
    yolo_train_dataset = dataset.YoloV1Dataset(PATH)
    yolo_train_iter = DataLoader(dataset=yolo_train_dataset, shuffle=True, batch_size=4)
    optimer = torch.optim.SGD(yolo_net.parameters(), lr=1e-3, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimer, step_size=7, gamma=0.65)
    loss = yololoss.yoloV1Loss()
    trainer = Trainer()
    trainer.config_trainer(net=yolo_net, dataloader=yolo_train_iter,
                           optimer=optimer, loss=loss, lr_scheduler=scheduler)
    trainer.config_task(num_epoch=60)
    trainer.start_task(True, "./models")


if __name__ == "__main__":
    train_model()
```

4. Inference. The output of the YoloV1 network has to be decoded before the bounding boxes and classes can be read from it.

```python
def yolo_decoder(pred, class_name_list, yolo_config, confidence_thr=0.0002, class_thr=0.5):
    """Decode a (batch, 30, S, S) prediction into
    (x1, y1, x2, y2, confidence, class_name) tuples."""
    boxes = []
    num_grid = yolo_config["num_grid"]
    input_size = yolo_config["input_size"]
    cell_size = input_size / num_grid
    # Loop over the batch, then over the x and y axes of the grid.
    for batch in range(pred.shape[0]):
        for x in range(num_grid):
            for y in range(num_grid):
                class_scores = pred[batch, 10:, x, y]
                class_name = class_name_list[torch.argmax(class_scores).item()]
                confidence_box1 = pred[batch, 4, x, y].item()
                confidence_box2 = pred[batch, 9, x, y].item()
                # The box with the larger confidence score is the predictor.
                # Box 1 occupies channels 0-4 and box 2 channels 5-9, which
                # matches yolo_encoder. clone() keeps pred unmodified.
                if confidence_box1 >= confidence_box2:
                    confidence = confidence_box1
                    box = pred[batch, 0:4, x, y].clone()
                else:
                    confidence = confidence_box2
                    box = pred[batch, 5:9, x, y].clone()
                # Skip cells without an object or without a confident class.
                if confidence < confidence_thr:
                    continue
                if torch.max(class_scores).item() < class_thr:
                    continue
                # Undo the encoding: cell offsets and relative sizes to pixels.
                box[0] = box[0] * cell_size + x * cell_size
                box[1] = box[1] * cell_size + y * cell_size
                box[2] = box[2] * input_size
                box[3] = box[3] * input_size
                # Convert from center format to corner format.
                box_xy = center_to_point(box)
                boxes.append((*box_xy, confidence, class_name))
    return boxes
```

```python
import cv2
import numpy as np
import torch
from torchvision import transforms

import yoloconfig
from network import yolo
from network.encoder import yolo_decoder
from MyLib.imgProcess.draw import cv2_draw_one_box

# Colors for the bounding boxes of the 20 classes; change them as you like.
COLOR = [(255, 0, 0), (255, 125, 0), (255, 255, 0), (255, 0, 125), (255, 0, 250),
         (255, 125, 125), (255, 125, 250), (125, 125, 0), (0, 255, 125), (255, 0, 0),
         (0, 0, 255), (125, 0, 255), (0, 125, 255), (0, 255, 255), (125, 125, 255),
         (0, 255, 0), (125, 255, 125), (255, 255, 255), (100, 100, 100), (0, 0, 0)]
CLASS = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
         "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
         "pottedplant", "sheep", "sofa", "train", "tvmonitor"]


def calculate_iou_1(box1, box2):
    # Intersection area of the two boxes.
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = intersection_area / float(box1_area + box2_area - intersection_area)
    return iou


def nms(boxes, threshold):
    """
    Non-maximum suppression (NMS).
    :param boxes: list of boxes, each holding the top-left and bottom-right
                  coordinates, the confidence and the class
    :param threshold: overlap (IOU) threshold
    :return: list of the boxes that are kept
    """
    if len(boxes) == 0:
        return []
    # Extract the coordinates and the confidence of every box.
    x1 = np.array([box[0] for box in boxes])
    y1 = np.array([box[1] for box in boxes])
    x2 = np.array([box[2] for box in boxes])
    y2 = np.array([box[3] for box in boxes])
    scores = np.array([box[4] for box in boxes])
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # Sort by confidence in descending order.
    order = scores.argsort()[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]  # the box with the highest remaining confidence
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        intersection = w * h
        iou = intersection / (areas[i] + areas[order[1:]] - intersection)
        # Keep only the boxes that overlap box i by at most the threshold.
        inds = np.where(iou <= threshold)[0]
        order = order[inds + 1]
    return [boxes[i] for i in keep]


if __name__ == "__main__":
    model = yolo.yoloV1Resnet()
    # 2023.11.11 Localization is inaccurate. The confidence error of the boxes
    #            inside a cell may be large, so the object gets located in the
    #            wrong cell. The loss oscillates during training.
    # 11.12 The training loss still does not come down; the dataset may be too small.
    # 11.13 Continued training with a smaller batch. Earlier attempts: switching
    #       the backbone to resnet18 did not help, and neither did a lower learning rate.
    model.load_state_dict(torch.load("models/_keyboardInterrupt_.pth"))  # load the trained model
    model.eval()
    model.cuda()
    img = cv2.imread("./img/000229.jpg")
    img = cv2.resize(img, (448, 448))
    inputs = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    inputs = transforms.ToTensor()(inputs)
    inputs = inputs.to(torch.device("cuda:0"))
    inputs = torch.unsqueeze(inputs, dim=0)
    with torch.no_grad():
        pred = model(inputs)  # pred has shape (1, 30, 7, 7)
    pred = pred.detach().cpu()
    # Confidence maps of the two boxes.
    print(pred[0, 4, :, :])
    print(pred[0, 9, :, :])
    boxes = yolo_decoder(pred, CLASS, yolo_config=yoloconfig.yolo_config, confidence_thr=0.1)
    print("boxes", boxes)
    # Drop boxes that are less than 10 pixels tall.
    box_boxes = []
    for i in boxes:
        if i[3] - i[1] < 10:
            continue
        box_boxes.append(i)
    new_boxes = nms(box_boxes, 0.3)
    for i in new_boxes:
        cv2_draw_one_box(img, i, (255, 0, 255))
    cv2.imshow("aa", img)
    cv2.waitKey(0)
```
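The training script imports `yololoss.yoloV1Loss`, which is not listed in this post. As a reading aid, here is a minimal, unvectorized sketch of the loss described in the loss-function section, written in plain Python so that each term can be checked by hand. It is not the author's implementation. The names `box_iou`, `cell_loss` and `yolo_v1_loss_sketch` are hypothetical, and the sketch assumes the channel layout of `yolo_encoder` (box 1 in channels 0-4, box 2 in channels 5-9, class scores from channel 10 on), a confidence target of 1 for the responsible box, and predictions that already lie in [0, 1], for example after a sigmoid.

```python
import math


def box_iou(a, b, S=7):
    """IOU of two (x, y, w, h) boxes in the same cell.

    x, y are offsets inside the cell and w, h are relative to the image.
    Dividing the offsets by S puts all values in image-relative units; the
    cell origin is shared by both boxes, so it cancels out.
    """
    ax, ay, bx, by = a[0] / S, a[1] / S, b[0] / S, b[1] / S
    iw = min(ax + a[2] / 2, bx + b[2] / 2) - max(ax - a[2] / 2, bx - b[2] / 2)
    ih = min(ay + a[3] / 2, by + b[3] / 2) - max(ay - a[3] / 2, by - b[3] / 2)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def cell_loss(pred, target, S=7, B=2, lambda_coord=5.0, lambda_noobj=0.5):
    """Loss of one grid cell; pred and target are flat lists of B*5 + C values."""
    boxes = [pred[5 * j:5 * j + 5] for j in range(B)]
    if target[4] == 0:
        # Negative sample: only the confidence of every box is penalized.
        return lambda_noobj * sum(b[4] ** 2 for b in boxes)
    gt = target[0:5]
    # The predicted box with the largest IOU with the GT box is the predictor.
    ious = [box_iou(b, gt, S) for b in boxes]
    r = max(range(B), key=lambda j: ious[j])
    p = boxes[r]
    loss = lambda_coord * ((p[0] - gt[0]) ** 2 + (p[1] - gt[1]) ** 2)
    loss += lambda_coord * ((math.sqrt(p[2]) - math.sqrt(gt[2])) ** 2
                            + (math.sqrt(p[3]) - math.sqrt(gt[3])) ** 2)
    loss += (p[4] - gt[4]) ** 2
    # Boxes in this cell that are not the predictor count as "noobj".
    loss += lambda_noobj * sum(boxes[j][4] ** 2 for j in range(B) if j != r)
    # Class loss over the C class scores.
    loss += sum((pc - tc) ** 2 for pc, tc in zip(pred[5 * B:], target[5 * B:]))
    return loss


def yolo_v1_loss_sketch(pred_cells, target_cells, **kwargs):
    """Sum of the per-cell losses over all S*S cells of one image."""
    return sum(cell_loss(p, t, **kwargs) for p, t in zip(pred_cells, target_cells))


if __name__ == "__main__":
    gt_box = [0.5, 0.5, 0.2, 0.3, 1.0]
    classes = [0.0] * 20
    classes[7] = 1.0
    target = gt_box + gt_box + classes
    pred = [0.6, 0.5, 0.2, 0.3, 1.0] + [0.0] * 5 + classes
    print(cell_loss(pred, target))
```

A tensorized version for training would apply the same five terms to whole (batch, 30, S, S) tensors; a per-cell reference like this one can then serve as a check on small inputs.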
