当前位置：首页 > news >正文

高端网站特色深圳建网站哪

news 2026/8/1 9:27:33

高端网站特色,深圳建网站哪,线上招生引流推广方法,大型网站如何优化yolov8源码解读Detect层 Detect层解读网络各层解读及detect层后的处理关于网络的backbone,head#xff0c;以及detect层后处理#xff0c;可以参考文章结尾博主的文章。 Detect层解读先贴一下全部代码,下面一一解读。 class Detect(nn.Module):YOLOv8 … yolov8源码解读Detect层 Detect层解读网络各层解读及detect层后的处理关于网络的backbone,head以及detect层后处理可以参考文章结尾博主的文章。 Detect层解读先贴一下全部代码,下面一一解读。 class Detect(nn.Module):YOLOv8 Detect head for detection models.dynamic False # force grid reconstructionexport False # export modeshape Noneanchors torch.empty(0) # initstrides torch.empty(0) # initdef __init__(self, nc80, ch()):Initializes the YOLOv8 detection layer with specified number of classes and channels.super().__init__()self.nc nc # number of classesself.nl len(ch) # number of detection layersself.reg_max 16 # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)self.no nc self.reg_max * 4 # number of outputs per anchorself.stride torch.zeros(self.nl) # strides computed during buildc2, c3 max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100)) # channelsself.cv2 nn.ModuleList(nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch)self.cv3 nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)self.dfl DFL(self.reg_max) if self.reg_max 1 else nn.Identity()def forward(self, x):Concatenates and returns predicted bounding boxes and class probabilities.shape x[0].shape # BCHW# print(, x[0].shape)# print(, x[1].shape)# print(, x[2].shape)for i in range(self.nl):x[i] torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)if self.training:return xelif self.dynamic or self.shape ! shape:self.anchors, self.strides (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))self.shape shapex_cat torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)if self.export and self.format in (saved_model, pb, tflite, edgetpu, tfjs): # avoid TF FlexSplitV opsbox x_cat[:, :self.reg_max * 4]cls x_cat[:, self.reg_max * 4:]else:box, cls x_cat.split((self.reg_max * 4, self.nc), 1)dbox dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywhTrue, dim1) * self.stridesif self.export and self.format in (tflite, edgetpu):# Normalize xywh with image size to mitigate quantization error of TFLite integer models as done in YOLOv5:# https://github.com/ultralytics/yolov5/blob/0c8de3fca4a702f8ff5c435e67f378d1fce70243/models/tf.py#L307-L309# See this PR for details: https://github.com/ultralytics/ultralytics/pull/1695img_h shape[2] * self.stride[0]img_w shape[3] * self.stride[0]img_size torch.tensor([img_w, img_h, img_w, img_h], devicedbox.device).reshape(1, 4, 1)dbox / img_size# print(cls.shape)y torch.cat((dbox, cls.sigmoid()), 1)# print(y.shape)return y if self.export else (y, x)dynamic False #这个属性指示网格通常是特征图上的锚框网格是否需要动态地重建export False #这个属性用于指示模型是否处于导出模式。shape None # 用于存储输入图像或特征图的尺寸。anchors torch.empty(0) # 创建了一个空的PyTorch张量strides torch.empty(0)步长strides是卷积神经网络中特征图相对于输入图像的缩小比例。例如如果步长是32那么一个32x32像素的区域在特征图上就对应一个单元。和anchors一样这里的torch.empty(0)表示步长尚未初始化。 def __init__(self, nc80, ch()):Initializes the YOLOv8 detection layer with specified number of classes and channels.super().__init__()self.nc nc # number of classesself.nl len(ch) # number of detection layersself.reg_max 16 # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)self.no nc self.reg_max * 4 # number of outputs per anchorself.stride torch.zeros(self.nl) # strides computed during buildc2, c3 max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100)) # channelsself.cv2 nn.ModuleList(nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch)self.cv3 nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)self.dfl DFL(self.reg_max) if self.reg_max 1 else nn.Identity()nc:类别数 nl:检测层的数量目标检测中为3。 ch:传入的图片通道尺寸在yolov8n,图片大小为640*640时。这里的ch为256,128,64 no:两个卷积再拼接后输出通道数为4×reg_maxnc c2,c3:计算卷积层的通道数。 cv2,cv3:定义的卷积操作以输出有关类别和选框的特征图。 dfl:通过将分布式的概率分布转化为单一的预测值 class DFL(nn.Module):def __init__(self, c116):Initialize a convolutional layer with a given number of input channels.super().__init__()self.conv nn.Conv2d(c1, 1, 1, biasFalse).requires_grad_(False)x torch.arange(c1, dtypetorch.float)self.conv.weight.data[:] nn.Parameter(x.view(1, c1, 1, 1))self.c1 c1def forward(self, x):Applies a transformer layer on input tensor x and returns a tensor.b, c, a x.shape # batch, channels, anchorsreturn self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)self.conv创建了一个输入通道为16输出为1没有偏置项不需要进行梯度更新的卷积层。这样的权重设置实际上模拟了一个积分过程将卷积操作变成了加权求和的形式。 x:1到15的整数。 self.conv.weight.data[:] nn.Parameter(x.view(1, c1, 1, 1)) 这里使用nn.Parameter将重塑后的张量设置为模型的参数并且参数不会被更新。假设前向传播中x的形状为(1, 64, 8400)下面解释下forword中的变化。 1x.view(b, 4, self.c1, a): 这个操作是对x的形状进行重塑。self.c1是16因为输入通道数是64即4*self.c1那么a是8400代表了所有锚点的数量。b是批次大小这里为1。所以x.view(b, 4, self.c1, a)将x从(1, 64, 8400)重塑为(1, 4, 16, 8400)。在这个形状中我们得到了每个锚点的每个坐标轴x, y, 宽度, 高度上的16个预测值可能代表某种概率分布。 2transpose(2, 1): 这个操作交换第二维和第三维。在应用transpose之后张量的形状变为(1, 16, 4, 8400)。这样做的目的是让每组概率分布的16个预测值连续地排列在一起为后面的softmax运算做准备。 3softmax(1): softmax函数应用于第一维现在是16个预测值的这一维。softmax确保了这16个值之和为1转换为一个有效的概率分布表示每个预测值的可能性。 4self.conv(…): 这个操作将配置好的卷积层应用在进行了softmax操作的张量上。由于卷积层的权重已被设置为从0到15的整数并且不更新权重不进行梯度下降优化这个步骤实际上是在计算期望值。卷积层将每个离散的概率值乘以其相应的索引也就是权重然后对结果进行求和得到该坐标的预测值。 5view(b, 4, a): 最后一步是将张量的形状从卷积操作后的(1, 1, 4, 8400)转换回(1, 4, 8400)。这样确保了最终的输出张量与每个坐标轴的预测值x, y, 宽度, 高度和所有锚点的数量对齐。总的来说dfl层就是对预测的坐标求加权期望值。将1,648400先变为1,16,48400然后对这16个通道求加权期望变为1,4,8400即这8400个锚点中的每一个锚点x,y,width,hight的加权平均值。接下来是前向传播的过程。打印传入的x形状发现通道数是64,128,256。原因Detect层接受15,1821层的输入。原本通道数是1024,512,256。但是yolov8n还需要乘0.25。经过cv2通道数变为64经过cv3通道数变为nc我这里nc为2二分类。在经过cat拼接在通道维度上拼接所以x[i]的通道数变为66。如果处于训练模式就直接返回x。否则执行下面的代码将特征图列表x1×66×40×40,1×66×80×80,1×66×20×20传递给make_anchors()函数。make_anchors函数用于生成锚点anchors,它通常用在目标检测网络中。每个锚点代表了特征图上的一个点,可以用来预测相对于该点的边界框。strides是这些特征图相对于原始图像的下采样步长。简单来说生成了8400个锚点40×4080×8020×20变量为anchors形状为1×2×8400。同时生成了8400个步长变量为strides,形状为1×8400。参数0.5表示每个锚点处于每个像素块的中央。 if self.training:return xelif self.dynamic or self.shape ! shape:self.anchors, self.strides (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))self.shape shape将xi按照2维度进行拼接xi分别为1×66×40×40,1×66×80×80,1×66×20×20。拼接后的x_cat为1×66×8400 x_cat torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)这段代码就是把x_cat进行拆分。box形状为1×64×8400包含每个边界框的回归参数。cls形状为1×2×8400会包含类别预测2是因为我这里类别为2。 if self.export and self.format in (saved_model, pb, tflite, edgetpu, tfjs): # avoid TF FlexSplitV opsbox x_cat[:, :self.reg_max * 4]cls x_cat[:, self.reg_max * 4:]else:box, cls x_cat.split((self.reg_max * 4, self.nc), 1)dfl层就是对预测的坐标求加权期望值。将1,648400先变为1,16,48400然后对这16个通道求加权期望变为1,4,8400即这8400个锚点中的每一个锚点x1,y1,x2,y2的加权平均值。dist2bbox()函数的作用是将锚点x1,y1,x2,y2转换为x,y,width,hight的形式。最后在乘以步长还原到原图的大小比例。 dbox dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywhTrue, dim1) * self.strides此代码片段的作用是在模型导出为 Tensorflow Lite (tflite) 或 Edge TPU 兼容格式时对预测框 (dbox) 进行归一化处理。 if self.export and self.format in (tflite, edgetpu):# Normalize xywh with image size to mitigate quantization error of TFLite integer models as done in YOLOv5:# https://github.com/ultralytics/yolov5/blob/0c8de3fca4a702f8ff5c435e67f378d1fce70243/models/tf.py#L307-L309# See this PR for details: https://github.com/ultralytics/ultralytics/pull/1695img_h shape[2] * self.stride[0]img_w shape[3] * self.stride[0]img_size torch.tensor([img_w, img_h, img_w, img_h], devicedbox.device).reshape(1, 4, 1)dbox / img_size此时y的形状为1×66×8400。代表有8400个锚点每个锚点包含坐标框的x,y,width,hight以及类别得分信息。 y torch.cat((dbox, cls.sigmoid()), 1)返回值至此detect层结束。完整的预测后续还需要进行一些处理。如进行非极大抑制对这8400个锚点进行筛选 return y if self.export else (y, x)另外最后的bias_init函数用于初始化一个目标检测模型中的Detect层的偏置。确保在训练开始时偏置值是基于合理假设的。这种方法的目标是为模型提供一个好的起点并有助于加速训练过程中的收敛。 def bias_init(self):Initialize Detect() biases, WARNING: requires stride availability.m self # self.model[-1] # Detect() module# cf torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlengthnc) 1# ncf math.log(0.6 / (m.nc - 0.999999)) if cf is None else torch.log(cf / cf.sum()) # nominal class frequencyfor a, b, s in zip(m.cv2, m.cv3, m.stride): # froma[-1].bias.data[:] 1.0 # boxb[-1].bias.data[:m.nc] math.log(5 / m.nc / (640 / s) ** 2) # cls (.01 objects, 80 classes, 640 img)网络各层解读及detect层后的处理关于backbone,head层,以及detect层参考下面博主的文章讲的非常好。链接: Yolov 8源码超详细逐行解读网络结构细讲(自我用的小白笔记) 链接: 最细致讲解yolov8模型推理完整代码–前处理后处理

查看全文

http://www.w-s-a.com/news/250425/