

Model fine-tuning (fine-tune): transfer learning

Contents: fine-tuning with torchvision, fine-tuning with timm, half-precision training

Origin

1. As deep learning has developed, models have gained more and more parameters, and many open-source models are trained on large datasets such as ImageNet-1k and ImageNet-11k.
2. If your own dataset has only a few thousand images, training a large model with tens of millions of parameters on it will almost inevitably overfit.
3. If we want to train a large model from scratch, the remedy is to collect more data. However, collecting and labeling data costs a great deal of time and money, often more than is affordable.

Solution

Apply transfer learning: transfer the knowledge learned on a source dataset to the target dataset. For example, although most images in ImageNet have nothing to do with chairs, a model trained on that dataset can extract fairly general image features that help identify edges, textures, shapes, and object composition.

Model fine-tuning (finetune) means finding a model of the same kind that someone else has already trained, switching to your own data, and adjusting the parameters of that trained model through further training.

Fine-tuning with different datasets

Dataset 1: little data, very high similarity. In this case we only modify the last few layers, or the number of output classes of the final softmax layer.

Dataset 2: little data, low similarity. In this case we can freeze the initial layers of the pretrained model (say, k layers) and retrain the remaining n-k layers. Because the new dataset is not very similar to the original one, retraining the higher layers on the new dataset matters.

Dataset 3: a lot of data, low similarity. Because the dataset is large, training a neural network on it will be effective. But because our data differs greatly from the data used to train the pretrained model, predictions made with the pretrained model will not be effective. It is therefore best to train the network from scratch on your own data (training from scratch).

Dataset 4: a lot of data, high similarity. This is the ideal case, and the pretrained model should be at its most effective. The best approach is to keep the model architecture and use the pretrained weights as the initial weights, then retrain the model starting from those weights.

What does fine-tuning change?

1. The data source is replaced.
2. K layers are retrained.
3. The weight shapes of those K layers are adjusted.

1. General workflow of model fine-tuning (fine-tune)

1. Pretrain a neural network model, the source model, on a source dataset (such as ImageNet).
2. Create a new neural network model, the target model, which copies the entire model design and all parameters of the source model except the output layer.
3. Add to the target model an output layer whose output size equals the number of classes in the target dataset, and initialize the parameters of that layer randomly.
4. Train the target model on the target dataset. The output layer is trained from scratch, while the parameters of all other layers are fine-tuned starting from the source model's parameters.

2. Fine-tuning with torchvision

2.1 Instantiating the model

    import torchvision.models as models
    resnet34 = models.resnet34(pretrained=True)

Notes on the pretrained parameter:

1. True or False decides whether pretrained weights are used. The default is pretrained=False, which means pretrained weights are not used.
2. pretrained=True means the model uses weights pretrained on some dataset.

Note: if you forcibly interrupt the download, be sure to go to the corresponding path and delete the partial weight file completely; otherwise an error will be raised. Also note that, as the warnings printed in section 2.3 show, torchvision 0.13 and later deprecate pretrained in favor of weights.

2.2 Training specific layers

If we are extracting features and only want to compute gradients for the newly initialized layers, leaving all other parameters unchanged, we need to freeze some layers by setting requires_grad = False.

    def set_parameter_requires_grad(model, feature_extracting):
        if feature_extracting:
            for param in model.parameters():
                param.requires_grad = False

2.3 Example

Using resnet34 as an example, we change the 1000 classes to 10 while updating only the parameters of the last layer. We first freeze the gradients of the model parameters, then replace the fully connected layer at the model output.

    import torch
    import torch.nn.functional as F
    import torch.nn as nn
    from torch.optim.lr_scheduler import LambdaLR
    from torch.optim.lr_scheduler import StepLR
    import torchvision
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms import transforms
    from torch.utils.tensorboard import SummaryWriter
    import numpy as np
    import torchvision.models as models
    from torchinfo import summary

    # Hyperparameters
    # Batch size
    batch_size = 16  # alternatives: 32, 64, 128
    # Learning rate of the optimizer
    lr = 1e-4
    # Number of epochs to run
    max_epochs = 2

    # Option 2: define "device" once, then call .to(device) on everything that should use the GPU.
    # (The original used cuda:1 here and cuda:0 later; a single index is used throughout.)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Data loading
    # CIFAR-10 is used as an example of how to build a Dataset
    from torchvision import datasets

    # "data_transform" applies transformations to the images, such as flipping,
    # cropping, and normalization; you can define your own
    data_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    train_cifar_dataset = datasets.CIFAR10('cifar10', train=True, download=False, transform=data_transform)
    test_cifar_dataset = datasets.CIFAR10('cifar10', train=False, download=False, transform=data_transform)

    # Once the Dataset is built, DataLoader reads the data in batches
    train_loader = torch.utils.data.DataLoader(train_cifar_dataset,
                                               batch_size=batch_size, num_workers=4,
                                               shuffle=True, drop_last=True)

    test_loader = torch.utils.data.DataLoader(test_cifar_dataset,
                                              batch_size=batch_size, num_workers=4,
                                              shuffle=False)

    # Download the pretrained resnet34 model
    resnet34 = models.resnet34(pretrained=True)
    print(resnet34)

Output:

    D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
      warnings.warn(
    D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
      warnings.warn(msg)
    Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to C:\Users\xulele/.cache\torch\hub\checkpoints\resnet34-b627a593.pth
    100%|██████████| 83.3M/83.3M [00:10<00:00, 8.57MB/s]

    ResNet(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (layer2): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (downsample): Sequential(
            (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): BasicBlock(
          (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): BasicBlock(
          (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (3): BasicBlock(
          (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (layer3): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (downsample): Sequential(
            (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): BasicBlock(
          (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): BasicBlock(
          (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (3): BasicBlock(
          (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (4): BasicBlock(
          (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (5): BasicBlock(
          (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (layer4): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): BasicBlock(
          (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): BasicBlock(
          (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
      (fc): Linear(in_features=512, out_features=1000, bias=True)
    )

Inspect the model structure:

    summary(resnet34, (1, 3, 224, 224))

Output:

    Layer (type:depth-idx)          Output Shape        Param #
    ResNet                          [1, 1000]           --
    ├─Conv2d: 1-1                   [1, 64, 112, 112]   9,408
    ├─BatchNorm2d: 1-2              [1, 64, 112, 112]   128
    ├─ReLU: 1-3                     [1, 64, 112, 112]   --
    ├─MaxPool2d: 1-4                [1, 64, 56, 56]     --
    ├─Sequential: 1-5               [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-1          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-1         [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-2    [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-3           [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-4         [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-5    [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-6           [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-2          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-7         [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-8    [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-9           [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-10        [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-11   [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-12          [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-3          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-13        [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-14   [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-15          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-16        [1, 64, 56, 56]     36,864
    │    │    └─BatchNorm2d: 3-17   [1, 64, 56, 56]     128
    │    │    └─ReLU: 3-18          [1, 64, 56, 56]     --
    ├─Sequential: 1-6               [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-4          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-19        [1, 128, 28, 28]    73,728
    │    │    └─BatchNorm2d: 3-20   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-21          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-22        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-23   [1, 128, 28, 28]    256
    │    │    └─Sequential: 3-24    [1, 128, 28, 28]    8,448
    │    │    └─ReLU: 3-25          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-5          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-26        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-27   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-28          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-29        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-30   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-31          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-6          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-32        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-33   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-34          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-35        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-36   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-37          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-7          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-38        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-39   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-40          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-41        [1, 128, 28, 28]    147,456
    │    │    └─BatchNorm2d: 3-42   [1, 128, 28, 28]    256
    │    │    └─ReLU: 3-43          [1, 128, 28, 28]    --
    ├─Sequential: 1-7               [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-8          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-44        [1, 256, 14, 14]    294,912
    │    │    └─BatchNorm2d: 3-45   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-46          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-47        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-48   [1, 256, 14, 14]    512
    │    │    └─Sequential: 3-49    [1, 256, 14, 14]    33,280
    │    │    └─ReLU: 3-50          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-9          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-51        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-52   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-53          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-54        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-55   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-56          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-10         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-57        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-58   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-59          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-60        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-61   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-62          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-11         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-63        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-64   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-65          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-66        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-67   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-68          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-12         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-69        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-70   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-71          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-72        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-73   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-74          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-13         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-75        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-76   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-77          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-78        [1, 256, 14, 14]    589,824
    │    │    └─BatchNorm2d: 3-79   [1, 256, 14, 14]    512
    │    │    └─ReLU: 3-80          [1, 256, 14, 14]    --
    ├─Sequential: 1-8               [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-14         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-81        [1, 512, 7, 7]      1,179,648
    │    │    └─BatchNorm2d: 3-82   [1, 512, 7, 7]      1,024
    │    │    └─ReLU: 3-83          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-84        [1, 512, 7, 7]      2,359,296
    │    │    └─BatchNorm2d: 3-85   [1, 512, 7, 7]      1,024
    │    │    └─Sequential: 3-86    [1, 512, 7, 7]      132,096
    │    │    └─ReLU: 3-87          [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-15         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-88        [1, 512, 7, 7]      2,359,296
    │    │    └─BatchNorm2d: 3-89   [1, 512, 7, 7]      1,024
    │    │    └─ReLU: 3-90          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-91        [1, 512, 7, 7]      2,359,296
    │    │    └─BatchNorm2d: 3-92   [1, 512, 7, 7]      1,024
    │    │    └─ReLU: 3-93          [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-16         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-94        [1, 512, 7, 7]      2,359,296
    │    │    └─BatchNorm2d: 3-95   [1, 512, 7, 7]      1,024
    │    │    └─ReLU: 3-96          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-97        [1, 512, 7, 7]      2,359,296
    │    │    └─BatchNorm2d: 3-98   [1, 512, 7, 7]      1,024
    │    │    └─ReLU: 3-99          [1, 512, 7, 7]      --
    ├─AdaptiveAvgPool2d: 1-9        [1, 512, 1, 1]      --
    ├─Linear: 1-10                  [1, 1000]           513,000
    Total params: 21,797,672
    Trainable params: 21,797,672
    Non-trainable params: 0
    Total mult-adds (G): 3.66
    Input size (MB): 0.60
    Forward/backward pass size (MB): 59.82
    Params size (MB): 87.19
    Estimated Total Size (MB): 147.61

Measure the accuracy of the model:

    # Move the model to the same device as the data before evaluating it
    resnet34 = resnet34.to(device)

    def cal_predict_correct(model):
        model.eval()  # evaluation mode: BatchNorm uses its running statistics
        test_total_correct = 0
        with torch.no_grad():  # gradients are not needed for evaluation
            for iter, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
        # print("test_total_correct: " + str(test_total_correct))
        return test_total_correct

    total_correct = cal_predict_correct(resnet34)
    # The CIFAR-10 test set has 10000 images
    print("test_total_correct: " + str(total_correct / 10000))

Output:

    test_total_correct: 0.1

Freeze the parameters and replace the output layer:

    def set_parameter_requires_grad(model, feature_extracting):
        if feature_extracting:
            for param in model.parameters():
                param.requires_grad = False

    # Freeze the gradients of the parameters
    feature_extract = True
    new_model = resnet34
    set_parameter_requires_grad(new_model, feature_extract)

    # Modify the model
    # During training the model still backpropagates gradients, but parameter
    # updates only happen in the fc layer
    num_ftrs = new_model.fc.in_features
    new_model.fc = nn.Linear(in_features=num_ftrs, out_features=10, bias=True)
    summary(new_model, (1, 3, 224, 224))

Output (parameter counts in parentheses belong to frozen layers):

    Layer (type:depth-idx)          Output Shape        Param #
    ResNet                          [1, 10]             --
    ├─Conv2d: 1-1                   [1, 64, 112, 112]   (9,408)
    ├─BatchNorm2d: 1-2              [1, 64, 112, 112]   (128)
    ├─ReLU: 1-3                     [1, 64, 112, 112]   --
    ├─MaxPool2d: 1-4                [1, 64, 56, 56]     --
    ├─Sequential: 1-5               [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-1          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-1         [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-2    [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-3           [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-4         [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-5    [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-6           [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-2          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-7         [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-8    [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-9           [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-10        [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-11   [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-12          [1, 64, 56, 56]     --
    │    └─BasicBlock: 2-3          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-13        [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-14   [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-15          [1, 64, 56, 56]     --
    │    │    └─Conv2d: 3-16        [1, 64, 56, 56]     (36,864)
    │    │    └─BatchNorm2d: 3-17   [1, 64, 56, 56]     (128)
    │    │    └─ReLU: 3-18          [1, 64, 56, 56]     --
    ├─Sequential: 1-6               [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-4          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-19        [1, 128, 28, 28]    (73,728)
    │    │    └─BatchNorm2d: 3-20   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-21          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-22        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-23   [1, 128, 28, 28]    (256)
    │    │    └─Sequential: 3-24    [1, 128, 28, 28]    (8,448)
    │    │    └─ReLU: 3-25          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-5          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-26        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-27   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-28          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-29        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-30   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-31          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-6          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-32        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-33   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-34          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-35        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-36   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-37          [1, 128, 28, 28]    --
    │    └─BasicBlock: 2-7          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-38        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-39   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-40          [1, 128, 28, 28]    --
    │    │    └─Conv2d: 3-41        [1, 128, 28, 28]    (147,456)
    │    │    └─BatchNorm2d: 3-42   [1, 128, 28, 28]    (256)
    │    │    └─ReLU: 3-43          [1, 128, 28, 28]    --
    ├─Sequential: 1-7               [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-8          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-44        [1, 256, 14, 14]    (294,912)
    │    │    └─BatchNorm2d: 3-45   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-46          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-47        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-48   [1, 256, 14, 14]    (512)
    │    │    └─Sequential: 3-49    [1, 256, 14, 14]    (33,280)
    │    │    └─ReLU: 3-50          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-9          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-51        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-52   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-53          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-54        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-55   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-56          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-10         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-57        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-58   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-59          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-60        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-61   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-62          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-11         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-63        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-64   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-65          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-66        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-67   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-68          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-12         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-69        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-70   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-71          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-72        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-73   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-74          [1, 256, 14, 14]    --
    │    └─BasicBlock: 2-13         [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-75        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-76   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-77          [1, 256, 14, 14]    --
    │    │    └─Conv2d: 3-78        [1, 256, 14, 14]    (589,824)
    │    │    └─BatchNorm2d: 3-79   [1, 256, 14, 14]    (512)
    │    │    └─ReLU: 3-80          [1, 256, 14, 14]    --
    ├─Sequential: 1-8               [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-14         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-81        [1, 512, 7, 7]      (1,179,648)
    │    │    └─BatchNorm2d: 3-82   [1, 512, 7, 7]      (1,024)
    │    │    └─ReLU: 3-83          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-84        [1, 512, 7, 7]      (2,359,296)
    │    │    └─BatchNorm2d: 3-85   [1, 512, 7, 7]      (1,024)
    │    │    └─Sequential: 3-86    [1, 512, 7, 7]      (132,096)
    │    │    └─ReLU: 3-87          [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-15         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-88        [1, 512, 7, 7]      (2,359,296)
    │    │    └─BatchNorm2d: 3-89   [1, 512, 7, 7]      (1,024)
    │    │    └─ReLU: 3-90          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-91        [1, 512, 7, 7]      (2,359,296)
    │    │    └─BatchNorm2d: 3-92   [1, 512, 7, 7]      (1,024)
    │    │    └─ReLU: 3-93          [1, 512, 7, 7]      --
    │    └─BasicBlock: 2-16         [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-94        [1, 512, 7, 7]      (2,359,296)
    │    │    └─BatchNorm2d: 3-95   [1, 512, 7, 7]      (1,024)
    │    │    └─ReLU: 3-96          [1, 512, 7, 7]      --
    │    │    └─Conv2d: 3-97        [1, 512, 7, 7]      (2,359,296)
    │    │    └─BatchNorm2d: 3-98   [1, 512, 7, 7]      (1,024)
    │    │    └─ReLU: 3-99          [1, 512, 7, 7]      --
    ├─AdaptiveAvgPool2d: 1-9        [1, 512, 1, 1]      --
    ├─Linear: 1-10                  [1, 10]             5,130
    Total params: 21,289,802
    Trainable params: 5,130
    Non-trainable params: 21,284,672
    Total mult-adds (G): 3.66
    Input size (MB): 0.60
    Forward/backward pass size (MB): 59.81
    Params size (MB): 85.16
    Estimated Total Size (MB): 145.57

Training and validation:

    Resnet34_new = new_model.to(device)

    # Define the loss function and the optimizer
    # Loss function
    criterion = nn.CrossEntropyLoss()
    # Optimizer (the original passed the undefined name Resnet50_new here)
    optimizer = torch.optim.Adam(Resnet34_new.parameters(), lr=lr)
    epoch = max_epochs

    total_step = len(train_loader)
    train_all_loss = []
    test_all_loss = []

    for i in range(epoch):
        Resnet34_new.train()
        train_total_loss = 0
        train_total_num = 0
        train_total_correct = 0

        for iter, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet34_new(images)
            loss = criterion(outputs, labels)
            train_total_correct += (outputs.argmax(1) == labels).sum().item()

            # backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            train_total_num += labels.shape[0]
            train_total_loss += loss.item()
            # Note: criterion already averages over the batch, so the value printed
            # here is the batch-mean loss divided once more by the batch size.
            # {:4f} sets a minimum width of 4, so six decimals are printed.
            print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:4f}".format(
                i + 1, epoch, iter + 1, total_step, loss.item() / labels.shape[0]))

        Resnet34_new.eval()
        test_total_loss = 0
        test_total_correct = 0
        test_total_num = 0
        with torch.no_grad():  # gradients are not needed for evaluation
            for iter, (images, labels) in enumerate(test_loader):
                images = images.to(device)
                labels = labels.to(device)

                outputs = Resnet34_new(images)
                loss = criterion(outputs, labels)
                test_total_correct += (outputs.argmax(1) == labels).sum().item()
                test_total_loss += loss.item()
                test_total_num += labels.shape[0]

        print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
            i + 1, epoch,
            train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
            test_total_loss / test_total_num, test_total_correct / test_total_num * 100))
        train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
        test_all_loss.append(np.round(test_total_loss / test_total_num, 4))

Output:

    Epoch [1/2], Iter [1/3125], train_loss:0.150127
    Epoch [1/2], Iter [2/3125], train_loss:0.174470
    Epoch [1/2], Iter [3/3125], train_loss:0.165727
    Epoch [1/2], Iter [4/3125], train_loss:0.174811
    Epoch [1/2], Iter [5/3125], train_loss:0.158658
    Epoch [1/2], Iter [6/3125], train_loss:0.153260
    Epoch [1/2], Iter [7/3125], train_loss:0.164495
    Epoch [1/2], Iter [8/3125], train_loss:0.164485
    Epoch [1/2], Iter [9/3125], train_loss:0.157202
    Epoch [1/2], Iter [10/3125], train_loss:0.149555
    Epoch [1/2], Iter [11/3125], train_loss:0.172609
    Epoch [1/2], Iter [12/3125], train_loss:0.180861
    Epoch [1/2], Iter [13/3125], train_loss:0.156719
    Epoch [1/2], Iter [14/3125], train_loss:0.172375
    Epoch [1/2], Iter [15/3125], train_loss:0.169886
    Epoch [1/2], Iter [16/3125], train_loss:0.148726
    Epoch [1/2], Iter [17/3125], train_loss:0.160391
    Epoch [1/2], Iter [18/3125], train_loss:0.160285
    Epoch [1/2], Iter [19/3125], train_loss:0.167672
    Epoch [1/2], Iter [20/3125], train_loss:0.151213
    Epoch [1/2], Iter [21/3125], train_loss:0.154690
    Epoch [1/2], Iter [22/3125], train_loss:0.155165
    Epoch [1/2], Iter [23/3125], train_loss:0.162777
    Epoch [1/2], Iter [24/3125], train_loss:0.169136
    Epoch [1/2], Iter [25/3125], train_loss:0.151533
    Epoch [1/2], Iter [26/3125], train_loss:0.168992
    Epoch [1/2], Iter [27/3125], train_loss:0.176258
    Epoch [1/2], Iter [28/3125], train_loss:0.162240
    Epoch [1/2], Iter [29/3125], train_loss:0.161768
    Epoch [1/2], Iter [30/3125], train_loss:0.165359
    Epoch [1/2], Iter [31/3125], train_loss:0.166174
    Epoch [1/2], Iter [32/3125], train_loss:0.173654
    Epoch [1/2], Iter [33/3125], train_loss:0.162488
    Epoch [1/2], Iter [34/3125], train_loss:0.164815
    Epoch [1/2], Iter [35/3125], train_loss:0.154411
    Epoch [1/2], Iter [36/3125], train_loss:0.159386
    Epoch [1/2], Iter [37/3125], train_loss:0.176261
    Epoch [1/2], Iter [38/3125], train_loss:0.163848
    Epoch [1/2], Iter [39/3125], train_loss:0.174402
    Epoch [1/2], Iter [40/3125], train_loss:0.178917
    Epoch [1/2], Iter [41/3125], train_loss:0.149938
    Epoch [1/2], Iter [42/3125], train_loss:0.156186
    Epoch [1/2], Iter [43/3125], train_loss:0.162950
    Epoch [1/2], Iter [44/3125], train_loss:0.169058
    Epoch [1/2], Iter [45/3125], train_loss:0.168587
    Epoch [1/2], Iter [46/3125], train_loss:0.173754
    Epoch [1/2], Iter [47/3125], train_loss:0.158612
    Epoch [1/2], Iter [48/3125], train_loss:0.163891
    Epoch [1/2], Iter [49/3125], train_loss:0.149220
    Epoch [1/2], Iter [50/3125], train_loss:0.175387
    Epoch [1/2], Iter [51/3125], train_loss:0.163082
    Epoch [1/2], Iter [52/3125], train_loss:0.156597
    Epoch [1/2], Iter [53/3125], train_loss:0.179248
    Epoch [1/2], Iter [54/3125], train_loss:0.170053
    Epoch [1/2], Iter [55/3125], train_loss:0.140899
    Epoch [1/2], Iter [56/3125], train_loss:0.168686
    Epoch [1/2], Iter [57/3125], train_loss:0.189548
    Epoch [1/2], Iter [58/3125], train_loss:0.169847
    Epoch [1/2], Iter [59/3125], train_loss:0.171854
    Epoch [1/2], Iter [60/3125], train_loss:0.175660
    Epoch [1/2], Iter [61/3125], train_loss:0.163686
    Epoch [1/2], Iter [62/3125], train_loss:0.174950
    Epoch [1/2], Iter [63/3125], train_loss:0.173237
    Epoch [1/2], Iter [64/3125], train_loss:0.146743
    Epoch [1/2], Iter [65/3125], train_loss:0.159798
    Epoch [1/2], Iter [66/3125], train_loss:0.169616
    Epoch [1/2], Iter [67/3125], train_loss:0.167541
    Epoch [1/2], Iter [68/3125], train_loss:0.136470
    Epoch [1/2], Iter [69/3125], train_loss:0.185080
    Epoch [1/2], Iter [70/3125], train_loss:0.166373
    Epoch [1/2], Iter [71/3125], train_loss:0.160634
    Epoch [1/2], Iter [72/3125], train_loss:0.163522
    Epoch [1/2], Iter [73/3125], train_loss:0.157858
    Epoch [1/2], Iter [74/3125], train_loss:0.157069
    Epoch [1/2], Iter [75/3125], train_loss:0.183969
    Epoch [1/2], Iter [76/3125], train_loss:0.166041
    Epoch [1/2], Iter [77/3125], train_loss:0.151215
    Epoch [1/2], Iter [78/3125], train_loss:0.164155
    Epoch [1/2], Iter [79/3125], train_loss:0.158990
    Epoch [1/2], Iter [80/3125], train_loss:0.178859
    Epoch [1/2], Iter [81/3125], train_loss:0.139378
    Epoch [1/2], Iter [82/3125], train_loss:0.150422
    Epoch [1/2], Iter [83/3125], train_loss:0.155447
    Epoch [1/2], Iter [84/3125], train_loss:0.146703
    Epoch [1/2], Iter [85/3125], train_loss:0.165099
    Epoch [1/2], Iter [86/3125], train_loss:0.175539
    Epoch [1/2], Iter [87/3125], train_loss:0.178613
    Epoch [1/2], Iter [88/3125], train_loss:0.169430
    Epoch [1/2], Iter [89/3125], train_loss:0.160620
    Epoch [1/2], Iter [90/3125], train_loss:0.172726
    Epoch [1/2], Iter [91/3125], train_loss:0.139834
    Epoch [1/2], Iter [92/3125], train_loss:0.162758
    Epoch [1/2], Iter [93/3125], train_loss:0.160110
    Epoch [1/2], Iter [94/3125], train_loss:0.176203
    Epoch [1/2], Iter [95/3125], train_loss:0.170835
    Epoch [1/2], Iter [96/3125], train_loss:0.166727
    Epoch [1/2], Iter [97/3125], train_loss:0.175421
    Epoch [1/2], Iter [98/3125], train_loss:0.173413
    Epoch [1/2], Iter [99/3125], train_loss:0.154259
    Epoch [1/2], Iter [100/3125], train_loss:0.146670
    Epoch [1/2], Iter [101/3125], train_loss:0.161012
    Epoch [1/2], Iter [102/3125], train_loss:0.151979
    Epoch [1/2], Iter [103/3125], train_loss:0.163212
    Epoch [1/2], Iter [104/3125], train_loss:0.174235
    Epoch [1/2], Iter [105/3125], train_loss:0.152968
    Epoch [1/2], Iter [106/3125], train_loss:0.156215
    Epoch [1/2], Iter [107/3125], train_loss:0.164557
    Epoch [1/2], Iter [108/3125], train_loss:0.144438
    Epoch [1/2], Iter [109/3125], train_loss:0.168143
    Epoch [1/2], Iter [110/3125], train_loss:0.144444
    Epoch [1/2], Iter [111/3125], train_loss:0.153808
    Epoch [1/2], Iter [112/3125], train_loss:0.172484
    Epoch [1/2], Iter [113/3125], train_loss:0.168573
    Epoch [1/2], Iter [114/3125], train_loss:0.157955
    Epoch [1/2], Iter [115/3125], train_loss:0.170679
    Epoch [1/2], Iter [116/3125], train_loss:0.150308
    Epoch [1/2], Iter [117/3125], train_loss:0.152166
    Epoch [1/2], Iter [118/3125], train_loss:0.175642
    Epoch [1/2], Iter [119/3125], train_loss:0.167162
    Epoch [1/2], Iter [120/3125], train_loss:0.159675
    Epoch [1/2], Iter [121/3125], train_loss:0.176089
    Epoch [1/2], Iter [122/3125], train_loss:0.154275
    Epoch [1/2], Iter [123/3125], train_loss:0.139308
    Epoch [1/2], Iter [124/3125], train_loss:0.156106
    Epoch [1/2], Iter [125/3125], train_loss:0.140437
    Epoch [1/2], Iter [126/3125], train_loss:0.154971
    Epoch [1/2], Iter [127/3125], train_loss:0.148948
    Epoch [1/2], Iter [128/3125], train_loss:0.173654
    Epoch [1/2], Iter [129/3125], train_loss:0.175725
    Epoch [1/2], Iter [130/3125], train_loss:0.160516
    Epoch [1/2], Iter [131/3125], train_loss:0.170737
    Epoch [1/2], Iter [132/3125], train_loss:0.161662
    Epoch [1/2], Iter [133/3125], train_loss:0.179921
    Epoch [1/2], Iter [134/3125], train_loss:0.160738
    Epoch [1/2], Iter [135/3125], train_loss:0.134471
    Epoch [1/2], Iter [136/3125], train_loss:0.170317
    Epoch [1/2], Iter [137/3125], train_loss:0.153042
    Epoch [1/2], Iter [138/3125], train_loss:0.163370
    Epoch [1/2], Iter [139/3125], train_loss:0.169346
    Epoch [1/2], Iter [140/3125], train_loss:0.156637
    Epoch [1/2], Iter [141/3125], train_loss:0.164446
    Epoch [1/2], Iter [142/3125], train_loss:0.166337
    Epoch [1/2], Iter [143/3125], train_loss:0.150206
    Epoch [1/2], Iter [144/3125], train_loss:0.156191
    Epoch [1/2], Iter [145/3125], train_loss:0.169191
    Epoch [1/2], Iter [146/3125], train_loss:0.165300
    Epoch [1/2], Iter [147/3125], train_loss:0.177487
    Epoch [1/2], Iter [148/3125], train_loss:0.179514
    Epoch [1/2], Iter [149/3125], train_loss:0.153518
    Epoch [1/2], Iter [150/3125], train_loss:0.155025
    Epoch [1/2], Iter [151/3125], train_loss:0.169826
    Epoch [1/2], Iter [152/3125], train_loss:0.150576
    Epoch [1/2], Iter [153/3125], train_loss:0.149755
    Epoch [1/2], Iter [154/3125], train_loss:0.156437
    Epoch [1/2], Iter [155/3125], train_loss:0.163630
    Epoch [1/2], Iter [156/3125], train_loss:0.163358
    Epoch [1/2], Iter [157/3125], train_loss:0.168501
    Epoch [1/2], Iter [158/3125], train_loss:0.152938
    Epoch [1/2], Iter [159/3125], train_loss:0.162743
    Epoch [1/2], Iter [160/3125], train_loss:0.164684
    Epoch [1/2], Iter [161/3125], train_loss:0.134906
    Epoch [1/2], Iter [162/3125], train_loss:0.171217
    Epoch [1/2], Iter [163/3125], train_loss:0.166338
    Epoch [1/2], Iter [164/3125], train_loss:0.173403
    Epoch [1/2], Iter [165/3125], train_loss:0.166951
    Epoch [1/2], Iter [166/3125], train_loss:0.161986
    Epoch [1/2], Iter [167/3125], train_loss:0.167642
    Epoch [1/2], Iter [168/3125], train_loss:0.163133
    Epoch [1/2], Iter [169/3125], train_loss:0.176087
    Epoch [1/2], Iter [170/3125], train_loss:0.181500
    Epoch [1/2], Iter [171/3125], train_loss:0.182332
    Epoch [1/2], Iter [172/3125], train_loss:0.159162
    Epoch [1/2], Iter [173/3125], train_loss:0.173818
    Epoch [1/2], Iter [174/3125], train_loss:0.151095
    Epoch [1/2], Iter [175/3125], train_loss:0.169016
    Epoch [1/2], Iter [176/3125], train_loss:0.168345
    Epoch [1/2], Iter [177/3125], train_loss:0.171198
    Epoch [1/2], Iter [178/3125], train_loss:0.158377
    Epoch [1/2], Iter [179/3125], train_loss:0.150349
    Epoch [1/2], Iter [180/3125], train_loss:0.154732
    Epoch [1/2], Iter [181/3125], train_loss:0.159255
    Epoch [1/2], Iter [182/3125], train_loss:0.180752
    Epoch [1/2], Iter [183/3125], train_loss:0.130398
    Epoch [1/2], Iter [184/3125], train_loss:0.149835
    Epoch [1/2], Iter [185/3125], train_loss:0.163545
    Epoch [1/2], Iter [186/3125], train_loss:0.165769
    Epoch [1/2], Iter [187/3125], train_loss:0.165499
    Epoch [1/2], Iter [188/3125], train_loss:0.191183
    Epoch [1/2], Iter [189/3125], train_loss:0.165406
    Epoch [1/2], Iter [190/3125], train_loss:0.158130
    Epoch [1/2], Iter [191/3125], train_loss:0.167049
    Epoch [1/2], Iter [192/3125], train_loss:0.158406
    Epoch [1/2], Iter [193/3125], train_loss:0.155791
    Epoch [1/2], Iter [194/3125], train_loss:0.154068
    Epoch [1/2], Iter [195/3125], train_loss:0.173929
    Epoch [1/2], Iter [196/3125], train_loss:0.166356
    Epoch [1/2], Iter [197/3125], train_loss:0.153073
    Epoch [1/2], Iter [198/3125], train_loss:0.159932
    Epoch [1/2], Iter [199/3125], train_loss:0.158823
    Epoch [1/2], Iter [200/3125], train_loss:0.187810
    Epoch [1/2], Iter [201/3125], train_loss:0.178415
    Epoch [1/2], Iter [202/3125], train_loss:0.156469
    Epoch [1/2], Iter [203/3125], train_loss:0.160102
    Epoch [1/2], Iter [204/3125], train_loss:0.147824
    Epoch [1/2], Iter [205/3125], train_loss:0.159959
    Epoch [1/2], Iter [206/3125], train_loss:0.168457
    Epoch [1/2], Iter [207/3125], train_loss:0.152751
    Epoch [1/2], Iter [208/3125], train_loss:0.153071
    Epoch [1/2], Iter [209/3125], train_loss:0.162002
    Epoch [1/2], Iter [210/3125], train_loss:0.177490
    Epoch [1/2], Iter [211/3125], train_loss:0.153973
    Epoch [1/2], Iter [212/3125], train_loss:0.178655
    Epoch [1/2], Iter [213/3125], train_loss:0.172759
    Epoch [1/2], Iter [214/3125], train_loss:0.161288
    Epoch [1/2], Iter [215/3125], train_loss:0.145693
    Epoch [1/2], Iter [216/3125], train_loss:0.149355
    Epoch [1/2], Iter [217/3125], train_loss:0.177612
    Epoch [1/2], Iter [218/3125], train_loss:0.156104
    Epoch [1/2], Iter [219/3125], train_loss:0.146696
    Epoch [1/2], Iter [220/3125], train_loss:0.168620
    Epoch [1/2], Iter [221/3125], train_loss:0.134316
    Epoch [1/2], Iter [222/3125], train_loss:0.164465
    Epoch [1/2], Iter [223/3125], train_loss:0.161020
    Epoch [1/2], Iter [224/3125], train_loss:0.144464
    Epoch [1/2], Iter [225/3125], train_loss:0.145501
Epoch [1/2], Iter [226/3125], train_loss:0.156721 Epoch [1/2], Iter [227/3125], train_loss:0.160348 Epoch [1/2], Iter [228/3125], train_loss:0.157792 Epoch [1/2], Iter [229/3125], train_loss:0.143886 Epoch [1/2], Iter [230/3125], train_loss:0.146231 Epoch [1/2], Iter [231/3125], train_loss:0.161353 Epoch [1/2], Iter [232/3125], train_loss:0.172967 Epoch [1/2], Iter [233/3125], train_loss:0.173051 Epoch [1/2], Iter [234/3125], train_loss:0.173887 Epoch [1/2], Iter [235/3125], train_loss:0.155447 Epoch [1/2], Iter [236/3125], train_loss:0.162683 Epoch [1/2], Iter [237/3125], train_loss:0.147682 Epoch [1/2], Iter [238/3125], train_loss:0.170582 Epoch [1/2], Iter [239/3125], train_loss:0.159764 Epoch [1/2], Iter [240/3125], train_loss:0.157225 Epoch [1/2], Iter [241/3125], train_loss:0.153664 Epoch [1/2], Iter [242/3125], train_loss:0.166018 Epoch [1/2], Iter [243/3125], train_loss:0.175373 Epoch [1/2], Iter [244/3125], train_loss:0.146529 Epoch [1/2], Iter [245/3125], train_loss:0.166091 Epoch [1/2], Iter [246/3125], train_loss:0.161189 Epoch [1/2], Iter [247/3125], train_loss:0.144563 Epoch [1/2], Iter [248/3125], train_loss:0.150318 Epoch [1/2], Iter [249/3125], train_loss:0.140906 Epoch [1/2], Iter [250/3125], train_loss:0.169033 Epoch [1/2], Iter [251/3125], train_loss:0.155781 Epoch [1/2], Iter [252/3125], train_loss:0.163493 Epoch [1/2], Iter [253/3125], train_loss:0.153378 Epoch [1/2], Iter [254/3125], train_loss:0.183447 Epoch [1/2], Iter [255/3125], train_loss:0.178129 Epoch [1/2], Iter [256/3125], train_loss:0.177007 Epoch [1/2], Iter [257/3125], train_loss:0.179591 Epoch [1/2], Iter [258/3125], train_loss:0.169509 Epoch [1/2], Iter [259/3125], train_loss:0.146213 Epoch [1/2], Iter [260/3125], train_loss:0.171849 Epoch [1/2], Iter [261/3125], train_loss:0.163851 Epoch [1/2], Iter [262/3125], train_loss:0.178366 Epoch [1/2], Iter [263/3125], train_loss:0.194072 Epoch [1/2], Iter [264/3125], train_loss:0.172418 Epoch [1/2], Iter [265/3125], train_loss:0.143541 
Epoch [1/2], Iter [266/3125], train_loss:0.158418 Epoch [1/2], Iter [267/3125], train_loss:0.163535 Epoch [1/2], Iter [268/3125], train_loss:0.171397 Epoch [1/2], Iter [269/3125], train_loss:0.183410 Epoch [1/2], Iter [270/3125], train_loss:0.191745 Epoch [1/2], Iter [271/3125], train_loss:0.195354 Epoch [1/2], Iter [272/3125], train_loss:0.166208 Epoch [1/2], Iter [273/3125], train_loss:0.148297 Epoch [1/2], Iter [274/3125], train_loss:0.164007 Epoch [1/2], Iter [275/3125], train_loss:0.168109 Epoch [1/2], Iter [276/3125], train_loss:0.187189 Epoch [1/2], Iter [277/3125], train_loss:0.171387 Epoch [1/2], Iter [278/3125], train_loss:0.143764 Epoch [1/2], Iter [279/3125], train_loss:0.175919 Epoch [1/2], Iter [280/3125], train_loss:0.172834 Epoch [1/2], Iter [281/3125], train_loss:0.173480 Epoch [1/2], Iter [282/3125], train_loss:0.141544 Epoch [1/2], Iter [283/3125], train_loss:0.187073 Epoch [1/2], Iter [284/3125], train_loss:0.147416 Epoch [1/2], Iter [285/3125], train_loss:0.163346 Epoch [1/2], Iter [286/3125], train_loss:0.155601 Epoch [1/2], Iter [287/3125], train_loss:0.160135 Epoch [1/2], Iter [288/3125], train_loss:0.153201 Epoch [1/2], Iter [289/3125], train_loss:0.157078 Epoch [1/2], Iter [290/3125], train_loss:0.143863 Epoch [1/2], Iter [291/3125], train_loss:0.170847 Epoch [1/2], Iter [292/3125], train_loss:0.160009 Epoch [1/2], Iter [293/3125], train_loss:0.160868 Epoch [1/2], Iter [294/3125], train_loss:0.159037 Epoch [1/2], Iter [295/3125], train_loss:0.148768 Epoch [1/2], Iter [296/3125], train_loss:0.172005 Epoch [1/2], Iter [297/3125], train_loss:0.170369 Epoch [1/2], Iter [298/3125], train_loss:0.150250 Epoch [1/2], Iter [299/3125], train_loss:0.203501 Epoch [1/2], Iter [300/3125], train_loss:0.172398 Epoch [1/2], Iter [301/3125], train_loss:0.184601 Epoch [1/2], Iter [302/3125], train_loss:0.156175 Epoch [1/2], Iter [303/3125], train_loss:0.161752 Epoch [1/2], Iter [304/3125], train_loss:0.154050 Epoch [1/2], Iter [305/3125], train_loss:0.151905 
Epoch [1/2], Iter [306/3125], train_loss:0.154861 Epoch [1/2], Iter [307/3125], train_loss:0.157530 Epoch [1/2], Iter [308/3125], train_loss:0.162054 Epoch [1/2], Iter [309/3125], train_loss:0.172370 Epoch [1/2], Iter [310/3125], train_loss:0.149971 Epoch [1/2], Iter [311/3125], train_loss:0.155449 Epoch [1/2], Iter [312/3125], train_loss:0.168246 Epoch [1/2], Iter [313/3125], train_loss:0.161156 Epoch [1/2], Iter [314/3125], train_loss:0.182064 Epoch [1/2], Iter [315/3125], train_loss:0.168014 Epoch [1/2], Iter [316/3125], train_loss:0.155707 Epoch [1/2], Iter [317/3125], train_loss:0.155345 Epoch [1/2], Iter [318/3125], train_loss:0.157537 Epoch [1/2], Iter [319/3125], train_loss:0.158657 Epoch [1/2], Iter [320/3125], train_loss:0.162647 Epoch [1/2], Iter [321/3125], train_loss:0.165201 Epoch [1/2], Iter [322/3125], train_loss:0.187565 Epoch [1/2], Iter [323/3125], train_loss:0.153937 Epoch [1/2], Iter [324/3125], train_loss:0.147520 Epoch [1/2], Iter [325/3125], train_loss:0.139758 Epoch [1/2], Iter [326/3125], train_loss:0.177869 Epoch [1/2], Iter [327/3125], train_loss:0.178201 Epoch [1/2], Iter [328/3125], train_loss:0.154316 Epoch [1/2], Iter [329/3125], train_loss:0.178173 Epoch [1/2], Iter [330/3125], train_loss:0.159244 Epoch [1/2], Iter [331/3125], train_loss:0.177582 Epoch [1/2], Iter [332/3125], train_loss:0.153592 Epoch [1/2], Iter [333/3125], train_loss:0.154490 Epoch [1/2], Iter [334/3125], train_loss:0.150733 Epoch [1/2], Iter [335/3125], train_loss:0.169697 Epoch [1/2], Iter [336/3125], train_loss:0.155575 Epoch [1/2], Iter [337/3125], train_loss:0.158214 Epoch [1/2], Iter [338/3125], train_loss:0.174536 Epoch [1/2], Iter [339/3125], train_loss:0.139395 Epoch [1/2], Iter [340/3125], train_loss:0.163447 Epoch [1/2], Iter [341/3125], train_loss:0.146871 Epoch [1/2], Iter [342/3125], train_loss:0.160089 Epoch [1/2], Iter [343/3125], train_loss:0.161521 Epoch [1/2], Iter [344/3125], train_loss:0.148263 Epoch [1/2], Iter [345/3125], train_loss:0.156887 
Epoch [1/2], Iter [346/3125], train_loss:0.163093 Epoch [1/2], Iter [347/3125], train_loss:0.130156 Epoch [1/2], Iter [348/3125], train_loss:0.153562 Epoch [1/2], Iter [349/3125], train_loss:0.183320 Epoch [1/2], Iter [350/3125], train_loss:0.151159 Epoch [1/2], Iter [351/3125], train_loss:0.144421 Epoch [1/2], Iter [352/3125], train_loss:0.145968 Epoch [1/2], Iter [353/3125], train_loss:0.150598 Epoch [1/2], Iter [354/3125], train_loss:0.163271 Epoch [1/2], Iter [355/3125], train_loss:0.191171 Epoch [1/2], Iter [356/3125], train_loss:0.166442 Epoch [1/2], Iter [357/3125], train_loss:0.153268 Epoch [1/2], Iter [358/3125], train_loss:0.160086 Epoch [1/2], Iter [359/3125], train_loss:0.172394 Epoch [1/2], Iter [360/3125], train_loss:0.160697 Epoch [1/2], Iter [361/3125], train_loss:0.158556 Epoch [1/2], Iter [362/3125], train_loss:0.148141 Epoch [1/2], Iter [363/3125], train_loss:0.161616 Epoch [1/2], Iter [364/3125], train_loss:0.164506 Epoch [1/2], Iter [365/3125], train_loss:0.153889 Epoch [1/2], Iter [366/3125], train_loss:0.149990 Epoch [1/2], Iter [367/3125], train_loss:0.172651 Epoch [1/2], Iter [368/3125], train_loss:0.167421 Epoch [1/2], Iter [369/3125], train_loss:0.157874 Epoch [1/2], Iter [370/3125], train_loss:0.175726 Epoch [1/2], Iter [371/3125], train_loss:0.168166 Epoch [1/2], Iter [372/3125], train_loss:0.160632 Epoch [1/2], Iter [373/3125], train_loss:0.169915 Epoch [1/2], Iter [374/3125], train_loss:0.141351 Epoch [1/2], Iter [375/3125], train_loss:0.157579 Epoch [1/2], Iter [376/3125], train_loss:0.159373 Epoch [1/2], Iter [377/3125], train_loss:0.173719 Epoch [1/2], Iter [378/3125], train_loss:0.156862 Epoch [1/2], Iter [379/3125], train_loss:0.164567 Epoch [1/2], Iter [380/3125], train_loss:0.151420 Epoch [1/2], Iter [381/3125], train_loss:0.155565 Epoch [1/2], Iter [382/3125], train_loss:0.156861 Epoch [1/2], Iter [383/3125], train_loss:0.162360 Epoch [1/2], Iter [384/3125], train_loss:0.155612 Epoch [1/2], Iter [385/3125], train_loss:0.187500 
Epoch [1/2], Iter [386/3125], train_loss:0.167519 Epoch [1/2], Iter [387/3125], train_loss:0.150314 Epoch [1/2], Iter [388/3125], train_loss:0.171371 Epoch [1/2], Iter [389/3125], train_loss:0.170002 Epoch [1/2], Iter [390/3125], train_loss:0.171281 Epoch [1/2], Iter [391/3125], train_loss:0.154229 Epoch [1/2], Iter [392/3125], train_loss:0.152277 Epoch [1/2], Iter [393/3125], train_loss:0.160335 Epoch [1/2], Iter [394/3125], train_loss:0.160123 Epoch [1/2], Iter [395/3125], train_loss:0.157730 Epoch [1/2], Iter [396/3125], train_loss:0.148626 Epoch [1/2], Iter [397/3125], train_loss:0.164090 Epoch [1/2], Iter [398/3125], train_loss:0.181123 Epoch [1/2], Iter [399/3125], train_loss:0.144987 Epoch [1/2], Iter [400/3125], train_loss:0.147743 Epoch [1/2], Iter [401/3125], train_loss:0.156141 Epoch [1/2], Iter [402/3125], train_loss:0.182602 Epoch [1/2], Iter [403/3125], train_loss:0.186334 Epoch [1/2], Iter [404/3125], train_loss:0.158865 Epoch [1/2], Iter [405/3125], train_loss:0.157437 Epoch [1/2], Iter [406/3125], train_loss:0.151499 Epoch [1/2], Iter [407/3125], train_loss:0.155167 Epoch [1/2], Iter [408/3125], train_loss:0.158371 Epoch [1/2], Iter [409/3125], train_loss:0.170319 Epoch [1/2], Iter [410/3125], train_loss:0.172921 Epoch [1/2], Iter [411/3125], train_loss:0.175661 Epoch [1/2], Iter [412/3125], train_loss:0.170778 Epoch [1/2], Iter [413/3125], train_loss:0.173227 Epoch [1/2], Iter [414/3125], train_loss:0.162998 Epoch [1/2], Iter [415/3125], train_loss:0.144094 Epoch [1/2], Iter [416/3125], train_loss:0.154685 Epoch [1/2], Iter [417/3125], train_loss:0.177953 Epoch [1/2], Iter [418/3125], train_loss:0.151602 Epoch [1/2], Iter [419/3125], train_loss:0.165047 Epoch [1/2], Iter [420/3125], train_loss:0.146441 Epoch [1/2], Iter [421/3125], train_loss:0.155056 Epoch [1/2], Iter [422/3125], train_loss:0.144277 Epoch [1/2], Iter [423/3125], train_loss:0.156635 Epoch [1/2], Iter [424/3125], train_loss:0.154019 Epoch [1/2], Iter [425/3125], train_loss:0.161336 
Epoch [1/2], Iter [426/3125], train_loss:0.203085 Epoch [1/2], Iter [427/3125], train_loss:0.146707 Epoch [1/2], Iter [428/3125], train_loss:0.158310 Epoch [1/2], Iter [429/3125], train_loss:0.171015 Epoch [1/2], Iter [430/3125], train_loss:0.142477 Epoch [1/2], Iter [431/3125], train_loss:0.189910 Epoch [1/2], Iter [432/3125], train_loss:0.165581 Epoch [1/2], Iter [433/3125], train_loss:0.163553 Epoch [1/2], Iter [434/3125], train_loss:0.162628 Epoch [1/2], Iter [435/3125], train_loss:0.166308 Epoch [1/2], Iter [436/3125], train_loss:0.174060 Epoch [1/2], Iter [437/3125], train_loss:0.170486 Epoch [1/2], Iter [438/3125], train_loss:0.170334 Epoch [1/2], Iter [439/3125], train_loss:0.170027 Epoch [1/2], Iter [440/3125], train_loss:0.176327 Epoch [1/2], Iter [441/3125], train_loss:0.185929 Epoch [1/2], Iter [442/3125], train_loss:0.164644 Epoch [1/2], Iter [443/3125], train_loss:0.155429 Epoch [1/2], Iter [444/3125], train_loss:0.156190 Epoch [1/2], Iter [445/3125], train_loss:0.183739 Epoch [1/2], Iter [446/3125], train_loss:0.168132 Epoch [1/2], Iter [447/3125], train_loss:0.156675 Epoch [1/2], Iter [448/3125], train_loss:0.174648 Epoch [1/2], Iter [449/3125], train_loss:0.176605 Epoch [1/2], Iter [450/3125], train_loss:0.165744 Epoch [1/2], Iter [451/3125], train_loss:0.159427 Epoch [1/2], Iter [452/3125], train_loss:0.137918 Epoch [1/2], Iter [453/3125], train_loss:0.154664 Epoch [1/2], Iter [454/3125], train_loss:0.188046 Epoch [1/2], Iter [455/3125], train_loss:0.157990 Epoch [1/2], Iter [456/3125], train_loss:0.161434 Epoch [1/2], Iter [457/3125], train_loss:0.164751 Epoch [1/2], Iter [458/3125], train_loss:0.147707 Epoch [1/2], Iter [459/3125], train_loss:0.156135 Epoch [1/2], Iter [460/3125], train_loss:0.170298 Epoch [1/2], Iter [461/3125], train_loss:0.157925 Epoch [1/2], Iter [462/3125], train_loss:0.161613 Epoch [1/2], Iter [463/3125], train_loss:0.156034 Epoch [1/2], Iter [464/3125], train_loss:0.154685 Epoch [1/2], Iter [465/3125], train_loss:0.159974 
Epoch [1/2], Iter [466/3125], train_loss:0.137804 Epoch [1/2], Iter [467/3125], train_loss:0.173479 Epoch [1/2], Iter [468/3125], train_loss:0.160113 Epoch [1/2], Iter [469/3125], train_loss:0.181849 Epoch [1/2], Iter [470/3125], train_loss:0.154617 Epoch [1/2], Iter [471/3125], train_loss:0.145756 Epoch [1/2], Iter [472/3125], train_loss:0.173865 Epoch [1/2], Iter [473/3125], train_loss:0.179762 Epoch [1/2], Iter [474/3125], train_loss:0.148816 Epoch [1/2], Iter [475/3125], train_loss:0.143284 Epoch [1/2], Iter [476/3125], train_loss:0.171798 Epoch [1/2], Iter [477/3125], train_loss:0.180198 Epoch [1/2], Iter [478/3125], train_loss:0.160204 Epoch [1/2], Iter [479/3125], train_loss:0.166848 Epoch [1/2], Iter [480/3125], train_loss:0.168912 Epoch [1/2], Iter [481/3125], train_loss:0.151769 Epoch [1/2], Iter [482/3125], train_loss:0.164199 Epoch [1/2], Iter [483/3125], train_loss:0.159082 Epoch [1/2], Iter [484/3125], train_loss:0.157923 Epoch [1/2], Iter [485/3125], train_loss:0.175519 Epoch [1/2], Iter [486/3125], train_loss:0.161383 Epoch [1/2], Iter [487/3125], train_loss:0.162508 Epoch [1/2], Iter [488/3125], train_loss:0.165235 Epoch [1/2], Iter [489/3125], train_loss:0.179577 Epoch [1/2], Iter [490/3125], train_loss:0.151752 Epoch [1/2], Iter [491/3125], train_loss:0.171913 Epoch [1/2], Iter [492/3125], train_loss:0.163084 Epoch [1/2], Iter [493/3125], train_loss:0.156714 Epoch [1/2], Iter [494/3125], train_loss:0.156022 Epoch [1/2], Iter [495/3125], train_loss:0.157305 Epoch [1/2], Iter [496/3125], train_loss:0.156836 Epoch [1/2], Iter [497/3125], train_loss:0.154605 Epoch [1/2], Iter [498/3125], train_loss:0.174036 Epoch [1/2], Iter [499/3125], train_loss:0.164733 Epoch [1/2], Iter [500/3125], train_loss:0.162918 Epoch [1/2], Iter [501/3125], train_loss:0.149830 Epoch [1/2], Iter [502/3125], train_loss:0.186489 Epoch [1/2], Iter [503/3125], train_loss:0.145313 Epoch [1/2], Iter [504/3125], train_loss:0.152114 Epoch [1/2], Iter [505/3125], train_loss:0.150460 
Epoch [1/2], Iter [506/3125], train_loss:0.172033 Epoch [1/2], Iter [507/3125], train_loss:0.156441 Epoch [1/2], Iter [508/3125], train_loss:0.151387 Epoch [1/2], Iter [509/3125], train_loss:0.174799 Epoch [1/2], Iter [510/3125], train_loss:0.156212 Epoch [1/2], Iter [511/3125], train_loss:0.157743 Epoch [1/2], Iter [512/3125], train_loss:0.171979 Epoch [1/2], Iter [513/3125], train_loss:0.183507 Epoch [1/2], Iter [514/3125], train_loss:0.174797 Epoch [1/2], Iter [515/3125], train_loss:0.151998 Epoch [1/2], Iter [516/3125], train_loss:0.164528 Epoch [1/2], Iter [517/3125], train_loss:0.164061 Epoch [1/2], Iter [518/3125], train_loss:0.184687 Epoch [1/2], Iter [519/3125], train_loss:0.153723 Epoch [1/2], Iter [520/3125], train_loss:0.140085 Epoch [1/2], Iter [521/3125], train_loss:0.161860 Epoch [1/2], Iter [522/3125], train_loss:0.142582 Epoch [1/2], Iter [523/3125], train_loss:0.158409 Epoch [1/2], Iter [524/3125], train_loss:0.197436 Epoch [1/2], Iter [525/3125], train_loss:0.170067 Epoch [1/2], Iter [526/3125], train_loss:0.150738 Epoch [1/2], Iter [527/3125], train_loss:0.164096 Epoch [1/2], Iter [528/3125], train_loss:0.159754 Epoch [1/2], Iter [529/3125], train_loss:0.152052 Epoch [1/2], Iter [530/3125], train_loss:0.161230 Epoch [1/2], Iter [531/3125], train_loss:0.181889 Epoch [1/2], Iter [532/3125], train_loss:0.149528 Epoch [1/2], Iter [533/3125], train_loss:0.156530 Epoch [1/2], Iter [534/3125], train_loss:0.143401 Epoch [1/2], Iter [535/3125], train_loss:0.164431 Epoch [1/2], Iter [536/3125], train_loss:0.155525 Epoch [1/2], Iter [537/3125], train_loss:0.170614 Epoch [1/2], Iter [538/3125], train_loss:0.172353 Epoch [1/2], Iter [539/3125], train_loss:0.167426 Epoch [1/2], Iter [540/3125], train_loss:0.141499 Epoch [1/2], Iter [541/3125], train_loss:0.165216 Epoch [1/2], Iter [542/3125], train_loss:0.164144 Epoch [1/2], Iter [543/3125], train_loss:0.149974 Epoch [1/2], Iter [544/3125], train_loss:0.157108 Epoch [1/2], Iter [545/3125], train_loss:0.169725 
Epoch [1/2], Iter [546/3125], train_loss:0.181695 Epoch [1/2], Iter [547/3125], train_loss:0.161326 Epoch [1/2], Iter [548/3125], train_loss:0.187204 Epoch [1/2], Iter [549/3125], train_loss:0.152687 Epoch [1/2], Iter [550/3125], train_loss:0.144457 Epoch [1/2], Iter [551/3125], train_loss:0.160662 Epoch [1/2], Iter [552/3125], train_loss:0.154854 Epoch [1/2], Iter [553/3125], train_loss:0.159735 Epoch [1/2], Iter [554/3125], train_loss:0.147193 Epoch [1/2], Iter [555/3125], train_loss:0.157361 Epoch [1/2], Iter [556/3125], train_loss:0.186600 Epoch [1/2], Iter [557/3125], train_loss:0.152398 Epoch [1/2], Iter [558/3125], train_loss:0.175364 Epoch [1/2], Iter [559/3125], train_loss:0.167578 Epoch [1/2], Iter [560/3125], train_loss:0.158512 Epoch [1/2], Iter [561/3125], train_loss:0.173613 Epoch [1/2], Iter [562/3125], train_loss:0.160966 Epoch [1/2], Iter [563/3125], train_loss:0.172676 Epoch [1/2], Iter [564/3125], train_loss:0.158586 Epoch [1/2], Iter [565/3125], train_loss:0.180590 Epoch [1/2], Iter [566/3125], train_loss:0.192027 Epoch [1/2], Iter [567/3125], train_loss:0.157700 Epoch [1/2], Iter [568/3125], train_loss:0.162584 Epoch [1/2], Iter [569/3125], train_loss:0.183801 Epoch [1/2], Iter [570/3125], train_loss:0.167326 Epoch [1/2], Iter [571/3125], train_loss:0.164745 Epoch [1/2], Iter [572/3125], train_loss:0.173292 Epoch [1/2], Iter [573/3125], train_loss:0.153456 Epoch [1/2], Iter [574/3125], train_loss:0.160368 Epoch [1/2], Iter [575/3125], train_loss:0.151965 Epoch [1/2], Iter [576/3125], train_loss:0.154746 Epoch [1/2], Iter [577/3125], train_loss:0.170880 Epoch [1/2], Iter [578/3125], train_loss:0.161438 Epoch [1/2], Iter [579/3125], train_loss:0.180224 Epoch [1/2], Iter [580/3125], train_loss:0.178791 Epoch [1/2], Iter [581/3125], train_loss:0.145772 Epoch [1/2], Iter [582/3125], train_loss:0.160606 Epoch [1/2], Iter [583/3125], train_loss:0.176088 Epoch [1/2], Iter [584/3125], train_loss:0.164863 Epoch [1/2], Iter [585/3125], train_loss:0.181251 
Epoch [1/2], Iter [586/3125], train_loss:0.151516 Epoch [1/2], Iter [587/3125], train_loss:0.176537 Epoch [1/2], Iter [588/3125], train_loss:0.159592 Epoch [1/2], Iter [589/3125], train_loss:0.156307 Epoch [1/2], Iter [590/3125], train_loss:0.149772 Epoch [1/2], Iter [591/3125], train_loss:0.168533 Epoch [1/2], Iter [592/3125], train_loss:0.156289 Epoch [1/2], Iter [593/3125], train_loss:0.171735 Epoch [1/2], Iter [594/3125], train_loss:0.159490 Epoch [1/2], Iter [595/3125], train_loss:0.156549 Epoch [1/2], Iter [596/3125], train_loss:0.172349 Epoch [1/2], Iter [597/3125], train_loss:0.163044 Epoch [1/2], Iter [598/3125], train_loss:0.153242 Epoch [1/2], Iter [599/3125], train_loss:0.178958 Epoch [1/2], Iter [600/3125], train_loss:0.154345 Epoch [1/2], Iter [601/3125], train_loss:0.169848 Epoch [1/2], Iter [602/3125], train_loss:0.148637 Epoch [1/2], Iter [603/3125], train_loss:0.168987 Epoch [1/2], Iter [604/3125], train_loss:0.181611 Epoch [1/2], Iter [605/3125], train_loss:0.155290 Epoch [1/2], Iter [606/3125], train_loss:0.159152 Epoch [1/2], Iter [607/3125], train_loss:0.158287 Epoch [1/2], Iter [608/3125], train_loss:0.165115 Epoch [1/2], Iter [609/3125], train_loss:0.170284 Epoch [1/2], Iter [610/3125], train_loss:0.180462 Epoch [1/2], Iter [611/3125], train_loss:0.170581 Epoch [1/2], Iter [612/3125], train_loss:0.156333 Epoch [1/2], Iter [613/3125], train_loss:0.158744 Epoch [1/2], Iter [614/3125], train_loss:0.151848 Epoch [1/2], Iter [615/3125], train_loss:0.146951 Epoch [1/2], Iter [616/3125], train_loss:0.164550 Epoch [1/2], Iter [617/3125], train_loss:0.163717 Epoch [1/2], Iter [618/3125], train_loss:0.143995 Epoch [1/2], Iter [619/3125], train_loss:0.174511 Epoch [1/2], Iter [620/3125], train_loss:0.161177 Epoch [1/2], Iter [621/3125], train_loss:0.178878 Epoch [1/2], Iter [622/3125], train_loss:0.170491 Epoch [1/2], Iter [623/3125], train_loss:0.172630 Epoch [1/2], Iter [624/3125], train_loss:0.185353 Epoch [1/2], Iter [625/3125], train_loss:0.162375 
Epoch [1/2], Iter [626/3125], train_loss:0.166046 Epoch [1/2], Iter [627/3125], train_loss:0.187614 Epoch [1/2], Iter [628/3125], train_loss:0.171770 Epoch [1/2], Iter [629/3125], train_loss:0.137850 Epoch [1/2], Iter [630/3125], train_loss:0.160116 Epoch [1/2], Iter [631/3125], train_loss:0.147122 Epoch [1/2], Iter [632/3125], train_loss:0.182244 Epoch [1/2], Iter [633/3125], train_loss:0.143999 Epoch [1/2], Iter [634/3125], train_loss:0.182345 Epoch [1/2], Iter [635/3125], train_loss:0.156759 Epoch [1/2], Iter [636/3125], train_loss:0.148806 Epoch [1/2], Iter [637/3125], train_loss:0.154144 Epoch [1/2], Iter [638/3125], train_loss:0.149763 Epoch [1/2], Iter [639/3125], train_loss:0.154106 Epoch [1/2], Iter [640/3125], train_loss:0.168608 Epoch [1/2], Iter [641/3125], train_loss:0.152673 Epoch [1/2], Iter [642/3125], train_loss:0.173711 Epoch [1/2], Iter [643/3125], train_loss:0.168833 Epoch [1/2], Iter [644/3125], train_loss:0.169643 Epoch [1/2], Iter [645/3125], train_loss:0.155612 Epoch [1/2], Iter [646/3125], train_loss:0.161894 Epoch [1/2], Iter [647/3125], train_loss:0.173931 Epoch [1/2], Iter [648/3125], train_loss:0.195914 Epoch [1/2], Iter [649/3125], train_loss:0.156067 Epoch [1/2], Iter [650/3125], train_loss:0.182341 Epoch [1/2], Iter [651/3125], train_loss:0.162687 Epoch [1/2], Iter [652/3125], train_loss:0.146367 Epoch [1/2], Iter [653/3125], train_loss:0.188726 Epoch [1/2], Iter [654/3125], train_loss:0.141544 Epoch [1/2], Iter [655/3125], train_loss:0.136832 Epoch [1/2], Iter [656/3125], train_loss:0.165494 Epoch [1/2], Iter [657/3125], train_loss:0.173287 Epoch [1/2], Iter [658/3125], train_loss:0.178410 Epoch [1/2], Iter [659/3125], train_loss:0.153449 Epoch [1/2], Iter [660/3125], train_loss:0.141325 Epoch [1/2], Iter [661/3125], train_loss:0.160569 Epoch [1/2], Iter [662/3125], train_loss:0.168849 Epoch [1/2], Iter [663/3125], train_loss:0.170920 Epoch [1/2], Iter [664/3125], train_loss:0.168509 Epoch [1/2], Iter [665/3125], train_loss:0.174167 
Epoch [1/2], Iter [666/3125], train_loss:0.165306 Epoch [1/2], Iter [667/3125], train_loss:0.173296 Epoch [1/2], Iter [668/3125], train_loss:0.161691 Epoch [1/2], Iter [669/3125], train_loss:0.164216 Epoch [1/2], Iter [670/3125], train_loss:0.153614 Epoch [1/2], Iter [671/3125], train_loss:0.176628 Epoch [1/2], Iter [672/3125], train_loss:0.170113 Epoch [1/2], Iter [673/3125], train_loss:0.164020 Epoch [1/2], Iter [674/3125], train_loss:0.170262 Epoch [1/2], Iter [675/3125], train_loss:0.160252 Epoch [1/2], Iter [676/3125], train_loss:0.159595 Epoch [1/2], Iter [677/3125], train_loss:0.174482 Epoch [1/2], Iter [678/3125], train_loss:0.174136 Epoch [1/2], Iter [679/3125], train_loss:0.169684 Epoch [1/2], Iter [680/3125], train_loss:0.156718 Epoch [1/2], Iter [681/3125], train_loss:0.172149 Epoch [1/2], Iter [682/3125], train_loss:0.153396 Epoch [1/2], Iter [683/3125], train_loss:0.152438 Epoch [1/2], Iter [684/3125], train_loss:0.175598 Epoch [1/2], Iter [685/3125], train_loss:0.144732 Epoch [1/2], Iter [686/3125], train_loss:0.146447 Epoch [1/2], Iter [687/3125], train_loss:0.149588 Epoch [1/2], Iter [688/3125], train_loss:0.158347 Epoch [1/2], Iter [689/3125], train_loss:0.173472 Epoch [1/2], Iter [690/3125], train_loss:0.164299 Epoch [1/2], Iter [691/3125], train_loss:0.147107 Epoch [1/2], Iter [692/3125], train_loss:0.138865 Epoch [1/2], Iter [693/3125], train_loss:0.147721 Epoch [1/2], Iter [694/3125], train_loss:0.186929 Epoch [1/2], Iter [695/3125], train_loss:0.149825 Epoch [1/2], Iter [696/3125], train_loss:0.159169 Epoch [1/2], Iter [697/3125], train_loss:0.168211 Epoch [1/2], Iter [698/3125], train_loss:0.155869 Epoch [1/2], Iter [699/3125], train_loss:0.175861 Epoch [1/2], Iter [700/3125], train_loss:0.147055 Epoch [1/2], Iter [701/3125], train_loss:0.152602 Epoch [1/2], Iter [702/3125], train_loss:0.165213 Epoch [1/2], Iter [703/3125], train_loss:0.155550 Epoch [1/2], Iter [704/3125], train_loss:0.165959 Epoch [1/2], Iter [705/3125], train_loss:0.184858 
Epoch [1/2], Iter [706/3125], train_loss:0.156636 Epoch [1/2], Iter [707/3125], train_loss:0.141014 Epoch [1/2], Iter [708/3125], train_loss:0.172110 Epoch [1/2], Iter [709/3125], train_loss:0.166598 Epoch [1/2], Iter [710/3125], train_loss:0.181486 Epoch [1/2], Iter [711/3125], train_loss:0.149520 Epoch [1/2], Iter [712/3125], train_loss:0.141277 Epoch [1/2], Iter [713/3125], train_loss:0.150582 Epoch [1/2], Iter [714/3125], train_loss:0.170460 Epoch [1/2], Iter [715/3125], train_loss:0.166523 Epoch [1/2], Iter [716/3125], train_loss:0.140562 Epoch [1/2], Iter [717/3125], train_loss:0.157862 Epoch [1/2], Iter [718/3125], train_loss:0.158880 Epoch [1/2], Iter [719/3125], train_loss:0.151162 Epoch [1/2], Iter [720/3125], train_loss:0.150862 Epoch [1/2], Iter [721/3125], train_loss:0.172271 Epoch [1/2], Iter [722/3125], train_loss:0.167076 Epoch [1/2], Iter [723/3125], train_loss:0.160416 Epoch [1/2], Iter [724/3125], train_loss:0.164712 Epoch [1/2], Iter [725/3125], train_loss:0.155195 Epoch [1/2], Iter [726/3125], train_loss:0.173203 Epoch [1/2], Iter [727/3125], train_loss:0.203542 Epoch [1/2], Iter [728/3125], train_loss:0.132789 Epoch [1/2], Iter [729/3125], train_loss:0.170022 Epoch [1/2], Iter [730/3125], train_loss:0.150648 Epoch [1/2], Iter [731/3125], train_loss:0.152137 Epoch [1/2], Iter [732/3125], train_loss:0.165179 Epoch [1/2], Iter [733/3125], train_loss:0.181513 Epoch [1/2], Iter [734/3125], train_loss:0.144627 Epoch [1/2], Iter [735/3125], train_loss:0.156241 Epoch [1/2], Iter [736/3125], train_loss:0.156647 Epoch [1/2], Iter [737/3125], train_loss:0.156439 Epoch [1/2], Iter [738/3125], train_loss:0.184127 Epoch [1/2], Iter [739/3125], train_loss:0.149900 Epoch [1/2], Iter [740/3125], train_loss:0.178831 Epoch [1/2], Iter [741/3125], train_loss:0.154100 Epoch [1/2], Iter [742/3125], train_loss:0.173619 Epoch [1/2], Iter [743/3125], train_loss:0.174960 Epoch [1/2], Iter [744/3125], train_loss:0.158306 Epoch [1/2], Iter [745/3125], train_loss:0.157812 
Epoch [1/2], Iter [746/3125], train_loss:0.170903 Epoch [1/2], Iter [747/3125], train_loss:0.158708 Epoch [1/2], Iter [748/3125], train_loss:0.177305 Epoch [1/2], Iter [749/3125], train_loss:0.157574 Epoch [1/2], Iter [750/3125], train_loss:0.163793 Epoch [1/2], Iter [751/3125], train_loss:0.175222 Epoch [1/2], Iter [752/3125], train_loss:0.167615 Epoch [1/2], Iter [753/3125], train_loss:0.175142 Epoch [1/2], Iter [754/3125], train_loss:0.164994 Epoch [1/2], Iter [755/3125], train_loss:0.173740 Epoch [1/2], Iter [756/3125], train_loss:0.184293 Epoch [1/2], Iter [757/3125], train_loss:0.174505 Epoch [1/2], Iter [758/3125], train_loss:0.151717 Epoch [1/2], Iter [759/3125], train_loss:0.149027 Epoch [1/2], Iter [760/3125], train_loss:0.181634 Epoch [1/2], Iter [761/3125], train_loss:0.157314 Epoch [1/2], Iter [762/3125], train_loss:0.137242 Epoch [1/2], Iter [763/3125], train_loss:0.168438 Epoch [1/2], Iter [764/3125], train_loss:0.141019 Epoch [1/2], Iter [765/3125], train_loss:0.154936 Epoch [1/2], Iter [766/3125], train_loss:0.155263 Epoch [1/2], Iter [767/3125], train_loss:0.156193 Epoch [1/2], Iter [768/3125], train_loss:0.154753 Epoch [1/2], Iter [769/3125], train_loss:0.152388 Epoch [1/2], Iter [770/3125], train_loss:0.154891 Epoch [1/2], Iter [771/3125], train_loss:0.150887 Epoch [1/2], Iter [772/3125], train_loss:0.170387 Epoch [1/2], Iter [773/3125], train_loss:0.142415 Epoch [1/2], Iter [774/3125], train_loss:0.157543 Epoch [1/2], Iter [775/3125], train_loss:0.161519 Epoch [1/2], Iter [776/3125], train_loss:0.153466 Epoch [1/2], Iter [777/3125], train_loss:0.164538 Epoch [1/2], Iter [778/3125], train_loss:0.167005 Epoch [1/2], Iter [779/3125], train_loss:0.164542 Epoch [1/2], Iter [780/3125], train_loss:0.136895 Epoch [1/2], Iter [781/3125], train_loss:0.143366 Epoch [1/2], Iter [782/3125], train_loss:0.159515 Epoch [1/2], Iter [783/3125], train_loss:0.159623 Epoch [1/2], Iter [784/3125], train_loss:0.175021 Epoch [1/2], Iter [785/3125], train_loss:0.188726 
Epoch [1/2], Iter [786/3125], train_loss:0.167352 Epoch [1/2], Iter [787/3125], train_loss:0.159414 Epoch [1/2], Iter [788/3125], train_loss:0.143568 Epoch [1/2], Iter [789/3125], train_loss:0.157005 Epoch [1/2], Iter [790/3125], train_loss:0.150693 Epoch [1/2], Iter [791/3125], train_loss:0.142032 Epoch [1/2], Iter [792/3125], train_loss:0.158453 Epoch [1/2], Iter [793/3125], train_loss:0.171967 Epoch [1/2], Iter [794/3125], train_loss:0.154673 Epoch [1/2], Iter [795/3125], train_loss:0.161099 Epoch [1/2], Iter [796/3125], train_loss:0.149141 Epoch [1/2], Iter [797/3125], train_loss:0.172768 Epoch [1/2], Iter [798/3125], train_loss:0.136935 Epoch [1/2], Iter [799/3125], train_loss:0.150901 Epoch [1/2], Iter [800/3125], train_loss:0.177802 Epoch [1/2], Iter [801/3125], train_loss:0.151622 Epoch [1/2], Iter [802/3125], train_loss:0.175425 Epoch [1/2], Iter [803/3125], train_loss:0.158219 Epoch [1/2], Iter [804/3125], train_loss:0.160822 Epoch [1/2], Iter [805/3125], train_loss:0.171360 Epoch [1/2], Iter [806/3125], train_loss:0.173840 Epoch [1/2], Iter [807/3125], train_loss:0.170521 Epoch [1/2], Iter [808/3125], train_loss:0.155042 Epoch [1/2], Iter [809/3125], train_loss:0.201056 Epoch [1/2], Iter [810/3125], train_loss:0.167513 Epoch [1/2], Iter [811/3125], train_loss:0.159135 Epoch [1/2], Iter [812/3125], train_loss:0.161629 Epoch [1/2], Iter [813/3125], train_loss:0.172826 Epoch [1/2], Iter [814/3125], train_loss:0.148274 Epoch [1/2], Iter [815/3125], train_loss:0.183451 Epoch [1/2], Iter [816/3125], train_loss:0.164296 Epoch [1/2], Iter [817/3125], train_loss:0.177334 Epoch [1/2], Iter [818/3125], train_loss:0.154336 Epoch [1/2], Iter [819/3125], train_loss:0.170955 Epoch [1/2], Iter [820/3125], train_loss:0.168194 Epoch [1/2], Iter [821/3125], train_loss:0.165284 Epoch [1/2], Iter [822/3125], train_loss:0.153692 Epoch [1/2], Iter [823/3125], train_loss:0.164452 Epoch [1/2], Iter [824/3125], train_loss:0.160168 Epoch [1/2], Iter [825/3125], train_loss:0.143389 
Epoch [1/2], Iter [826/3125], train_loss:0.125640 Epoch [1/2], Iter [827/3125], train_loss:0.154325 Epoch [1/2], Iter [828/3125], train_loss:0.170027 Epoch [1/2], Iter [829/3125], train_loss:0.163227 Epoch [1/2], Iter [830/3125], train_loss:0.180084 Epoch [1/2], Iter [831/3125], train_loss:0.153447 Epoch [1/2], Iter [832/3125], train_loss:0.174136 Epoch [1/2], Iter [833/3125], train_loss:0.166332 Epoch [1/2], Iter [834/3125], train_loss:0.157354 Epoch [1/2], Iter [835/3125], train_loss:0.120264 Epoch [1/2], Iter [836/3125], train_loss:0.148319 Epoch [1/2], Iter [837/3125], train_loss:0.156353 Epoch [1/2], Iter [838/3125], train_loss:0.153210 Epoch [1/2], Iter [839/3125], train_loss:0.169396 Epoch [1/2], Iter [840/3125], train_loss:0.163863 Epoch [1/2], Iter [841/3125], train_loss:0.156365 Epoch [1/2], Iter [842/3125], train_loss:0.166741 Epoch [1/2], Iter [843/3125], train_loss:0.153688 Epoch [1/2], Iter [844/3125], train_loss:0.173625 Epoch [1/2], Iter [845/3125], train_loss:0.167021 Epoch [1/2], Iter [846/3125], train_loss:0.149013 Epoch [1/2], Iter [847/3125], train_loss:0.165667 Epoch [1/2], Iter [848/3125], train_loss:0.153663 Epoch [1/2], Iter [849/3125], train_loss:0.179361 Epoch [1/2], Iter [850/3125], train_loss:0.175290 Epoch [1/2], Iter [851/3125], train_loss:0.168034 Epoch [1/2], Iter [852/3125], train_loss:0.161994 Epoch [1/2], Iter [853/3125], train_loss:0.171823 Epoch [1/2], Iter [854/3125], train_loss:0.147653 Epoch [1/2], Iter [855/3125], train_loss:0.159570 Epoch [1/2], Iter [856/3125], train_loss:0.157177 Epoch [1/2], Iter [857/3125], train_loss:0.155356 Epoch [1/2], Iter [858/3125], train_loss:0.159401 Epoch [1/2], Iter [859/3125], train_loss:0.179499 Epoch [1/2], Iter [860/3125], train_loss:0.153504 Epoch [1/2], Iter [861/3125], train_loss:0.174314 Epoch [1/2], Iter [862/3125], train_loss:0.145309 Epoch [1/2], Iter [863/3125], train_loss:0.170880 Epoch [1/2], Iter [864/3125], train_loss:0.171654 Epoch [1/2], Iter [865/3125], train_loss:0.125795 
Epoch [1/2], Iter [866/3125], train_loss:0.161326 Epoch [1/2], Iter [867/3125], train_loss:0.170622 Epoch [1/2], Iter [868/3125], train_loss:0.164444 Epoch [1/2], Iter [869/3125], train_loss:0.138547 Epoch [1/2], Iter [870/3125], train_loss:0.158241 Epoch [1/2], Iter [871/3125], train_loss:0.158963 Epoch [1/2], Iter [872/3125], train_loss:0.198872 Epoch [1/2], Iter [873/3125], train_loss:0.171953 Epoch [1/2], Iter [874/3125], train_loss:0.135011 Epoch [1/2], Iter [875/3125], train_loss:0.187574 Epoch [1/2], Iter [876/3125], train_loss:0.183588 Epoch [1/2], Iter [877/3125], train_loss:0.172207 Epoch [1/2], Iter [878/3125], train_loss:0.163572 Epoch [1/2], Iter [879/3125], train_loss:0.177148 Epoch [1/2], Iter [880/3125], train_loss:0.156533 Epoch [1/2], Iter [881/3125], train_loss:0.164676 Epoch [1/2], Iter [882/3125], train_loss:0.155344 Epoch [1/2], Iter [883/3125], train_loss:0.179421 Epoch [1/2], Iter [884/3125], train_loss:0.152775 Epoch [1/2], Iter [885/3125], train_loss:0.183656 Epoch [1/2], Iter [886/3125], train_loss:0.178685 Epoch [1/2], Iter [887/3125], train_loss:0.174813 Epoch [1/2], Iter [888/3125], train_loss:0.164418 Epoch [1/2], Iter [889/3125], train_loss:0.151287 Epoch [1/2], Iter [890/3125], train_loss:0.159186 Epoch [1/2], Iter [891/3125], train_loss:0.176169 Epoch [1/2], Iter [892/3125], train_loss:0.153548 Epoch [1/2], Iter [893/3125], train_loss:0.163016 Epoch [1/2], Iter [894/3125], train_loss:0.152066 Epoch [1/2], Iter [895/3125], train_loss:0.161777 Epoch [1/2], Iter [896/3125], train_loss:0.147675 Epoch [1/2], Iter [897/3125], train_loss:0.176385 Epoch [1/2], Iter [898/3125], train_loss:0.163108 Epoch [1/2], Iter [899/3125], train_loss:0.157772 Epoch [1/2], Iter [900/3125], train_loss:0.176365 Epoch [1/2], Iter [901/3125], train_loss:0.163414 Epoch [1/2], Iter [902/3125], train_loss:0.152687 Epoch [1/2], Iter [903/3125], train_loss:0.149312 Epoch [1/2], Iter [904/3125], train_loss:0.145944 Epoch [1/2], Iter [905/3125], train_loss:0.183935 
Epoch [1/2], Iter [906/3125], train_loss:0.141416 Epoch [1/2], Iter [907/3125], train_loss:0.148432 Epoch [1/2], Iter [908/3125], train_loss:0.161458 Epoch [1/2], Iter [909/3125], train_loss:0.159551 Epoch [1/2], Iter [910/3125], train_loss:0.147279 Epoch [1/2], Iter [911/3125], train_loss:0.149100 Epoch [1/2], Iter [912/3125], train_loss:0.147561 Epoch [1/2], Iter [913/3125], train_loss:0.153887 Epoch [1/2], Iter [914/3125], train_loss:0.155617 Epoch [1/2], Iter [915/3125], train_loss:0.138967 Epoch [1/2], Iter [916/3125], train_loss:0.187655 Epoch [1/2], Iter [917/3125], train_loss:0.189089 Epoch [1/2], Iter [918/3125], train_loss:0.185985 Epoch [1/2], Iter [919/3125], train_loss:0.159035 Epoch [1/2], Iter [920/3125], train_loss:0.158106 Epoch [1/2], Iter [921/3125], train_loss:0.160929 Epoch [1/2], Iter [922/3125], train_loss:0.165616 Epoch [1/2], Iter [923/3125], train_loss:0.159126 Epoch [1/2], Iter [924/3125], train_loss:0.166481 Epoch [1/2], Iter [925/3125], train_loss:0.166022 Epoch [1/2], Iter [926/3125], train_loss:0.143194 Epoch [1/2], Iter [927/3125], train_loss:0.166617 Epoch [1/2], Iter [928/3125], train_loss:0.165519 Epoch [1/2], Iter [929/3125], train_loss:0.149431 Epoch [1/2], Iter [930/3125], train_loss:0.158727 Epoch [1/2], Iter [931/3125], train_loss:0.143095 Epoch [1/2], Iter [932/3125], train_loss:0.153236 Epoch [1/2], Iter [933/3125], train_loss:0.148599 Epoch [1/2], Iter [934/3125], train_loss:0.159922 Epoch [1/2], Iter [935/3125], train_loss:0.168778 Epoch [1/2], Iter [936/3125], train_loss:0.149560 Epoch [1/2], Iter [937/3125], train_loss:0.160552 Epoch [1/2], Iter [938/3125], train_loss:0.151009 Epoch [1/2], Iter [939/3125], train_loss:0.171371 Epoch [1/2], Iter [940/3125], train_loss:0.156552 Epoch [1/2], Iter [941/3125], train_loss:0.153480 Epoch [1/2], Iter [942/3125], train_loss:0.144583 Epoch [1/2], Iter [943/3125], train_loss:0.156411 Epoch [1/2], Iter [944/3125], train_loss:0.153091 Epoch [1/2], Iter [945/3125], train_loss:0.158801 
Epoch [1/2], Iter [946/3125], train_loss:0.139097 Epoch [1/2], Iter [947/3125], train_loss:0.171316 Epoch [1/2], Iter [948/3125], train_loss:0.179716 Epoch [1/2], Iter [949/3125], train_loss:0.150855 Epoch [1/2], Iter [950/3125], train_loss:0.159264 Epoch [1/2], Iter [951/3125], train_loss:0.178506 Epoch [1/2], Iter [952/3125], train_loss:0.165264 Epoch [1/2], Iter [953/3125], train_loss:0.163345 Epoch [1/2], Iter [954/3125], train_loss:0.163006 Epoch [1/2], Iter [955/3125], train_loss:0.180055 Epoch [1/2], Iter [956/3125], train_loss:0.152678 Epoch [1/2], Iter [957/3125], train_loss:0.149015 Epoch [1/2], Iter [958/3125], train_loss:0.167205 Epoch [1/2], Iter [959/3125], train_loss:0.159652 Epoch [1/2], Iter [960/3125], train_loss:0.172454 Epoch [1/2], Iter [961/3125], train_loss:0.152453 Epoch [1/2], Iter [962/3125], train_loss:0.159779 Epoch [1/2], Iter [963/3125], train_loss:0.157254 Epoch [1/2], Iter [964/3125], train_loss:0.171195 Epoch [1/2], Iter [965/3125], train_loss:0.129605 Epoch [1/2], Iter [966/3125], train_loss:0.171718 Epoch [1/2], Iter [967/3125], train_loss:0.135528 Epoch [1/2], Iter [968/3125], train_loss:0.171078 Epoch [1/2], Iter [969/3125], train_loss:0.177459 Epoch [1/2], Iter [970/3125], train_loss:0.155430 Epoch [1/2], Iter [971/3125], train_loss:0.162782 Epoch [1/2], Iter [972/3125], train_loss:0.179943 Epoch [1/2], Iter [973/3125], train_loss:0.159568 Epoch [1/2], Iter [974/3125], train_loss:0.145395 Epoch [1/2], Iter [975/3125], train_loss:0.162883 Epoch [1/2], Iter [976/3125], train_loss:0.152242 Epoch [1/2], Iter [977/3125], train_loss:0.178401 Epoch [1/2], Iter [978/3125], train_loss:0.149824 Epoch [1/2], Iter [979/3125], train_loss:0.164016 Epoch [1/2], Iter [980/3125], train_loss:0.173642 Epoch [1/2], Iter [981/3125], train_loss:0.184837 Epoch [1/2], Iter [982/3125], train_loss:0.157643 Epoch [1/2], Iter [983/3125], train_loss:0.170323 Epoch [1/2], Iter [984/3125], train_loss:0.145053 Epoch [1/2], Iter [985/3125], train_loss:0.177159 
Epoch [1/2], Iter [986/3125], train_loss:0.170435 Epoch [1/2], Iter [987/3125], train_loss:0.140393 Epoch [1/2], Iter [988/3125], train_loss:0.170333 Epoch [1/2], Iter [989/3125], train_loss:0.154973 Epoch [1/2], Iter [990/3125], train_loss:0.168512 Epoch [1/2], Iter [991/3125], train_loss:0.171748 Epoch [1/2], Iter [992/3125], train_loss:0.191529 Epoch [1/2], Iter [993/3125], train_loss:0.164100 Epoch [1/2], Iter [994/3125], train_loss:0.169794 Epoch [1/2], Iter [995/3125], train_loss:0.161297 Epoch [1/2], Iter [996/3125], train_loss:0.154997 Epoch [1/2], Iter [997/3125], train_loss:0.174858 Epoch [1/2], Iter [998/3125], train_loss:0.131562 Epoch [1/2], Iter [999/3125], train_loss:0.170373 Epoch [1/2], Iter [1000/3125], train_loss:0.174953 Epoch [1/2], Iter [1001/3125], train_loss:0.171201 Epoch [1/2], Iter [1002/3125], train_loss:0.157745 Epoch [1/2], Iter [1003/3125], train_loss:0.165321 Epoch [1/2], Iter [1004/3125], train_loss:0.166191 Epoch [1/2], Iter [1005/3125], train_loss:0.161590 Epoch [1/2], Iter [1006/3125], train_loss:0.155218 Epoch [1/2], Iter [1007/3125], train_loss:0.161306 Epoch [1/2], Iter [1008/3125], train_loss:0.160885 Epoch [1/2], Iter [1009/3125], train_loss:0.150420 Epoch [1/2], Iter [1010/3125], train_loss:0.180716 Epoch [1/2], Iter [1011/3125], train_loss:0.170864 Epoch [1/2], Iter [1012/3125], train_loss:0.155604 Epoch [1/2], Iter [1013/3125], train_loss:0.138882 Epoch [1/2], Iter [1014/3125], train_loss:0.163906 Epoch [1/2], Iter [1015/3125], train_loss:0.157286 Epoch [1/2], Iter [1016/3125], train_loss:0.176689 Epoch [1/2], Iter [1017/3125], train_loss:0.164429 Epoch [1/2], Iter [1018/3125], train_loss:0.151421 Epoch [1/2], Iter [1019/3125], train_loss:0.173269 Epoch [1/2], Iter [1020/3125], train_loss:0.159520 Epoch [1/2], Iter [1021/3125], train_loss:0.136108 Epoch [1/2], Iter [1022/3125], train_loss:0.168635 Epoch [1/2], Iter [1023/3125], train_loss:0.172606 Epoch [1/2], Iter [1024/3125], train_loss:0.169962 Epoch [1/2], Iter 
[1025/3125], train_loss:0.171667 Epoch [1/2], Iter [1026/3125], train_loss:0.186360 Epoch [1/2], Iter [1027/3125], train_loss:0.154808 Epoch [1/2], Iter [1028/3125], train_loss:0.162741 Epoch [1/2], Iter [1029/3125], train_loss:0.168956 Epoch [1/2], Iter [1030/3125], train_loss:0.167106 Epoch [1/2], Iter [1031/3125], train_loss:0.150996 Epoch [1/2], Iter [1032/3125], train_loss:0.148269 Epoch [1/2], Iter [1033/3125], train_loss:0.159704 Epoch [1/2], Iter [1034/3125], train_loss:0.169828 Epoch [1/2], Iter [1035/3125], train_loss:0.170876 Epoch [1/2], Iter [1036/3125], train_loss:0.152638 Epoch [1/2], Iter [1037/3125], train_loss:0.156386 Epoch [1/2], Iter [1038/3125], train_loss:0.158583 Epoch [1/2], Iter [1039/3125], train_loss:0.131727 Epoch [1/2], Iter [1040/3125], train_loss:0.159804 Epoch [1/2], Iter [1041/3125], train_loss:0.150478 Epoch [1/2], Iter [1042/3125], train_loss:0.172487 Epoch [1/2], Iter [1043/3125], train_loss:0.172604 Epoch [1/2], Iter [1044/3125], train_loss:0.176825 Epoch [1/2], Iter [1045/3125], train_loss:0.155156 Epoch [1/2], Iter [1046/3125], train_loss:0.159919 Epoch [1/2], Iter [1047/3125], train_loss:0.158133 Epoch [1/2], Iter [1048/3125], train_loss:0.171692 Epoch [1/2], Iter [1049/3125], train_loss:0.148961 Epoch [1/2], Iter [1050/3125], train_loss:0.145803 Epoch [1/2], Iter [1051/3125], train_loss:0.166840 Epoch [1/2], Iter [1052/3125], train_loss:0.144305 Epoch [1/2], Iter [1053/3125], train_loss:0.148482 Epoch [1/2], Iter [1054/3125], train_loss:0.159671 Epoch [1/2], Iter [1055/3125], train_loss:0.160208 Epoch [1/2], Iter [1056/3125], train_loss:0.167555 Epoch [1/2], Iter [1057/3125], train_loss:0.161161 Epoch [1/2], Iter [1058/3125], train_loss:0.149388 Epoch [1/2], Iter [1059/3125], train_loss:0.181070 Epoch [1/2], Iter [1060/3125], train_loss:0.171973 Epoch [1/2], Iter [1061/3125], train_loss:0.180709 Epoch [1/2], Iter [1062/3125], train_loss:0.153507 Epoch [1/2], Iter [1063/3125], train_loss:0.145896 Epoch [1/2], Iter 
[1064/3125], train_loss:0.166199 Epoch [1/2], Iter [1065/3125], train_loss:0.166107 Epoch [1/2], Iter [1066/3125], train_loss:0.155540 Epoch [1/2], Iter [1067/3125], train_loss:0.154284 Epoch [1/2], Iter [1068/3125], train_loss:0.187118 Epoch [1/2], Iter [1069/3125], train_loss:0.159748 Epoch [1/2], Iter [1070/3125], train_loss:0.164387 Epoch [1/2], Iter [1071/3125], train_loss:0.142183 Epoch [1/2], Iter [1072/3125], train_loss:0.138724 Epoch [1/2], Iter [1073/3125], train_loss:0.146154 Epoch [1/2], Iter [1074/3125], train_loss:0.162630 Epoch [1/2], Iter [1075/3125], train_loss:0.185121 Epoch [1/2], Iter [1076/3125], train_loss:0.160997 Epoch [1/2], Iter [1077/3125], train_loss:0.179530 Epoch [1/2], Iter [1078/3125], train_loss:0.153985 Epoch [1/2], Iter [1079/3125], train_loss:0.147778 Epoch [1/2], Iter [1080/3125], train_loss:0.149064 Epoch [1/2], Iter [1081/3125], train_loss:0.151860 Epoch [1/2], Iter [1082/3125], train_loss:0.161012 Epoch [1/2], Iter [1083/3125], train_loss:0.197195 Epoch [1/2], Iter [1084/3125], train_loss:0.157916 Epoch [1/2], Iter [1085/3125], train_loss:0.162013 Epoch [1/2], Iter [1086/3125], train_loss:0.160678 Epoch [1/2], Iter [1087/3125], train_loss:0.168688 Epoch [1/2], Iter [1088/3125], train_loss:0.139971 Epoch [1/2], Iter [1089/3125], train_loss:0.184158 Epoch [1/2], Iter [1090/3125], train_loss:0.184676 Epoch [1/2], Iter [1091/3125], train_loss:0.150608 Epoch [1/2], Iter [1092/3125], train_loss:0.151490 Epoch [1/2], Iter [1093/3125], train_loss:0.176660 Epoch [1/2], Iter [1094/3125], train_loss:0.153394 Epoch [1/2], Iter [1095/3125], train_loss:0.163054 Epoch [1/2], Iter [1096/3125], train_loss:0.144125 Epoch [1/2], Iter [1097/3125], train_loss:0.167320 Epoch [1/2], Iter [1098/3125], train_loss:0.158095 Epoch [1/2], Iter [1099/3125], train_loss:0.150448 Epoch [1/2], Iter [1100/3125], train_loss:0.167379 Epoch [1/2], Iter [1101/3125], train_loss:0.151696 Epoch [1/2], Iter [1102/3125], train_loss:0.174957 Epoch [1/2], Iter 
[1103/3125], train_loss:0.159035 Epoch [1/2], Iter [1104/3125], train_loss:0.163059 Epoch [1/2], Iter [1105/3125], train_loss:0.170439 Epoch [1/2], Iter [1106/3125], train_loss:0.163471 Epoch [1/2], Iter [1107/3125], train_loss:0.180205 Epoch [1/2], Iter [1108/3125], train_loss:0.148346 Epoch [1/2], Iter [1109/3125], train_loss:0.170757 Epoch [1/2], Iter [1110/3125], train_loss:0.155306 Epoch [1/2], Iter [1111/3125], train_loss:0.173857 Epoch [1/2], Iter [1112/3125], train_loss:0.149888 Epoch [1/2], Iter [1113/3125], train_loss:0.164874 Epoch [1/2], Iter [1114/3125], train_loss:0.165291 Epoch [1/2], Iter [1115/3125], train_loss:0.143524 Epoch [1/2], Iter [1116/3125], train_loss:0.151315 Epoch [1/2], Iter [1117/3125], train_loss:0.169421 Epoch [1/2], Iter [1118/3125], train_loss:0.166722 Epoch [1/2], Iter [1119/3125], train_loss:0.167637 Epoch [1/2], Iter [1120/3125], train_loss:0.155744 Epoch [1/2], Iter [1121/3125], train_loss:0.162011 Epoch [1/2], Iter [1122/3125], train_loss:0.160295 Epoch [1/2], Iter [1123/3125], train_loss:0.154625 Epoch [1/2], Iter [1124/3125], train_loss:0.151765 Epoch [1/2], Iter [1125/3125], train_loss:0.170621 Epoch [1/2], Iter [1126/3125], train_loss:0.155552 Epoch [1/2], Iter [1127/3125], train_loss:0.173134 Epoch [1/2], Iter [1128/3125], train_loss:0.153150 Epoch [1/2], Iter [1129/3125], train_loss:0.145719 Epoch [1/2], Iter [1130/3125], train_loss:0.187136 Epoch [1/2], Iter [1131/3125], train_loss:0.169417 Epoch [1/2], Iter [1132/3125], train_loss:0.178974 Epoch [1/2], Iter [1133/3125], train_loss:0.149931 Epoch [1/2], Iter [1134/3125], train_loss:0.155474 Epoch [1/2], Iter [1135/3125], train_loss:0.161715 Epoch [1/2], Iter [1136/3125], train_loss:0.165408 Epoch [1/2], Iter [1137/3125], train_loss:0.170022 Epoch [1/2], Iter [1138/3125], train_loss:0.147393 Epoch [1/2], Iter [1139/3125], train_loss:0.175394 Epoch [1/2], Iter [1140/3125], train_loss:0.157841 Epoch [1/2], Iter [1141/3125], train_loss:0.164718 Epoch [1/2], Iter 
[1142/3125], train_loss:0.154701 Epoch [1/2], Iter [1143/3125], train_loss:0.168679 Epoch [1/2], Iter [1144/3125], train_loss:0.181446 Epoch [1/2], Iter [1145/3125], train_loss:0.148103 Epoch [1/2], Iter [1146/3125], train_loss:0.151220 Epoch [1/2], Iter [1147/3125], train_loss:0.186586 Epoch [1/2], Iter [1148/3125], train_loss:0.174347 Epoch [1/2], Iter [1149/3125], train_loss:0.170932 Epoch [1/2], Iter [1150/3125], train_loss:0.165207 Epoch [1/2], Iter [1151/3125], train_loss:0.158725 Epoch [1/2], Iter [1152/3125], train_loss:0.155164 Epoch [1/2], Iter [1153/3125], train_loss:0.171893 Epoch [1/2], Iter [1154/3125], train_loss:0.162601 Epoch [1/2], Iter [1155/3125], train_loss:0.160125 Epoch [1/2], Iter [1156/3125], train_loss:0.181936 Epoch [1/2], Iter [1157/3125], train_loss:0.172337 Epoch [1/2], Iter [1158/3125], train_loss:0.147319 Epoch [1/2], Iter [1159/3125], train_loss:0.183120 Epoch [1/2], Iter [1160/3125], train_loss:0.168000 Epoch [1/2], Iter [1161/3125], train_loss:0.163454 Epoch [1/2], Iter [1162/3125], train_loss:0.158614 Epoch [1/2], Iter [1163/3125], train_loss:0.170988 Epoch [1/2], Iter [1164/3125], train_loss:0.162558 Epoch [1/2], Iter [1165/3125], train_loss:0.164345 Epoch [1/2], Iter [1166/3125], train_loss:0.151922 Epoch [1/2], Iter [1167/3125], train_loss:0.182245 Epoch [1/2], Iter [1168/3125], train_loss:0.162371 Epoch [1/2], Iter [1169/3125], train_loss:0.155639 Epoch [1/2], Iter [1170/3125], train_loss:0.157078 Epoch [1/2], Iter [1171/3125], train_loss:0.168648 Epoch [1/2], Iter [1172/3125], train_loss:0.160894 Epoch [1/2], Iter [1173/3125], train_loss:0.172699 Epoch [1/2], Iter [1174/3125], train_loss:0.186120 Epoch [1/2], Iter [1175/3125], train_loss:0.163291 Epoch [1/2], Iter [1176/3125], train_loss:0.160210 Epoch [1/2], Iter [1177/3125], train_loss:0.157460 Epoch [1/2], Iter [1178/3125], train_loss:0.169464 Epoch [1/2], Iter [1179/3125], train_loss:0.155117 Epoch [1/2], Iter [1180/3125], train_loss:0.175044 Epoch [1/2], Iter 
[1181/3125], train_loss:0.171335 Epoch [1/2], Iter [1182/3125], train_loss:0.156485 Epoch [1/2], Iter [1183/3125], train_loss:0.166067 Epoch [1/2], Iter [1184/3125], train_loss:0.161618 Epoch [1/2], Iter [1185/3125], train_loss:0.158961 Epoch [1/2], Iter [1186/3125], train_loss:0.158767 Epoch [1/2], Iter [1187/3125], train_loss:0.158581 Epoch [1/2], Iter [1188/3125], train_loss:0.150107 Epoch [1/2], Iter [1189/3125], train_loss:0.151393 Epoch [1/2], Iter [1190/3125], train_loss:0.157746 Epoch [1/2], Iter [1191/3125], train_loss:0.162504 Epoch [1/2], Iter [1192/3125], train_loss:0.162217 Epoch [1/2], Iter [1193/3125], train_loss:0.184125 Epoch [1/2], Iter [1194/3125], train_loss:0.155755 Epoch [1/2], Iter [1195/3125], train_loss:0.163561 Epoch [1/2], Iter [1196/3125], train_loss:0.169904 Epoch [1/2], Iter [1197/3125], train_loss:0.163287 Epoch [1/2], Iter [1198/3125], train_loss:0.162994 Epoch [1/2], Iter [1199/3125], train_loss:0.179970 Epoch [1/2], Iter [1200/3125], train_loss:0.175508 Epoch [1/2], Iter [1201/3125], train_loss:0.171165 Epoch [1/2], Iter [1202/3125], train_loss:0.157112 Epoch [1/2], Iter [1203/3125], train_loss:0.162217 Epoch [1/2], Iter [1204/3125], train_loss:0.168821 Epoch [1/2], Iter [1205/3125], train_loss:0.197255 Epoch [1/2], Iter [1206/3125], train_loss:0.160194 Epoch [1/2], Iter [1207/3125], train_loss:0.151903 Epoch [1/2], Iter [1208/3125], train_loss:0.149178 Epoch [1/2], Iter [1209/3125], train_loss:0.146489 Epoch [1/2], Iter [1210/3125], train_loss:0.154193 Epoch [1/2], Iter [1211/3125], train_loss:0.157028 Epoch [1/2], Iter [1212/3125], train_loss:0.162961 Epoch [1/2], Iter [1213/3125], train_loss:0.176358 Epoch [1/2], Iter [1214/3125], train_loss:0.170513 Epoch [1/2], Iter [1215/3125], train_loss:0.166415 Epoch [1/2], Iter [1216/3125], train_loss:0.150504 Epoch [1/2], Iter [1217/3125], train_loss:0.169194 Epoch [1/2], Iter [1218/3125], train_loss:0.173286 Epoch [1/2], Iter [1219/3125], train_loss:0.170073 Epoch [1/2], Iter 
[1220/3125], train_loss:0.157464 Epoch [1/2], Iter [1221/3125], train_loss:0.153022 Epoch [1/2], Iter [1222/3125], train_loss:0.164855 Epoch [1/2], Iter [1223/3125], train_loss:0.155083 Epoch [1/2], Iter [1224/3125], train_loss:0.165551 Epoch [1/2], Iter [1225/3125], train_loss:0.185195 Epoch [1/2], Iter [1226/3125], train_loss:0.177821 Epoch [1/2], Iter [1227/3125], train_loss:0.154561 Epoch [1/2], Iter [1228/3125], train_loss:0.159085 Epoch [1/2], Iter [1229/3125], train_loss:0.171906 Epoch [1/2], Iter [1230/3125], train_loss:0.160470 Epoch [1/2], Iter [1231/3125], train_loss:0.151237 Epoch [1/2], Iter [1232/3125], train_loss:0.135055 Epoch [1/2], Iter [1233/3125], train_loss:0.140605 Epoch [1/2], Iter [1234/3125], train_loss:0.183646 Epoch [1/2], Iter [1235/3125], train_loss:0.158728 Epoch [1/2], Iter [1236/3125], train_loss:0.163355 Epoch [1/2], Iter [1237/3125], train_loss:0.148448 Epoch [1/2], Iter [1238/3125], train_loss:0.165396 Epoch [1/2], Iter [1239/3125], train_loss:0.181543 Epoch [1/2], Iter [1240/3125], train_loss:0.166355 Epoch [1/2], Iter [1241/3125], train_loss:0.158869 Epoch [1/2], Iter [1242/3125], train_loss:0.153979 Epoch [1/2], Iter [1243/3125], train_loss:0.155492 Epoch [1/2], Iter [1244/3125], train_loss:0.170940 Epoch [1/2], Iter [1245/3125], train_loss:0.166005 Epoch [1/2], Iter [1246/3125], train_loss:0.158416 Epoch [1/2], Iter [1247/3125], train_loss:0.154584 Epoch [1/2], Iter [1248/3125], train_loss:0.152003 Epoch [1/2], Iter [1249/3125], train_loss:0.168855 Epoch [1/2], Iter [1250/3125], train_loss:0.148871 Epoch [1/2], Iter [1251/3125], train_loss:0.175113 Epoch [1/2], Iter [1252/3125], train_loss:0.149920 Epoch [1/2], Iter [1253/3125], train_loss:0.151580 Epoch [1/2], Iter [1254/3125], train_loss:0.168768 Epoch [1/2], Iter [1255/3125], train_loss:0.166119 Epoch [1/2], Iter [1256/3125], train_loss:0.140963 Epoch [1/2], Iter [1257/3125], train_loss:0.168684 Epoch [1/2], Iter [1258/3125], train_loss:0.158394 Epoch [1/2], Iter 
[1259/3125], train_loss:0.161410 Epoch [1/2], Iter [1260/3125], train_loss:0.148364 Epoch [1/2], Iter [1261/3125], train_loss:0.165485 Epoch [1/2], Iter [1262/3125], train_loss:0.153689 Epoch [1/2], Iter [1263/3125], train_loss:0.171761 Epoch [1/2], Iter [1264/3125], train_loss:0.163797 Epoch [1/2], Iter [1265/3125], train_loss:0.146530 Epoch [1/2], Iter [1266/3125], train_loss:0.158110 Epoch [1/2], Iter [1267/3125], train_loss:0.160058 Epoch [1/2], Iter [1268/3125], train_loss:0.157368 Epoch [1/2], Iter [1269/3125], train_loss:0.151690 Epoch [1/2], Iter [1270/3125], train_loss:0.142817 Epoch [1/2], Iter [1271/3125], train_loss:0.153046 Epoch [1/2], Iter [1272/3125], train_loss:0.162205 Epoch [1/2], Iter [1273/3125], train_loss:0.179852 Epoch [1/2], Iter [1274/3125], train_loss:0.156627 Epoch [1/2], Iter [1275/3125], train_loss:0.158944 Epoch [1/2], Iter [1276/3125], train_loss:0.148821 Epoch [1/2], Iter [1277/3125], train_loss:0.157448 Epoch [1/2], Iter [1278/3125], train_loss:0.178943 Epoch [1/2], Iter [1279/3125], train_loss:0.170738 Epoch [1/2], Iter [1280/3125], train_loss:0.146238 Epoch [1/2], Iter [1281/3125], train_loss:0.166454 Epoch [1/2], Iter [1282/3125], train_loss:0.147360 Epoch [1/2], Iter [1283/3125], train_loss:0.166235 Epoch [1/2], Iter [1284/3125], train_loss:0.160503 Epoch [1/2], Iter [1285/3125], train_loss:0.155493 Epoch [1/2], Iter [1286/3125], train_loss:0.164259 Epoch [1/2], Iter [1287/3125], train_loss:0.159880 Epoch [1/2], Iter [1288/3125], train_loss:0.174088 Epoch [1/2], Iter [1289/3125], train_loss:0.158363 Epoch [1/2], Iter [1290/3125], train_loss:0.160815 Epoch [1/2], Iter [1291/3125], train_loss:0.168558 Epoch [1/2], Iter [1292/3125], train_loss:0.155379 Epoch [1/2], Iter [1293/3125], train_loss:0.158657 Epoch [1/2], Iter [1294/3125], train_loss:0.152092 Epoch [1/2], Iter [1295/3125], train_loss:0.151002 Epoch [1/2], Iter [1296/3125], train_loss:0.177545 Epoch [1/2], Iter [1297/3125], train_loss:0.155996 Epoch [1/2], Iter 
[1298/3125], train_loss:0.153431 Epoch [1/2], Iter [1299/3125], train_loss:0.160402 Epoch [1/2], Iter [1300/3125], train_loss:0.164605 Epoch [1/2], Iter [1301/3125], train_loss:0.181966 Epoch [1/2], Iter [1302/3125], train_loss:0.150270 Epoch [1/2], Iter [1303/3125], train_loss:0.153899 Epoch [1/2], Iter [1304/3125], train_loss:0.167255 Epoch [1/2], Iter [1305/3125], train_loss:0.164807 Epoch [1/2], Iter [1306/3125], train_loss:0.176301 Epoch [1/2], Iter [1307/3125], train_loss:0.155036 Epoch [1/2], Iter [1308/3125], train_loss:0.167926 Epoch [1/2], Iter [1309/3125], train_loss:0.176630 Epoch [1/2], Iter [1310/3125], train_loss:0.160102 Epoch [1/2], Iter [1311/3125], train_loss:0.173452 Epoch [1/2], Iter [1312/3125], train_loss:0.172366 Epoch [1/2], Iter [1313/3125], train_loss:0.156772 Epoch [1/2], Iter [1314/3125], train_loss:0.168792 Epoch [1/2], Iter [1315/3125], train_loss:0.178687 Epoch [1/2], Iter [1316/3125], train_loss:0.181647 Epoch [1/2], Iter [1317/3125], train_loss:0.154158 Epoch [1/2], Iter [1318/3125], train_loss:0.151710 Epoch [1/2], Iter [1319/3125], train_loss:0.183539 Epoch [1/2], Iter [1320/3125], train_loss:0.160138 Epoch [1/2], Iter [1321/3125], train_loss:0.177658 Epoch [1/2], Iter [1322/3125], train_loss:0.146497 Epoch [1/2], Iter [1323/3125], train_loss:0.196226 Epoch [1/2], Iter [1324/3125], train_loss:0.165244 Epoch [1/2], Iter [1325/3125], train_loss:0.197831 Epoch [1/2], Iter [1326/3125], train_loss:0.175092 Epoch [1/2], Iter [1327/3125], train_loss:0.184453 Epoch [1/2], Iter [1328/3125], train_loss:0.165453 Epoch [1/2], Iter [1329/3125], train_loss:0.145549 Epoch [1/2], Iter [1330/3125], train_loss:0.173061 Epoch [1/2], Iter [1331/3125], train_loss:0.166073 Epoch [1/2], Iter [1332/3125], train_loss:0.156471 Epoch [1/2], Iter [1333/3125], train_loss:0.152220 Epoch [1/2], Iter [1334/3125], train_loss:0.156158 Epoch [1/2], Iter [1335/3125], train_loss:0.165017 Epoch [1/2], Iter [1336/3125], train_loss:0.183256 Epoch [1/2], Iter 
[1337/3125], train_loss:0.167704 Epoch [1/2], Iter [1338/3125], train_loss:0.154254 Epoch [1/2], Iter [1339/3125], train_loss:0.162098 Epoch [1/2], Iter [1340/3125], train_loss:0.161697 Epoch [1/2], Iter [1341/3125], train_loss:0.164405 Epoch [1/2], Iter [1342/3125], train_loss:0.149967 Epoch [1/2], Iter [1343/3125], train_loss:0.171982 Epoch [1/2], Iter [1344/3125], train_loss:0.155723 Epoch [1/2], Iter [1345/3125], train_loss:0.147691 Epoch [1/2], Iter [1346/3125], train_loss:0.160214 Epoch [1/2], Iter [1347/3125], train_loss:0.154677 Epoch [1/2], Iter [1348/3125], train_loss:0.152759 Epoch [1/2], Iter [1349/3125], train_loss:0.166476 Epoch [1/2], Iter [1350/3125], train_loss:0.163566 Epoch [1/2], Iter [1351/3125], train_loss:0.150434 Epoch [1/2], Iter [1352/3125], train_loss:0.168793 Epoch [1/2], Iter [1353/3125], train_loss:0.162513 Epoch [1/2], Iter [1354/3125], train_loss:0.169711 Epoch [1/2], Iter [1355/3125], train_loss:0.158046 Epoch [1/2], Iter [1356/3125], train_loss:0.151754 Epoch [1/2], Iter [1357/3125], train_loss:0.170661 Epoch [1/2], Iter [1358/3125], train_loss:0.152679 Epoch [1/2], Iter [1359/3125], train_loss:0.167173 Epoch [1/2], Iter [1360/3125], train_loss:0.156606 Epoch [1/2], Iter [1361/3125], train_loss:0.183170 Epoch [1/2], Iter [1362/3125], train_loss:0.142545 Epoch [1/2], Iter [1363/3125], train_loss:0.159119 Epoch [1/2], Iter [1364/3125], train_loss:0.164405 Epoch [1/2], Iter [1365/3125], train_loss:0.159609 Epoch [1/2], Iter [1366/3125], train_loss:0.161490 Epoch [1/2], Iter [1367/3125], train_loss:0.167248 Epoch [1/2], Iter [1368/3125], train_loss:0.165266 Epoch [1/2], Iter [1369/3125], train_loss:0.164672 Epoch [1/2], Iter [1370/3125], train_loss:0.178968 Epoch [1/2], Iter [1371/3125], train_loss:0.139022 Epoch [1/2], Iter [1372/3125], train_loss:0.157129 Epoch [1/2], Iter [1373/3125], train_loss:0.170236 Epoch [1/2], Iter [1374/3125], train_loss:0.172654 Epoch [1/2], Iter [1375/3125], train_loss:0.154364 Epoch [1/2], Iter 
[1376/3125], train_loss:0.191031 Epoch [1/2], Iter [1377/3125], train_loss:0.154899 Epoch [1/2], Iter [1378/3125], train_loss:0.154030 Epoch [1/2], Iter [1379/3125], train_loss:0.164986 Epoch [1/2], Iter [1380/3125], train_loss:0.149888 Epoch [1/2], Iter [1381/3125], train_loss:0.161112 Epoch [1/2], Iter [1382/3125], train_loss:0.177446 Epoch [1/2], Iter [1383/3125], train_loss:0.181748 Epoch [1/2], Iter [1384/3125], train_loss:0.148632 Epoch [1/2], Iter [1385/3125], train_loss:0.171001 Epoch [1/2], Iter [1386/3125], train_loss:0.146871 Epoch [1/2], Iter [1387/3125], train_loss:0.152815 Epoch [1/2], Iter [1388/3125], train_loss:0.153880 Epoch [1/2], Iter [1389/3125], train_loss:0.167807 Epoch [1/2], Iter [1390/3125], train_loss:0.163647 Epoch [1/2], Iter [1391/3125], train_loss:0.159752 Epoch [1/2], Iter [1392/3125], train_loss:0.148706 Epoch [1/2], Iter [1393/3125], train_loss:0.145519 Epoch [1/2], Iter [1394/3125], train_loss:0.149635 Epoch [1/2], Iter [1395/3125], train_loss:0.156076 Epoch [1/2], Iter [1396/3125], train_loss:0.156446 Epoch [1/2], Iter [1397/3125], train_loss:0.165825 Epoch [1/2], Iter [1398/3125], train_loss:0.145675 Epoch [1/2], Iter [1399/3125], train_loss:0.142919 Epoch [1/2], Iter [1400/3125], train_loss:0.163214 Epoch [1/2], Iter [1401/3125], train_loss:0.155447 Epoch [1/2], Iter [1402/3125], train_loss:0.164264 Epoch [1/2], Iter [1403/3125], train_loss:0.168581 Epoch [1/2], Iter [1404/3125], train_loss:0.149629 Epoch [1/2], Iter [1405/3125], train_loss:0.164211 Epoch [1/2], Iter [1406/3125], train_loss:0.168869 Epoch [1/2], Iter [1407/3125], train_loss:0.153973 Epoch [1/2], Iter [1408/3125], train_loss:0.173186 Epoch [1/2], Iter [1409/3125], train_loss:0.174420 Epoch [1/2], Iter [1410/3125], train_loss:0.154398 Epoch [1/2], Iter [1411/3125], train_loss:0.147271 Epoch [1/2], Iter [1412/3125], train_loss:0.172150 Epoch [1/2], Iter [1413/3125], train_loss:0.144826 Epoch [1/2], Iter [1414/3125], train_loss:0.160246 Epoch [1/2], Iter 
[1415/3125], train_loss:0.166100 Epoch [1/2], Iter [1416/3125], train_loss:0.151365 Epoch [1/2], Iter [1417/3125], train_loss:0.144425 Epoch [1/2], Iter [1418/3125], train_loss:0.145632 Epoch [1/2], Iter [1419/3125], train_loss:0.167284 Epoch [1/2], Iter [1420/3125], train_loss:0.162896 Epoch [1/2], Iter [1421/3125], train_loss:0.168027 Epoch [1/2], Iter [1422/3125], train_loss:0.160721 Epoch [1/2], Iter [1423/3125], train_loss:0.164672 Epoch [1/2], Iter [1424/3125], train_loss:0.162177 Epoch [1/2], Iter [1425/3125], train_loss:0.166051 Epoch [1/2], Iter [1426/3125], train_loss:0.146067 Epoch [1/2], Iter [1427/3125], train_loss:0.159922 Epoch [1/2], Iter [1428/3125], train_loss:0.152642 Epoch [1/2], Iter [1429/3125], train_loss:0.148200 Epoch [1/2], Iter [1430/3125], train_loss:0.158262 Epoch [1/2], Iter [1431/3125], train_loss:0.149659 Epoch [1/2], Iter [1432/3125], train_loss:0.163230 Epoch [1/2], Iter [1433/3125], train_loss:0.145847 Epoch [1/2], Iter [1434/3125], train_loss:0.173391 Epoch [1/2], Iter [1435/3125], train_loss:0.125152 Epoch [1/2], Iter [1436/3125], train_loss:0.156048 Epoch [1/2], Iter [1437/3125], train_loss:0.157051 Epoch [1/2], Iter [1438/3125], train_loss:0.158350 Epoch [1/2], Iter [1439/3125], train_loss:0.192877 Epoch [1/2], Iter [1440/3125], train_loss:0.167545 Epoch [1/2], Iter [1441/3125], train_loss:0.197214 Epoch [1/2], Iter [1442/3125], train_loss:0.192135 Epoch [1/2], Iter [1443/3125], train_loss:0.172984 Epoch [1/2], Iter [1444/3125], train_loss:0.173254 Epoch [1/2], Iter [1445/3125], train_loss:0.144135 Epoch [1/2], Iter [1446/3125], train_loss:0.169613 Epoch [1/2], Iter [1447/3125], train_loss:0.167091 Epoch [1/2], Iter [1448/3125], train_loss:0.158223 Epoch [1/2], Iter [1449/3125], train_loss:0.172568 Epoch [1/2], Iter [1450/3125], train_loss:0.162713 Epoch [1/2], Iter [1451/3125], train_loss:0.168839 Epoch [1/2], Iter [1452/3125], train_loss:0.168881 Epoch [1/2], Iter [1453/3125], train_loss:0.166082 Epoch [1/2], Iter 
[1454/3125], train_loss:0.137113 Epoch [1/2], Iter [1455/3125], train_loss:0.156944 Epoch [1/2], Iter [1456/3125], train_loss:0.176010 Epoch [1/2], Iter [1457/3125], train_loss:0.165683 Epoch [1/2], Iter [1458/3125], train_loss:0.166721 Epoch [1/2], Iter [1459/3125], train_loss:0.177907 Epoch [1/2], Iter [1460/3125], train_loss:0.148519 Epoch [1/2], Iter [1461/3125], train_loss:0.178192 Epoch [1/2], Iter [1462/3125], train_loss:0.163624 Epoch [1/2], Iter [1463/3125], train_loss:0.160104 Epoch [1/2], Iter [1464/3125], train_loss:0.175325 Epoch [1/2], Iter [1465/3125], train_loss:0.173073 Epoch [1/2], Iter [1466/3125], train_loss:0.167916 Epoch [1/2], Iter [1467/3125], train_loss:0.161516 Epoch [1/2], Iter [1468/3125], train_loss:0.168107 Epoch [1/2], Iter [1469/3125], train_loss:0.169824 Epoch [1/2], Iter [1470/3125], train_loss:0.160803 Epoch [1/2], Iter [1471/3125], train_loss:0.170264 Epoch [1/2], Iter [1472/3125], train_loss:0.168911 Epoch [1/2], Iter [1473/3125], train_loss:0.143244 Epoch [1/2], Iter [1474/3125], train_loss:0.154688 Epoch [1/2], Iter [1475/3125], train_loss:0.152704 Epoch [1/2], Iter [1476/3125], train_loss:0.153546 Epoch [1/2], Iter [1477/3125], train_loss:0.180169 Epoch [1/2], Iter [1478/3125], train_loss:0.150831 Epoch [1/2], Iter [1479/3125], train_loss:0.171316 Epoch [1/2], Iter [1480/3125], train_loss:0.168213 Epoch [1/2], Iter [1481/3125], train_loss:0.172205 Epoch [1/2], Iter [1482/3125], train_loss:0.142973 Epoch [1/2], Iter [1483/3125], train_loss:0.157204 Epoch [1/2], Iter [1484/3125], train_loss:0.172524 Epoch [1/2], Iter [1485/3125], train_loss:0.157539 Epoch [1/2], Iter [1486/3125], train_loss:0.143420 Epoch [1/2], Iter [1487/3125], train_loss:0.162053 Epoch [1/2], Iter [1488/3125], train_loss:0.167251 Epoch [1/2], Iter [1489/3125], train_loss:0.172743 Epoch [1/2], Iter [1490/3125], train_loss:0.166958 Epoch [1/2], Iter [1491/3125], train_loss:0.168802 Epoch [1/2], Iter [1492/3125], train_loss:0.160231 Epoch [1/2], Iter 
[1493/3125], train_loss:0.171343 Epoch [1/2], Iter [1494/3125], train_loss:0.167754 Epoch [1/2], Iter [1495/3125], train_loss:0.166312 Epoch [1/2], Iter [1496/3125], train_loss:0.162917 Epoch [1/2], Iter [1497/3125], train_loss:0.162183 Epoch [1/2], Iter [1498/3125], train_loss:0.170274 Epoch [1/2], Iter [1499/3125], train_loss:0.177937 Epoch [1/2], Iter [1500/3125], train_loss:0.142511 Epoch [1/2], Iter [1501/3125], train_loss:0.146676 Epoch [1/2], Iter [1502/3125], train_loss:0.165919 Epoch [1/2], Iter [1503/3125], train_loss:0.153276 Epoch [1/2], Iter [1504/3125], train_loss:0.169737 Epoch [1/2], Iter [1505/3125], train_loss:0.155799 Epoch [1/2], Iter [1506/3125], train_loss:0.160062 Epoch [1/2], Iter [1507/3125], train_loss:0.156737 Epoch [1/2], Iter [1508/3125], train_loss:0.171055 Epoch [1/2], Iter [1509/3125], train_loss:0.155235 Epoch [1/2], Iter [1510/3125], train_loss:0.144856 Epoch [1/2], Iter [1511/3125], train_loss:0.154941 Epoch [1/2], Iter [1512/3125], train_loss:0.141613 Epoch [1/2], Iter [1513/3125], train_loss:0.169685 Epoch [1/2], Iter [1514/3125], train_loss:0.153574 Epoch [1/2], Iter [1515/3125], train_loss:0.165675 Epoch [1/2], Iter [1516/3125], train_loss:0.194039 Epoch [1/2], Iter [1517/3125], train_loss:0.136731 Epoch [1/2], Iter [1518/3125], train_loss:0.162655 Epoch [1/2], Iter [1519/3125], train_loss:0.157449 Epoch [1/2], Iter [1520/3125], train_loss:0.172672 Epoch [1/2], Iter [1521/3125], train_loss:0.185573 Epoch [1/2], Iter [1522/3125], train_loss:0.177209 Epoch [1/2], Iter [1523/3125], train_loss:0.144910 Epoch [1/2], Iter [1524/3125], train_loss:0.160207 Epoch [1/2], Iter [1525/3125], train_loss:0.163809 Epoch [1/2], Iter [1526/3125], train_loss:0.161429 Epoch [1/2], Iter [1527/3125], train_loss:0.149817 Epoch [1/2], Iter [1528/3125], train_loss:0.182072 Epoch [1/2], Iter [1529/3125], train_loss:0.175234 Epoch [1/2], Iter [1530/3125], train_loss:0.170426 Epoch [1/2], Iter [1531/3125], train_loss:0.155083 Epoch [1/2], Iter 
[1532/3125], train_loss:0.180693 Epoch [1/2], Iter [1533/3125], train_loss:0.168464 Epoch [1/2], Iter [1534/3125], train_loss:0.159503 Epoch [1/2], Iter [1535/3125], train_loss:0.161271 Epoch [1/2], Iter [1536/3125], train_loss:0.141511 Epoch [1/2], Iter [1537/3125], train_loss:0.166983 Epoch [1/2], Iter [1538/3125], train_loss:0.146959 Epoch [1/2], Iter [1539/3125], train_loss:0.161491 Epoch [1/2], Iter [1540/3125], train_loss:0.166418 Epoch [1/2], Iter [1541/3125], train_loss:0.158611 Epoch [1/2], Iter [1542/3125], train_loss:0.152136 Epoch [1/2], Iter [1543/3125], train_loss:0.172768 Epoch [1/2], Iter [1544/3125], train_loss:0.152095 Epoch [1/2], Iter [1545/3125], train_loss:0.142310 Epoch [1/2], Iter [1546/3125], train_loss:0.150826 Epoch [1/2], Iter [1547/3125], train_loss:0.147781 Epoch [1/2], Iter [1548/3125], train_loss:0.171356 Epoch [1/2], Iter [1549/3125], train_loss:0.149514 Epoch [1/2], Iter [1550/3125], train_loss:0.156635 Epoch [1/2], Iter [1551/3125], train_loss:0.172591 Epoch [1/2], Iter [1552/3125], train_loss:0.178937 Epoch [1/2], Iter [1553/3125], train_loss:0.165982 Epoch [1/2], Iter [1554/3125], train_loss:0.169758 Epoch [1/2], Iter [1555/3125], train_loss:0.170617 Epoch [1/2], Iter [1556/3125], train_loss:0.168998 Epoch [1/2], Iter [1557/3125], train_loss:0.165370 Epoch [1/2], Iter [1558/3125], train_loss:0.161071 Epoch [1/2], Iter [1559/3125], train_loss:0.158345 Epoch [1/2], Iter [1560/3125], train_loss:0.186070 Epoch [1/2], Iter [1561/3125], train_loss:0.160171 Epoch [1/2], Iter [1562/3125], train_loss:0.179954 Epoch [1/2], Iter [1563/3125], train_loss:0.172075 Epoch [1/2], Iter [1564/3125], train_loss:0.151861 Epoch [1/2], Iter [1565/3125], train_loss:0.173611 Epoch [1/2], Iter [1566/3125], train_loss:0.167420 Epoch [1/2], Iter [1567/3125], train_loss:0.149860 Epoch [1/2], Iter [1568/3125], train_loss:0.154862 Epoch [1/2], Iter [1569/3125], train_loss:0.168478 Epoch [1/2], Iter [1570/3125], train_loss:0.159789 Epoch [1/2], Iter 
[1571/3125], train_loss:0.145618 Epoch [1/2], Iter [1572/3125], train_loss:0.171799 Epoch [1/2], Iter [1573/3125], train_loss:0.146416 Epoch [1/2], Iter [1574/3125], train_loss:0.151151 Epoch [1/2], Iter [1575/3125], train_loss:0.179502 Epoch [1/2], Iter [1576/3125], train_loss:0.169216 Epoch [1/2], Iter [1577/3125], train_loss:0.162661 Epoch [1/2], Iter [1578/3125], train_loss:0.155157 Epoch [1/2], Iter [1579/3125], train_loss:0.155743 Epoch [1/2], Iter [1580/3125], train_loss:0.189926 Epoch [1/2], Iter [1581/3125], train_loss:0.164597 Epoch [1/2], Iter [1582/3125], train_loss:0.137085 Epoch [1/2], Iter [1583/3125], train_loss:0.160498 Epoch [1/2], Iter [1584/3125], train_loss:0.179899 Epoch [1/2], Iter [1585/3125], train_loss:0.145678 Epoch [1/2], Iter [1586/3125], train_loss:0.157172 Epoch [1/2], Iter [1587/3125], train_loss:0.168524 Epoch [1/2], Iter [1588/3125], train_loss:0.156430 Epoch [1/2], Iter [1589/3125], train_loss:0.133350 Epoch [1/2], Iter [1590/3125], train_loss:0.176202 Epoch [1/2], Iter [1591/3125], train_loss:0.169190 Epoch [1/2], Iter [1592/3125], train_loss:0.175895 Epoch [1/2], Iter [1593/3125], train_loss:0.174856 Epoch [1/2], Iter [1594/3125], train_loss:0.163053 Epoch [1/2], Iter [1595/3125], train_loss:0.181598 Epoch [1/2], Iter [1596/3125], train_loss:0.144885 Epoch [1/2], Iter [1597/3125], train_loss:0.170181 Epoch [1/2], Iter [1598/3125], train_loss:0.171142 Epoch [1/2], Iter [1599/3125], train_loss:0.141454 Epoch [1/2], Iter [1600/3125], train_loss:0.145220 Epoch [1/2], Iter [1601/3125], train_loss:0.141769 Epoch [1/2], Iter [1602/3125], train_loss:0.154418 Epoch [1/2], Iter [1603/3125], train_loss:0.160915 Epoch [1/2], Iter [1604/3125], train_loss:0.168919 Epoch [1/2], Iter [1605/3125], train_loss:0.180826 Epoch [1/2], Iter [1606/3125], train_loss:0.152858 Epoch [1/2], Iter [1607/3125], train_loss:0.164291 Epoch [1/2], Iter [1608/3125], train_loss:0.163692 Epoch [1/2], Iter [1609/3125], train_loss:0.147713 Epoch [1/2], Iter 
[1610/3125], train_loss:0.157055 Epoch [1/2], Iter [1611/3125], train_loss:0.194847 Epoch [1/2], Iter [1612/3125], train_loss:0.158709 Epoch [1/2], Iter [1613/3125], train_loss:0.166231 Epoch [1/2], Iter [1614/3125], train_loss:0.186130 Epoch [1/2], Iter [1615/3125], train_loss:0.192118 Epoch [1/2], Iter [1616/3125], train_loss:0.167463 Epoch [1/2], Iter [1617/3125], train_loss:0.137858 Epoch [1/2], Iter [1618/3125], train_loss:0.146418 Epoch [1/2], Iter [1619/3125], train_loss:0.173478 Epoch [1/2], Iter [1620/3125], train_loss:0.161725 Epoch [1/2], Iter [1621/3125], train_loss:0.152998 Epoch [1/2], Iter [1622/3125], train_loss:0.185276 Epoch [1/2], Iter [1623/3125], train_loss:0.152323 Epoch [1/2], Iter [1624/3125], train_loss:0.145322 Epoch [1/2], Iter [1625/3125], train_loss:0.161513 Epoch [1/2], Iter [1626/3125], train_loss:0.153024 Epoch [1/2], Iter [1627/3125], train_loss:0.164977 Epoch [1/2], Iter [1628/3125], train_loss:0.165822 Epoch [1/2], Iter [1629/3125], train_loss:0.151458 Epoch [1/2], Iter [1630/3125], train_loss:0.168540 Epoch [1/2], Iter [1631/3125], train_loss:0.181315 Epoch [1/2], Iter [1632/3125], train_loss:0.169901 Epoch [1/2], Iter [1633/3125], train_loss:0.169646 Epoch [1/2], Iter [1634/3125], train_loss:0.161900 Epoch [1/2], Iter [1635/3125], train_loss:0.141304 Epoch [1/2], Iter [1636/3125], train_loss:0.144623 Epoch [1/2], Iter [1637/3125], train_loss:0.156594 Epoch [1/2], Iter [1638/3125], train_loss:0.150709 Epoch [1/2], Iter [1639/3125], train_loss:0.137099 Epoch [1/2], Iter [1640/3125], train_loss:0.153333 Epoch [1/2], Iter [1641/3125], train_loss:0.157802 Epoch [1/2], Iter [1642/3125], train_loss:0.143059 Epoch [1/2], Iter [1643/3125], train_loss:0.189253 Epoch [1/2], Iter [1644/3125], train_loss:0.155171 Epoch [1/2], Iter [1645/3125], train_loss:0.152370 Epoch [1/2], Iter [1646/3125], train_loss:0.166632 Epoch [1/2], Iter [1647/3125], train_loss:0.179730 Epoch [1/2], Iter [1648/3125], train_loss:0.172416 Epoch [1/2], Iter 
[1649/3125], train_loss:0.178696 Epoch [1/2], Iter [1650/3125], train_loss:0.160341 Epoch [1/2], Iter [1651/3125], train_loss:0.144308 Epoch [1/2], Iter [1652/3125], train_loss:0.154410 Epoch [1/2], Iter [1653/3125], train_loss:0.173912 Epoch [1/2], Iter [1654/3125], train_loss:0.168114 Epoch [1/2], Iter [1655/3125], train_loss:0.161952 Epoch [1/2], Iter [1656/3125], train_loss:0.169061 Epoch [1/2], Iter [1657/3125], train_loss:0.159440 Epoch [1/2], Iter [1658/3125], train_loss:0.138091 Epoch [1/2], Iter [1659/3125], train_loss:0.155012 Epoch [1/2], Iter [1660/3125], train_loss:0.172804 Epoch [1/2], Iter [1661/3125], train_loss:0.143058 Epoch [1/2], Iter [1662/3125], train_loss:0.162421 Epoch [1/2], Iter [1663/3125], train_loss:0.143751 Epoch [1/2], Iter [1664/3125], train_loss:0.140241 Epoch [1/2], Iter [1665/3125], train_loss:0.172097 Epoch [1/2], Iter [1666/3125], train_loss:0.163913 Epoch [1/2], Iter [1667/3125], train_loss:0.165221 Epoch [1/2], Iter [1668/3125], train_loss:0.174985 Epoch [1/2], Iter [1669/3125], train_loss:0.157103 Epoch [1/2], Iter [1670/3125], train_loss:0.171044 Epoch [1/2], Iter [1671/3125], train_loss:0.179402 Epoch [1/2], Iter [1672/3125], train_loss:0.166310 Epoch [1/2], Iter [1673/3125], train_loss:0.161387 Epoch [1/2], Iter [1674/3125], train_loss:0.159246 Epoch [1/2], Iter [1675/3125], train_loss:0.159077 Epoch [1/2], Iter [1676/3125], train_loss:0.156243 Epoch [1/2], Iter [1677/3125], train_loss:0.177382 Epoch [1/2], Iter [1678/3125], train_loss:0.161576 Epoch [1/2], Iter [1679/3125], train_loss:0.155681 Epoch [1/2], Iter [1680/3125], train_loss:0.182101 Epoch [1/2], Iter [1681/3125], train_loss:0.168359 Epoch [1/2], Iter [1682/3125], train_loss:0.162136 Epoch [1/2], Iter [1683/3125], train_loss:0.175528 Epoch [1/2], Iter [1684/3125], train_loss:0.139192 Epoch [1/2], Iter [1685/3125], train_loss:0.149815 Epoch [1/2], Iter [1686/3125], train_loss:0.182981 Epoch [1/2], Iter [1687/3125], train_loss:0.160774 Epoch [1/2], Iter 
[1688/3125], train_loss:0.145968 Epoch [1/2], Iter [1689/3125], train_loss:0.158807 Epoch [1/2], Iter [1690/3125], train_loss:0.158910 Epoch [1/2], Iter [1691/3125], train_loss:0.174940 Epoch [1/2], Iter [1692/3125], train_loss:0.155379 Epoch [1/2], Iter [1693/3125], train_loss:0.170327 Epoch [1/2], Iter [1694/3125], train_loss:0.161909 Epoch [1/2], Iter [1695/3125], train_loss:0.150474 Epoch [1/2], Iter [1696/3125], train_loss:0.170937 Epoch [1/2], Iter [1697/3125], train_loss:0.152703 Epoch [1/2], Iter [1698/3125], train_loss:0.168881 Epoch [1/2], Iter [1699/3125], train_loss:0.172118 Epoch [1/2], Iter [1700/3125], train_loss:0.157837 Epoch [1/2], Iter [1701/3125], train_loss:0.160279 Epoch [1/2], Iter [1702/3125], train_loss:0.181616 Epoch [1/2], Iter [1703/3125], train_loss:0.147026 Epoch [1/2], Iter [1704/3125], train_loss:0.157656 Epoch [1/2], Iter [1705/3125], train_loss:0.179791 Epoch [1/2], Iter [1706/3125], train_loss:0.171684 Epoch [1/2], Iter [1707/3125], train_loss:0.138092 Epoch [1/2], Iter [1708/3125], train_loss:0.177978 Epoch [1/2], Iter [1709/3125], train_loss:0.175673 Epoch [1/2], Iter [1710/3125], train_loss:0.151395 Epoch [1/2], Iter [1711/3125], train_loss:0.159401 Epoch [1/2], Iter [1712/3125], train_loss:0.168381 Epoch [1/2], Iter [1713/3125], train_loss:0.166301 Epoch [1/2], Iter [1714/3125], train_loss:0.156766 Epoch [1/2], Iter [1715/3125], train_loss:0.168902 Epoch [1/2], Iter [1716/3125], train_loss:0.169495 Epoch [1/2], Iter [1717/3125], train_loss:0.159896 Epoch [1/2], Iter [1718/3125], train_loss:0.165244 Epoch [1/2], Iter [1719/3125], train_loss:0.145941 Epoch [1/2], Iter [1720/3125], train_loss:0.166384 Epoch [1/2], Iter [1721/3125], train_loss:0.174706 Epoch [1/2], Iter [1722/3125], train_loss:0.141559 Epoch [1/2], Iter [1723/3125], train_loss:0.174502 Epoch [1/2], Iter [1724/3125], train_loss:0.149242 Epoch [1/2], Iter [1725/3125], train_loss:0.143743 Epoch [1/2], Iter [1726/3125], train_loss:0.159536 Epoch [1/2], Iter 
[1727/3125], train_loss:0.173931 Epoch [1/2], Iter [1728/3125], train_loss:0.152694 Epoch [1/2], Iter [1729/3125], train_loss:0.167123 Epoch [1/2], Iter [1730/3125], train_loss:0.174618 Epoch [1/2], Iter [1731/3125], train_loss:0.187746 Epoch [1/2], Iter [1732/3125], train_loss:0.188053 Epoch [1/2], Iter [1733/3125], train_loss:0.161420 Epoch [1/2], Iter [1734/3125], train_loss:0.143247 Epoch [1/2], Iter [1735/3125], train_loss:0.164306 Epoch [1/2], Iter [1736/3125], train_loss:0.149670 Epoch [1/2], Iter [1737/3125], train_loss:0.180123 Epoch [1/2], Iter [1738/3125], train_loss:0.158347 Epoch [1/2], Iter [1739/3125], train_loss:0.175313 Epoch [1/2], Iter [1740/3125], train_loss:0.154087 Epoch [1/2], Iter [1741/3125], train_loss:0.180090 Epoch [1/2], Iter [1742/3125], train_loss:0.162183 Epoch [1/2], Iter [1743/3125], train_loss:0.164401 Epoch [1/2], Iter [1744/3125], train_loss:0.164047 Epoch [1/2], Iter [1745/3125], train_loss:0.164262 Epoch [1/2], Iter [1746/3125], train_loss:0.155122 Epoch [1/2], Iter [1747/3125], train_loss:0.164472 Epoch [1/2], Iter [1748/3125], train_loss:0.161191 Epoch [1/2], Iter [1749/3125], train_loss:0.160144 Epoch [1/2], Iter [1750/3125], train_loss:0.161302 Epoch [1/2], Iter [1751/3125], train_loss:0.160503 Epoch [1/2], Iter [1752/3125], train_loss:0.155728 Epoch [1/2], Iter [1753/3125], train_loss:0.140759 Epoch [1/2], Iter [1754/3125], train_loss:0.145626 Epoch [1/2], Iter [1755/3125], train_loss:0.164837 Epoch [1/2], Iter [1756/3125], train_loss:0.157151 Epoch [1/2], Iter [1757/3125], train_loss:0.179853 Epoch [1/2], Iter [1758/3125], train_loss:0.150290 Epoch [1/2], Iter [1759/3125], train_loss:0.144776 Epoch [1/2], Iter [1760/3125], train_loss:0.163736 Epoch [1/2], Iter [1761/3125], train_loss:0.161901 Epoch [1/2], Iter [1762/3125], train_loss:0.137393 Epoch [1/2], Iter [1763/3125], train_loss:0.157102 Epoch [1/2], Iter [1764/3125], train_loss:0.147714 Epoch [1/2], Iter [1765/3125], train_loss:0.168851 Epoch [1/2], Iter 
[1766/3125], train_loss:0.180556 Epoch [1/2], Iter [1767/3125], train_loss:0.145371 Epoch [1/2], Iter [1768/3125], train_loss:0.182627 Epoch [1/2], Iter [1769/3125], train_loss:0.155184 Epoch [1/2], Iter [1770/3125], train_loss:0.170251 Epoch [1/2], Iter [1771/3125], train_loss:0.156243 Epoch [1/2], Iter [1772/3125], train_loss:0.157432 Epoch [1/2], Iter [1773/3125], train_loss:0.165467 Epoch [1/2], Iter [1774/3125], train_loss:0.166601 Epoch [1/2], Iter [1775/3125], train_loss:0.177986 Epoch [1/2], Iter [1776/3125], train_loss:0.165314 Epoch [1/2], Iter [1777/3125], train_loss:0.171132 Epoch [1/2], Iter [1778/3125], train_loss:0.190048 Epoch [1/2], Iter [1779/3125], train_loss:0.165800 Epoch [1/2], Iter [1780/3125], train_loss:0.160303 Epoch [1/2], Iter [1781/3125], train_loss:0.155642 Epoch [1/2], Iter [1782/3125], train_loss:0.146157 Epoch [1/2], Iter [1783/3125], train_loss:0.160654 Epoch [1/2], Iter [1784/3125], train_loss:0.176773 Epoch [1/2], Iter [1785/3125], train_loss:0.169321 Epoch [1/2], Iter [1786/3125], train_loss:0.150362 Epoch [1/2], Iter [1787/3125], train_loss:0.167345 Epoch [1/2], Iter [1788/3125], train_loss:0.145898 Epoch [1/2], Iter [1789/3125], train_loss:0.150497 Epoch [1/2], Iter [1790/3125], train_loss:0.166425 Epoch [1/2], Iter [1791/3125], train_loss:0.171549 Epoch [1/2], Iter [1792/3125], train_loss:0.154176 Epoch [1/2], Iter [1793/3125], train_loss:0.166127 Epoch [1/2], Iter [1794/3125], train_loss:0.165764 Epoch [1/2], Iter [1795/3125], train_loss:0.162320 Epoch [1/2], Iter [1796/3125], train_loss:0.194941 Epoch [1/2], Iter [1797/3125], train_loss:0.157635 Epoch [1/2], Iter [1798/3125], train_loss:0.163641 Epoch [1/2], Iter [1799/3125], train_loss:0.167187 Epoch [1/2], Iter [1800/3125], train_loss:0.145284 Epoch [1/2], Iter [1801/3125], train_loss:0.161971 Epoch [1/2], Iter [1802/3125], train_loss:0.164058 Epoch [1/2], Iter [1803/3125], train_loss:0.142048 Epoch [1/2], Iter [1804/3125], train_loss:0.160427 Epoch [1/2], Iter 
[1805/3125], train_loss:0.161583 Epoch [1/2], Iter [1806/3125], train_loss:0.155377 Epoch [1/2], Iter [1807/3125], train_loss:0.191237 Epoch [1/2], Iter [1808/3125], train_loss:0.163925 Epoch [1/2], Iter [1809/3125], train_loss:0.190522 Epoch [1/2], Iter [1810/3125], train_loss:0.160186 Epoch [1/2], Iter [1811/3125], train_loss:0.176043 Epoch [1/2], Iter [1812/3125], train_loss:0.170882 Epoch [1/2], Iter [1813/3125], train_loss:0.168062 Epoch [1/2], Iter [1814/3125], train_loss:0.162766 Epoch [1/2], Iter [1815/3125], train_loss:0.168871 Epoch [1/2], Iter [1816/3125], train_loss:0.161266 Epoch [1/2], Iter [1817/3125], train_loss:0.155407 Epoch [1/2], Iter [1818/3125], train_loss:0.145036 Epoch [1/2], Iter [1819/3125], train_loss:0.166525 Epoch [1/2], Iter [1820/3125], train_loss:0.161894 Epoch [1/2], Iter [1821/3125], train_loss:0.179618 Epoch [1/2], Iter [1822/3125], train_loss:0.165360 Epoch [1/2], Iter [1823/3125], train_loss:0.179443 Epoch [1/2], Iter [1824/3125], train_loss:0.182535 Epoch [1/2], Iter [1825/3125], train_loss:0.183849 Epoch [1/2], Iter [1826/3125], train_loss:0.160797 Epoch [1/2], Iter [1827/3125], train_loss:0.155679 Epoch [1/2], Iter [1828/3125], train_loss:0.163237 Epoch [1/2], Iter [1829/3125], train_loss:0.155929 Epoch [1/2], Iter [1830/3125], train_loss:0.169370 Epoch [1/2], Iter [1831/3125], train_loss:0.181343 Epoch [1/2], Iter [1832/3125], train_loss:0.150245 Epoch [1/2], Iter [1833/3125], train_loss:0.173141 Epoch [1/2], Iter [1834/3125], train_loss:0.163475 Epoch [1/2], Iter [1835/3125], train_loss:0.163326 Epoch [1/2], Iter [1836/3125], train_loss:0.157031 Epoch [1/2], Iter [1837/3125], train_loss:0.166926 Epoch [1/2], Iter [1838/3125], train_loss:0.181456 Epoch [1/2], Iter [1839/3125], train_loss:0.166013 Epoch [1/2], Iter [1840/3125], train_loss:0.151755 Epoch [1/2], Iter [1841/3125], train_loss:0.172941 Epoch [1/2], Iter [1842/3125], train_loss:0.175330 Epoch [1/2], Iter [1843/3125], train_loss:0.147175 Epoch [1/2], Iter 
[1844/3125], train_loss:0.178653 Epoch [1/2], Iter [1845/3125], train_loss:0.152745 Epoch [1/2], Iter [1846/3125], train_loss:0.142007 Epoch [1/2], Iter [1847/3125], train_loss:0.148765 Epoch [1/2], Iter [1848/3125], train_loss:0.166682 Epoch [1/2], Iter [1849/3125], train_loss:0.156195 Epoch [1/2], Iter [1850/3125], train_loss:0.156262 Epoch [1/2], Iter [1851/3125], train_loss:0.160495 Epoch [1/2], Iter [1852/3125], train_loss:0.162996 Epoch [1/2], Iter [1853/3125], train_loss:0.162435 Epoch [1/2], Iter [1854/3125], train_loss:0.161346 Epoch [1/2], Iter [1855/3125], train_loss:0.171397 Epoch [1/2], Iter [1856/3125], train_loss:0.177385 Epoch [1/2], Iter [1857/3125], train_loss:0.130351 Epoch [1/2], Iter [1858/3125], train_loss:0.151499 Epoch [1/2], Iter [1859/3125], train_loss:0.149441 Epoch [1/2], Iter [1860/3125], train_loss:0.164405 Epoch [1/2], Iter [1861/3125], train_loss:0.163205 Epoch [1/2], Iter [1862/3125], train_loss:0.177325 Epoch [1/2], Iter [1863/3125], train_loss:0.155396 Epoch [1/2], Iter [1864/3125], train_loss:0.162093 Epoch [1/2], Iter [1865/3125], train_loss:0.155531 Epoch [1/2], Iter [1866/3125], train_loss:0.147332 Epoch [1/2], Iter [1867/3125], train_loss:0.150838 Epoch [1/2], Iter [1868/3125], train_loss:0.154241 Epoch [1/2], Iter [1869/3125], train_loss:0.142730 Epoch [1/2], Iter [1870/3125], train_loss:0.154984 Epoch [1/2], Iter [1871/3125], train_loss:0.141732 Epoch [1/2], Iter [1872/3125], train_loss:0.169305 Epoch [1/2], Iter [1873/3125], train_loss:0.158314 Epoch [1/2], Iter [1874/3125], train_loss:0.155676 Epoch [1/2], Iter [1875/3125], train_loss:0.162900 Epoch [1/2], Iter [1876/3125], train_loss:0.174867 Epoch [1/2], Iter [1877/3125], train_loss:0.165536 Epoch [1/2], Iter [1878/3125], train_loss:0.153315 Epoch [1/2], Iter [1879/3125], train_loss:0.150544 Epoch [1/2], Iter [1880/3125], train_loss:0.183719 Epoch [1/2], Iter [1881/3125], train_loss:0.160276 Epoch [1/2], Iter [1882/3125], train_loss:0.169111 Epoch [1/2], Iter 
[1883/3125], train_loss:0.162062 Epoch [1/2], Iter [1884/3125], train_loss:0.136829 Epoch [1/2], Iter [1885/3125], train_loss:0.152484 Epoch [1/2], Iter [1886/3125], train_loss:0.157395 Epoch [1/2], Iter [1887/3125], train_loss:0.153584 Epoch [1/2], Iter [1888/3125], train_loss:0.184233 Epoch [1/2], Iter [1889/3125], train_loss:0.158605 Epoch [1/2], Iter [1890/3125], train_loss:0.160698 Epoch [1/2], Iter [1891/3125], train_loss:0.160876 Epoch [1/2], Iter [1892/3125], train_loss:0.169683 Epoch [1/2], Iter [1893/3125], train_loss:0.157519 Epoch [1/2], Iter [1894/3125], train_loss:0.159938 Epoch [1/2], Iter [1895/3125], train_loss:0.170259 Epoch [1/2], Iter [1896/3125], train_loss:0.184046 Epoch [1/2], Iter [1897/3125], train_loss:0.153115 Epoch [1/2], Iter [1898/3125], train_loss:0.157097 Epoch [1/2], Iter [1899/3125], train_loss:0.162475 Epoch [1/2], Iter [1900/3125], train_loss:0.161365 Epoch [1/2], Iter [1901/3125], train_loss:0.173105 Epoch [1/2], Iter [1902/3125], train_loss:0.148295 Epoch [1/2], Iter [1903/3125], train_loss:0.165971 Epoch [1/2], Iter [1904/3125], train_loss:0.158941 Epoch [1/2], Iter [1905/3125], train_loss:0.167976 Epoch [1/2], Iter [1906/3125], train_loss:0.161314 Epoch [1/2], Iter [1907/3125], train_loss:0.142002 Epoch [1/2], Iter [1908/3125], train_loss:0.155992 Epoch [1/2], Iter [1909/3125], train_loss:0.159452 Epoch [1/2], Iter [1910/3125], train_loss:0.167375 Epoch [1/2], Iter [1911/3125], train_loss:0.160087 Epoch [1/2], Iter [1912/3125], train_loss:0.162730 Epoch [1/2], Iter [1913/3125], train_loss:0.166080 Epoch [1/2], Iter [1914/3125], train_loss:0.186217 Epoch [1/2], Iter [1915/3125], train_loss:0.151830 Epoch [1/2], Iter [1916/3125], train_loss:0.168950 Epoch [1/2], Iter [1917/3125], train_loss:0.153571 Epoch [1/2], Iter [1918/3125], train_loss:0.164015 Epoch [1/2], Iter [1919/3125], train_loss:0.159809 Epoch [1/2], Iter [1920/3125], train_loss:0.146458 Epoch [1/2], Iter [1921/3125], train_loss:0.160593 Epoch [1/2], Iter 
[1922/3125], train_loss:0.152458 Epoch [1/2], Iter [1923/3125], train_loss:0.170881 Epoch [1/2], Iter [1924/3125], train_loss:0.158566 Epoch [1/2], Iter [1925/3125], train_loss:0.155870 Epoch [1/2], Iter [1926/3125], train_loss:0.188001 Epoch [1/2], Iter [1927/3125], train_loss:0.169803 Epoch [1/2], Iter [1928/3125], train_loss:0.150111 Epoch [1/2], Iter [1929/3125], train_loss:0.163295 Epoch [1/2], Iter [1930/3125], train_loss:0.145743 Epoch [1/2], Iter [1931/3125], train_loss:0.154151 Epoch [1/2], Iter [1932/3125], train_loss:0.160207 Epoch [1/2], Iter [1933/3125], train_loss:0.158596 Epoch [1/2], Iter [1934/3125], train_loss:0.173918 Epoch [1/2], Iter [1935/3125], train_loss:0.184682 Epoch [1/2], Iter [1936/3125], train_loss:0.184060 Epoch [1/2], Iter [1937/3125], train_loss:0.165681 Epoch [1/2], Iter [1938/3125], train_loss:0.172499 Epoch [1/2], Iter [1939/3125], train_loss:0.154547 Epoch [1/2], Iter [1940/3125], train_loss:0.147814 Epoch [1/2], Iter [1941/3125], train_loss:0.161590 Epoch [1/2], Iter [1942/3125], train_loss:0.141166 Epoch [1/2], Iter [1943/3125], train_loss:0.151419 Epoch [1/2], Iter [1944/3125], train_loss:0.152621 Epoch [1/2], Iter [1945/3125], train_loss:0.164691 Epoch [1/2], Iter [1946/3125], train_loss:0.146373 Epoch [1/2], Iter [1947/3125], train_loss:0.148736 Epoch [1/2], Iter [1948/3125], train_loss:0.199945 Epoch [1/2], Iter [1949/3125], train_loss:0.154226 Epoch [1/2], Iter [1950/3125], train_loss:0.173260 Epoch [1/2], Iter [1951/3125], train_loss:0.161090 Epoch [1/2], Iter [1952/3125], train_loss:0.169187 Epoch [1/2], Iter [1953/3125], train_loss:0.164385 Epoch [1/2], Iter [1954/3125], train_loss:0.151662 Epoch [1/2], Iter [1955/3125], train_loss:0.165125 Epoch [1/2], Iter [1956/3125], train_loss:0.154994 Epoch [1/2], Iter [1957/3125], train_loss:0.173068 Epoch [1/2], Iter [1958/3125], train_loss:0.175447 Epoch [1/2], Iter [1959/3125], train_loss:0.170935 Epoch [1/2], Iter [1960/3125], train_loss:0.167173 Epoch [1/2], Iter 
[1961/3125], train_loss:0.181798 Epoch [1/2], Iter [1962/3125], train_loss:0.149530 Epoch [1/2], Iter [1963/3125], train_loss:0.162909 Epoch [1/2], Iter [1964/3125], train_loss:0.159980 Epoch [1/2], Iter [1965/3125], train_loss:0.152192 Epoch [1/2], Iter [1966/3125], train_loss:0.178360 Epoch [1/2], Iter [1967/3125], train_loss:0.146795 Epoch [1/2], Iter [1968/3125], train_loss:0.143748 Epoch [1/2], Iter [1969/3125], train_loss:0.181524 Epoch [1/2], Iter [1970/3125], train_loss:0.156961 Epoch [1/2], Iter [1971/3125], train_loss:0.157196 Epoch [1/2], Iter [1972/3125], train_loss:0.156264 Epoch [1/2], Iter [1973/3125], train_loss:0.155585 Epoch [1/2], Iter [1974/3125], train_loss:0.165857 Epoch [1/2], Iter [1975/3125], train_loss:0.179051 Epoch [1/2], Iter [1976/3125], train_loss:0.154581 Epoch [1/2], Iter [1977/3125], train_loss:0.169368 Epoch [1/2], Iter [1978/3125], train_loss:0.144383 Epoch [1/2], Iter [1979/3125], train_loss:0.152714 Epoch [1/2], Iter [1980/3125], train_loss:0.148939 Epoch [1/2], Iter [1981/3125], train_loss:0.175638 Epoch [1/2], Iter [1982/3125], train_loss:0.168751 Epoch [1/2], Iter [1983/3125], train_loss:0.162145 Epoch [1/2], Iter [1984/3125], train_loss:0.182724 Epoch [1/2], Iter [1985/3125], train_loss:0.155977 Epoch [1/2], Iter [1986/3125], train_loss:0.155561 Epoch [1/2], Iter [1987/3125], train_loss:0.191799 Epoch [1/2], Iter [1988/3125], train_loss:0.167943 Epoch [1/2], Iter [1989/3125], train_loss:0.163147 Epoch [1/2], Iter [1990/3125], train_loss:0.176683 Epoch [1/2], Iter [1991/3125], train_loss:0.158023 Epoch [1/2], Iter [1992/3125], train_loss:0.160804 Epoch [1/2], Iter [1993/3125], train_loss:0.158202 Epoch [1/2], Iter [1994/3125], train_loss:0.170246 Epoch [1/2], Iter [1995/3125], train_loss:0.165867 Epoch [1/2], Iter [1996/3125], train_loss:0.144000 Epoch [1/2], Iter [1997/3125], train_loss:0.162011 Epoch [1/2], Iter [1998/3125], train_loss:0.168511 Epoch [1/2], Iter [1999/3125], train_loss:0.156590 Epoch [1/2], Iter 
[2000/3125], train_loss:0.151475 Epoch [1/2], Iter [2001/3125], train_loss:0.172778 Epoch [1/2], Iter [2002/3125], train_loss:0.174170 Epoch [1/2], Iter [2003/3125], train_loss:0.172485 Epoch [1/2], Iter [2004/3125], train_loss:0.151658 Epoch [1/2], Iter [2005/3125], train_loss:0.165962 Epoch [1/2], Iter [2006/3125], train_loss:0.143169 Epoch [1/2], Iter [2007/3125], train_loss:0.170595 Epoch [1/2], Iter [2008/3125], train_loss:0.200333 Epoch [1/2], Iter [2009/3125], train_loss:0.163445 Epoch [1/2], Iter [2010/3125], train_loss:0.150004 Epoch [1/2], Iter [2011/3125], train_loss:0.157552 Epoch [1/2], Iter [2012/3125], train_loss:0.168187 Epoch [1/2], Iter [2013/3125], train_loss:0.153843 Epoch [1/2], Iter [2014/3125], train_loss:0.169956 Epoch [1/2], Iter [2015/3125], train_loss:0.171310 Epoch [1/2], Iter [2016/3125], train_loss:0.152466 Epoch [1/2], Iter [2017/3125], train_loss:0.173650 Epoch [1/2], Iter [2018/3125], train_loss:0.162068 Epoch [1/2], Iter [2019/3125], train_loss:0.130321 Epoch [1/2], Iter [2020/3125], train_loss:0.124815 Epoch [1/2], Iter [2021/3125], train_loss:0.145361 Epoch [1/2], Iter [2022/3125], train_loss:0.146677 Epoch [1/2], Iter [2023/3125], train_loss:0.165708 Epoch [1/2], Iter [2024/3125], train_loss:0.176067 Epoch [1/2], Iter [2025/3125], train_loss:0.168302 Epoch [1/2], Iter [2026/3125], train_loss:0.154875 Epoch [1/2], Iter [2027/3125], train_loss:0.164601 Epoch [1/2], Iter [2028/3125], train_loss:0.171662 Epoch [1/2], Iter [2029/3125], train_loss:0.151342 Epoch [1/2], Iter [2030/3125], train_loss:0.162905 Epoch [1/2], Iter [2031/3125], train_loss:0.155055 Epoch [1/2], Iter [2032/3125], train_loss:0.133530 Epoch [1/2], Iter [2033/3125], train_loss:0.145367 Epoch [1/2], Iter [2034/3125], train_loss:0.140172 Epoch [1/2], Iter [2035/3125], train_loss:0.166013 Epoch [1/2], Iter [2036/3125], train_loss:0.160138 Epoch [1/2], Iter [2037/3125], train_loss:0.149895 Epoch [1/2], Iter [2038/3125], train_loss:0.158618 Epoch [1/2], Iter 
[2039/3125], train_loss:0.182976 Epoch [1/2], Iter [2040/3125], train_loss:0.163650 Epoch [1/2], Iter [2041/3125], train_loss:0.156669 Epoch [1/2], Iter [2042/3125], train_loss:0.161290 Epoch [1/2], Iter [2043/3125], train_loss:0.162484 Epoch [1/2], Iter [2044/3125], train_loss:0.167114 Epoch [1/2], Iter [2045/3125], train_loss:0.170945 Epoch [1/2], Iter [2046/3125], train_loss:0.155789 Epoch [1/2], Iter [2047/3125], train_loss:0.164060 Epoch [1/2], Iter [2048/3125], train_loss:0.186333 Epoch [1/2], Iter [2049/3125], train_loss:0.160162 Epoch [1/2], Iter [2050/3125], train_loss:0.159823 Epoch [1/2], Iter [2051/3125], train_loss:0.158371 Epoch [1/2], Iter [2052/3125], train_loss:0.159072 Epoch [1/2], Iter [2053/3125], train_loss:0.173952 Epoch [1/2], Iter [2054/3125], train_loss:0.161498 Epoch [1/2], Iter [2055/3125], train_loss:0.147181 Epoch [1/2], Iter [2056/3125], train_loss:0.176381 Epoch [1/2], Iter [2057/3125], train_loss:0.133942 Epoch [1/2], Iter [2058/3125], train_loss:0.144585 Epoch [1/2], Iter [2059/3125], train_loss:0.162973 Epoch [1/2], Iter [2060/3125], train_loss:0.165241 Epoch [1/2], Iter [2061/3125], train_loss:0.174846 Epoch [1/2], Iter [2062/3125], train_loss:0.180418 Epoch [1/2], Iter [2063/3125], train_loss:0.182369 Epoch [1/2], Iter [2064/3125], train_loss:0.148855 Epoch [1/2], Iter [2065/3125], train_loss:0.180047 Epoch [1/2], Iter [2066/3125], train_loss:0.131797 Epoch [1/2], Iter [2067/3125], train_loss:0.161162 Epoch [1/2], Iter [2068/3125], train_loss:0.153623 Epoch [1/2], Iter [2069/3125], train_loss:0.177079 Epoch [1/2], Iter [2070/3125], train_loss:0.166252 Epoch [1/2], Iter [2071/3125], train_loss:0.167907 Epoch [1/2], Iter [2072/3125], train_loss:0.168837 Epoch [1/2], Iter [2073/3125], train_loss:0.178388 Epoch [1/2], Iter [2074/3125], train_loss:0.167471 Epoch [1/2], Iter [2075/3125], train_loss:0.182263 Epoch [1/2], Iter [2076/3125], train_loss:0.164933 Epoch [1/2], Iter [2077/3125], train_loss:0.159932 Epoch [1/2], Iter 
[2078/3125], train_loss:0.165879 Epoch [1/2], Iter [2079/3125], train_loss:0.168112 Epoch [1/2], Iter [2080/3125], train_loss:0.164080 Epoch [1/2], Iter [2081/3125], train_loss:0.177289 Epoch [1/2], Iter [2082/3125], train_loss:0.156052 Epoch [1/2], Iter [2083/3125], train_loss:0.150370 Epoch [1/2], Iter [2084/3125], train_loss:0.157389 Epoch [1/2], Iter [2085/3125], train_loss:0.164571 Epoch [1/2], Iter [2086/3125], train_loss:0.165030 Epoch [1/2], Iter [2087/3125], train_loss:0.165491 Epoch [1/2], Iter [2088/3125], train_loss:0.157076 Epoch [1/2], Iter [2089/3125], train_loss:0.157584 Epoch [1/2], Iter [2090/3125], train_loss:0.142475 Epoch [1/2], Iter [2091/3125], train_loss:0.161959 Epoch [1/2], Iter [2092/3125], train_loss:0.150067 Epoch [1/2], Iter [2093/3125], train_loss:0.169877 Epoch [1/2], Iter [2094/3125], train_loss:0.175256 Epoch [1/2], Iter [2095/3125], train_loss:0.150007 Epoch [1/2], Iter [2096/3125], train_loss:0.175035 Epoch [1/2], Iter [2097/3125], train_loss:0.143745 Epoch [1/2], Iter [2098/3125], train_loss:0.175930 Epoch [1/2], Iter [2099/3125], train_loss:0.148834 Epoch [1/2], Iter [2100/3125], train_loss:0.165045 Epoch [1/2], Iter [2101/3125], train_loss:0.142969 Epoch [1/2], Iter [2102/3125], train_loss:0.147515 Epoch [1/2], Iter [2103/3125], train_loss:0.144696 Epoch [1/2], Iter [2104/3125], train_loss:0.170307 Epoch [1/2], Iter [2105/3125], train_loss:0.153275 Epoch [1/2], Iter [2106/3125], train_loss:0.174566 Epoch [1/2], Iter [2107/3125], train_loss:0.168739 Epoch [1/2], Iter [2108/3125], train_loss:0.168275 Epoch [1/2], Iter [2109/3125], train_loss:0.156382 Epoch [1/2], Iter [2110/3125], train_loss:0.180446 Epoch [1/2], Iter [2111/3125], train_loss:0.168478 Epoch [1/2], Iter [2112/3125], train_loss:0.161389 Epoch [1/2], Iter [2113/3125], train_loss:0.166829 Epoch [1/2], Iter [2114/3125], train_loss:0.144316 Epoch [1/2], Iter [2115/3125], train_loss:0.180950 Epoch [1/2], Iter [2116/3125], train_loss:0.160766 Epoch [1/2], Iter 
[2117/3125], train_loss:0.134064 Epoch [1/2], Iter [2118/3125], train_loss:0.133301 Epoch [1/2], Iter [2119/3125], train_loss:0.156353 Epoch [1/2], Iter [2120/3125], train_loss:0.155335 Epoch [1/2], Iter [2121/3125], train_loss:0.156238 Epoch [1/2], Iter [2122/3125], train_loss:0.167666 Epoch [1/2], Iter [2123/3125], train_loss:0.139103 Epoch [1/2], Iter [2124/3125], train_loss:0.164919 Epoch [1/2], Iter [2125/3125], train_loss:0.172402 Epoch [1/2], Iter [2126/3125], train_loss:0.156982 Epoch [1/2], Iter [2127/3125], train_loss:0.174418 Epoch [1/2], Iter [2128/3125], train_loss:0.163564 Epoch [1/2], Iter [2129/3125], train_loss:0.155914 Epoch [1/2], Iter [2130/3125], train_loss:0.155929 Epoch [1/2], Iter [2131/3125], train_loss:0.164747 Epoch [1/2], Iter [2132/3125], train_loss:0.169162 Epoch [1/2], Iter [2133/3125], train_loss:0.176287 Epoch [1/2], Iter [2134/3125], train_loss:0.145140 Epoch [1/2], Iter [2135/3125], train_loss:0.172772 Epoch [1/2], Iter [2136/3125], train_loss:0.172442 Epoch [1/2], Iter [2137/3125], train_loss:0.166545 Epoch [1/2], Iter [2138/3125], train_loss:0.155658 Epoch [1/2], Iter [2139/3125], train_loss:0.144825 Epoch [1/2], Iter [2140/3125], train_loss:0.165197 Epoch [1/2], Iter [2141/3125], train_loss:0.179990 Epoch [1/2], Iter [2142/3125], train_loss:0.155233 Epoch [1/2], Iter [2143/3125], train_loss:0.162739 Epoch [1/2], Iter [2144/3125], train_loss:0.156480 Epoch [1/2], Iter [2145/3125], train_loss:0.155214 Epoch [1/2], Iter [2146/3125], train_loss:0.162011 Epoch [1/2], Iter [2147/3125], train_loss:0.163268 Epoch [1/2], Iter [2148/3125], train_loss:0.180236 Epoch [1/2], Iter [2149/3125], train_loss:0.173788 Epoch [1/2], Iter [2150/3125], train_loss:0.155130 Epoch [1/2], Iter [2151/3125], train_loss:0.165528 Epoch [1/2], Iter [2152/3125], train_loss:0.176281 Epoch [1/2], Iter [2153/3125], train_loss:0.151886 Epoch [1/2], Iter [2154/3125], train_loss:0.145217 Epoch [1/2], Iter [2155/3125], train_loss:0.162727 Epoch [1/2], Iter 
[2156/3125], train_loss:0.167274 Epoch [1/2], Iter [2157/3125], train_loss:0.192076 Epoch [1/2], Iter [2158/3125], train_loss:0.163333 Epoch [1/2], Iter [2159/3125], train_loss:0.162825 Epoch [1/2], Iter [2160/3125], train_loss:0.187321 Epoch [1/2], Iter [2161/3125], train_loss:0.158406 Epoch [1/2], Iter [2162/3125], train_loss:0.179870 Epoch [1/2], Iter [2163/3125], train_loss:0.138683 Epoch [1/2], Iter [2164/3125], train_loss:0.150967 Epoch [1/2], Iter [2165/3125], train_loss:0.131095 Epoch [1/2], Iter [2166/3125], train_loss:0.183638 Epoch [1/2], Iter [2167/3125], train_loss:0.159978 Epoch [1/2], Iter [2168/3125], train_loss:0.186739 Epoch [1/2], Iter [2169/3125], train_loss:0.181425 Epoch [1/2], Iter [2170/3125], train_loss:0.168422 Epoch [1/2], Iter [2171/3125], train_loss:0.167889 Epoch [1/2], Iter [2172/3125], train_loss:0.140880 Epoch [1/2], Iter [2173/3125], train_loss:0.181237 Epoch [1/2], Iter [2174/3125], train_loss:0.159820 Epoch [1/2], Iter [2175/3125], train_loss:0.157872 Epoch [1/2], Iter [2176/3125], train_loss:0.158261 Epoch [1/2], Iter [2177/3125], train_loss:0.142904 Epoch [1/2], Iter [2178/3125], train_loss:0.172463 Epoch [1/2], Iter [2179/3125], train_loss:0.149298 Epoch [1/2], Iter [2180/3125], train_loss:0.159796 Epoch [1/2], Iter [2181/3125], train_loss:0.169302 Epoch [1/2], Iter [2182/3125], train_loss:0.185946 Epoch [1/2], Iter [2183/3125], train_loss:0.158634 Epoch [1/2], Iter [2184/3125], train_loss:0.163283 Epoch [1/2], Iter [2185/3125], train_loss:0.147155 Epoch [1/2], Iter [2186/3125], train_loss:0.169214 Epoch [1/2], Iter [2187/3125], train_loss:0.170750 Epoch [1/2], Iter [2188/3125], train_loss:0.164639 Epoch [1/2], Iter [2189/3125], train_loss:0.167140 Epoch [1/2], Iter [2190/3125], train_loss:0.165556 Epoch [1/2], Iter [2191/3125], train_loss:0.173164 Epoch [1/2], Iter [2192/3125], train_loss:0.158743 Epoch [1/2], Iter [2193/3125], train_loss:0.165279 Epoch [1/2], Iter [2194/3125], train_loss:0.151960 Epoch [1/2], Iter 
[2195/3125], train_loss:0.149383 Epoch [1/2], Iter [2196/3125], train_loss:0.157849 Epoch [1/2], Iter [2197/3125], train_loss:0.168340 Epoch [1/2], Iter [2198/3125], train_loss:0.156435 Epoch [1/2], Iter [2199/3125], train_loss:0.139338 Epoch [1/2], Iter [2200/3125], train_loss:0.160937 Epoch [1/2], Iter [2201/3125], train_loss:0.155327 Epoch [1/2], Iter [2202/3125], train_loss:0.177210 Epoch [1/2], Iter [2203/3125], train_loss:0.178461 Epoch [1/2], Iter [2204/3125], train_loss:0.171101 Epoch [1/2], Iter [2205/3125], train_loss:0.184354 Epoch [1/2], Iter [2206/3125], train_loss:0.144931 Epoch [1/2], Iter [2207/3125], train_loss:0.163896 Epoch [1/2], Iter [2208/3125], train_loss:0.169595 Epoch [1/2], Iter [2209/3125], train_loss:0.157569 Epoch [1/2], Iter [2210/3125], train_loss:0.178770 Epoch [1/2], Iter [2211/3125], train_loss:0.143887 Epoch [1/2], Iter [2212/3125], train_loss:0.161863 Epoch [1/2], Iter [2213/3125], train_loss:0.141911 Epoch [1/2], Iter [2214/3125], train_loss:0.141126 Epoch [1/2], Iter [2215/3125], train_loss:0.181017 Epoch [1/2], Iter [2216/3125], train_loss:0.147786 Epoch [1/2], Iter [2217/3125], train_loss:0.141503 Epoch [1/2], Iter [2218/3125], train_loss:0.157903 Epoch [1/2], Iter [2219/3125], train_loss:0.154843 Epoch [1/2], Iter [2220/3125], train_loss:0.148844 Epoch [1/2], Iter [2221/3125], train_loss:0.161168 Epoch [1/2], Iter [2222/3125], train_loss:0.171695 Epoch [1/2], Iter [2223/3125], train_loss:0.163227 Epoch [1/2], Iter [2224/3125], train_loss:0.159965 Epoch [1/2], Iter [2225/3125], train_loss:0.162049 Epoch [1/2], Iter [2226/3125], train_loss:0.172635 Epoch [1/2], Iter [2227/3125], train_loss:0.152669 Epoch [1/2], Iter [2228/3125], train_loss:0.154830 Epoch [1/2], Iter [2229/3125], train_loss:0.163990 Epoch [1/2], Iter [2230/3125], train_loss:0.168742 Epoch [1/2], Iter [2231/3125], train_loss:0.183449 Epoch [1/2], Iter [2232/3125], train_loss:0.148132 Epoch [1/2], Iter [2233/3125], train_loss:0.175107 Epoch [1/2], Iter 
[2234/3125], train_loss:0.157387 Epoch [1/2], Iter [2235/3125], train_loss:0.150439 Epoch [1/2], Iter [2236/3125], train_loss:0.141313 Epoch [1/2], Iter [2237/3125], train_loss:0.150460 Epoch [1/2], Iter [2238/3125], train_loss:0.157583 Epoch [1/2], Iter [2239/3125], train_loss:0.160917 Epoch [1/2], Iter [2240/3125], train_loss:0.175020 Epoch [1/2], Iter [2241/3125], train_loss:0.172231 Epoch [1/2], Iter [2242/3125], train_loss:0.158164 Epoch [1/2], Iter [2243/3125], train_loss:0.149750 Epoch [1/2], Iter [2244/3125], train_loss:0.160434 Epoch [1/2], Iter [2245/3125], train_loss:0.163752 Epoch [1/2], Iter [2246/3125], train_loss:0.146884 Epoch [1/2], Iter [2247/3125], train_loss:0.158364 Epoch [1/2], Iter [2248/3125], train_loss:0.156011 Epoch [1/2], Iter [2249/3125], train_loss:0.173126 Epoch [1/2], Iter [2250/3125], train_loss:0.193422 Epoch [1/2], Iter [2251/3125], train_loss:0.158470 Epoch [1/2], Iter [2252/3125], train_loss:0.151245 Epoch [1/2], Iter [2253/3125], train_loss:0.158039 Epoch [1/2], Iter [2254/3125], train_loss:0.152211 Epoch [1/2], Iter [2255/3125], train_loss:0.167252 Epoch [1/2], Iter [2256/3125], train_loss:0.156284 Epoch [1/2], Iter [2257/3125], train_loss:0.157557 Epoch [1/2], Iter [2258/3125], train_loss:0.149407 Epoch [1/2], Iter [2259/3125], train_loss:0.166932 Epoch [1/2], Iter [2260/3125], train_loss:0.174253 Epoch [1/2], Iter [2261/3125], train_loss:0.171375 Epoch [1/2], Iter [2262/3125], train_loss:0.166366 Epoch [1/2], Iter [2263/3125], train_loss:0.149717 Epoch [1/2], Iter [2264/3125], train_loss:0.166810 Epoch [1/2], Iter [2265/3125], train_loss:0.162488 Epoch [1/2], Iter [2266/3125], train_loss:0.165728 Epoch [1/2], Iter [2267/3125], train_loss:0.168702 Epoch [1/2], Iter [2268/3125], train_loss:0.143795 Epoch [1/2], Iter [2269/3125], train_loss:0.125662 Epoch [1/2], Iter [2270/3125], train_loss:0.152566 Epoch [1/2], Iter [2271/3125], train_loss:0.166331 Epoch [1/2], Iter [2272/3125], train_loss:0.146904 Epoch [1/2], Iter 
[2273/3125], train_loss:0.176470 Epoch [1/2], Iter [2274/3125], train_loss:0.166159 Epoch [1/2], Iter [2275/3125], train_loss:0.164638 Epoch [1/2], Iter [2276/3125], train_loss:0.174697 Epoch [1/2], Iter [2277/3125], train_loss:0.172518 Epoch [1/2], Iter [2278/3125], train_loss:0.179059 Epoch [1/2], Iter [2279/3125], train_loss:0.153997 Epoch [1/2], Iter [2280/3125], train_loss:0.164288 Epoch [1/2], Iter [2281/3125], train_loss:0.156835 Epoch [1/2], Iter [2282/3125], train_loss:0.172427 Epoch [1/2], Iter [2283/3125], train_loss:0.140807 Epoch [1/2], Iter [2284/3125], train_loss:0.176298 Epoch [1/2], Iter [2285/3125], train_loss:0.167197 Epoch [1/2], Iter [2286/3125], train_loss:0.155124 Epoch [1/2], Iter [2287/3125], train_loss:0.168967 Epoch [1/2], Iter [2288/3125], train_loss:0.155021 Epoch [1/2], Iter [2289/3125], train_loss:0.200430 Epoch [1/2], Iter [2290/3125], train_loss:0.168794 Epoch [1/2], Iter [2291/3125], train_loss:0.180748 Epoch [1/2], Iter [2292/3125], train_loss:0.151308 Epoch [1/2], Iter [2293/3125], train_loss:0.168426 Epoch [1/2], Iter [2294/3125], train_loss:0.160170 Epoch [1/2], Iter [2295/3125], train_loss:0.170112 Epoch [1/2], Iter [2296/3125], train_loss:0.182302 Epoch [1/2], Iter [2297/3125], train_loss:0.165573 Epoch [1/2], Iter [2298/3125], train_loss:0.154088 Epoch [1/2], Iter [2299/3125], train_loss:0.157324 Epoch [1/2], Iter [2300/3125], train_loss:0.186557 Epoch [1/2], Iter [2301/3125], train_loss:0.159513 Epoch [1/2], Iter [2302/3125], train_loss:0.159842 Epoch [1/2], Iter [2303/3125], train_loss:0.196757 Epoch [1/2], Iter [2304/3125], train_loss:0.164728 Epoch [1/2], Iter [2305/3125], train_loss:0.159394 Epoch [1/2], Iter [2306/3125], train_loss:0.162070 Epoch [1/2], Iter [2307/3125], train_loss:0.150942 Epoch [1/2], Iter [2308/3125], train_loss:0.178885 Epoch [1/2], Iter [2309/3125], train_loss:0.167701 Epoch [1/2], Iter [2310/3125], train_loss:0.172832 Epoch [1/2], Iter [2311/3125], train_loss:0.151420 Epoch [1/2], Iter 
[2312/3125], train_loss:0.177722 Epoch [1/2], Iter [2313/3125], train_loss:0.152966 Epoch [1/2], Iter [2314/3125], train_loss:0.144942 Epoch [1/2], Iter [2315/3125], train_loss:0.166451 Epoch [1/2], Iter [2316/3125], train_loss:0.167570 Epoch [1/2], Iter [2317/3125], train_loss:0.173486 Epoch [1/2], Iter [2318/3125], train_loss:0.167726 Epoch [1/2], Iter [2319/3125], train_loss:0.150083 Epoch [1/2], Iter [2320/3125], train_loss:0.161335 Epoch [1/2], Iter [2321/3125], train_loss:0.163541 Epoch [1/2], Iter [2322/3125], train_loss:0.139134 Epoch [1/2], Iter [2323/3125], train_loss:0.172992 Epoch [1/2], Iter [2324/3125], train_loss:0.166975 Epoch [1/2], Iter [2325/3125], train_loss:0.161279 Epoch [1/2], Iter [2326/3125], train_loss:0.152028 Epoch [1/2], Iter [2327/3125], train_loss:0.157792 Epoch [1/2], Iter [2328/3125], train_loss:0.146232 Epoch [1/2], Iter [2329/3125], train_loss:0.169053 Epoch [1/2], Iter [2330/3125], train_loss:0.137772 Epoch [1/2], Iter [2331/3125], train_loss:0.153836 Epoch [1/2], Iter [2332/3125], train_loss:0.173346 Epoch [1/2], Iter [2333/3125], train_loss:0.170181 Epoch [1/2], Iter [2334/3125], train_loss:0.153624 Epoch [1/2], Iter [2335/3125], train_loss:0.164155 Epoch [1/2], Iter [2336/3125], train_loss:0.162456 Epoch [1/2], Iter [2337/3125], train_loss:0.165626 Epoch [1/2], Iter [2338/3125], train_loss:0.165643 Epoch [1/2], Iter [2339/3125], train_loss:0.152838 Epoch [1/2], Iter [2340/3125], train_loss:0.166339 Epoch [1/2], Iter [2341/3125], train_loss:0.162481 Epoch [1/2], Iter [2342/3125], train_loss:0.155828 Epoch [1/2], Iter [2343/3125], train_loss:0.180035 Epoch [1/2], Iter [2344/3125], train_loss:0.155793 Epoch [1/2], Iter [2345/3125], train_loss:0.141963 Epoch [1/2], Iter [2346/3125], train_loss:0.175218 Epoch [1/2], Iter [2347/3125], train_loss:0.172332 Epoch [1/2], Iter [2348/3125], train_loss:0.170206 Epoch [1/2], Iter [2349/3125], train_loss:0.158258 Epoch [1/2], Iter [2350/3125], train_loss:0.135423 Epoch [1/2], Iter 
[2351/3125], train_loss:0.158765 Epoch [1/2], Iter [2352/3125], train_loss:0.161856 Epoch [1/2], Iter [2353/3125], train_loss:0.165698 Epoch [1/2], Iter [2354/3125], train_loss:0.166844 Epoch [1/2], Iter [2355/3125], train_loss:0.167199 Epoch [1/2], Iter [2356/3125], train_loss:0.168643 Epoch [1/2], Iter [2357/3125], train_loss:0.145062 Epoch [1/2], Iter [2358/3125], train_loss:0.159673 Epoch [1/2], Iter [2359/3125], train_loss:0.172321 Epoch [1/2], Iter [2360/3125], train_loss:0.162261 Epoch [1/2], Iter [2361/3125], train_loss:0.160964 Epoch [1/2], Iter [2362/3125], train_loss:0.170365 Epoch [1/2], Iter [2363/3125], train_loss:0.158219 Epoch [1/2], Iter [2364/3125], train_loss:0.151682 Epoch [1/2], Iter [2365/3125], train_loss:0.173451 Epoch [1/2], Iter [2366/3125], train_loss:0.186411 Epoch [1/2], Iter [2367/3125], train_loss:0.151850 Epoch [1/2], Iter [2368/3125], train_loss:0.153410 Epoch [1/2], Iter [2369/3125], train_loss:0.150387 Epoch [1/2], Iter [2370/3125], train_loss:0.151061 Epoch [1/2], Iter [2371/3125], train_loss:0.155576 Epoch [1/2], Iter [2372/3125], train_loss:0.171615 Epoch [1/2], Iter [2373/3125], train_loss:0.152891 Epoch [1/2], Iter [2374/3125], train_loss:0.173333 Epoch [1/2], Iter [2375/3125], train_loss:0.178193 Epoch [1/2], Iter [2376/3125], train_loss:0.158169 Epoch [1/2], Iter [2377/3125], train_loss:0.171027 Epoch [1/2], Iter [2378/3125], train_loss:0.183264 Epoch [1/2], Iter [2379/3125], train_loss:0.153622 Epoch [1/2], Iter [2380/3125], train_loss:0.167066 Epoch [1/2], Iter [2381/3125], train_loss:0.151149 Epoch [1/2], Iter [2382/3125], train_loss:0.149457 Epoch [1/2], Iter [2383/3125], train_loss:0.151908 Epoch [1/2], Iter [2384/3125], train_loss:0.167173 Epoch [1/2], Iter [2385/3125], train_loss:0.146533 Epoch [1/2], Iter [2386/3125], train_loss:0.140869 Epoch [1/2], Iter [2387/3125], train_loss:0.161093 Epoch [1/2], Iter [2388/3125], train_loss:0.174047 Epoch [1/2], Iter [2389/3125], train_loss:0.169243 Epoch [1/2], Iter 
[2390/3125], train_loss:0.153055 Epoch [1/2], Iter [2391/3125], train_loss:0.166778 Epoch [1/2], Iter [2392/3125], train_loss:0.171477 Epoch [1/2], Iter [2393/3125], train_loss:0.148076 Epoch [1/2], Iter [2394/3125], train_loss:0.174037 Epoch [1/2], Iter [2395/3125], train_loss:0.152578 Epoch [1/2], Iter [2396/3125], train_loss:0.184196 Epoch [1/2], Iter [2397/3125], train_loss:0.167483 Epoch [1/2], Iter [2398/3125], train_loss:0.164854 Epoch [1/2], Iter [2399/3125], train_loss:0.173233 Epoch [1/2], Iter [2400/3125], train_loss:0.138813 Epoch [1/2], Iter [2401/3125], train_loss:0.152254 Epoch [1/2], Iter [2402/3125], train_loss:0.168025 Epoch [1/2], Iter [2403/3125], train_loss:0.157279 Epoch [1/2], Iter [2404/3125], train_loss:0.148963 Epoch [1/2], Iter [2405/3125], train_loss:0.144112 Epoch [1/2], Iter [2406/3125], train_loss:0.169978 Epoch [1/2], Iter [2407/3125], train_loss:0.153412 Epoch [1/2], Iter [2408/3125], train_loss:0.173826 Epoch [1/2], Iter [2409/3125], train_loss:0.169680 Epoch [1/2], Iter [2410/3125], train_loss:0.162930 Epoch [1/2], Iter [2411/3125], train_loss:0.139202 Epoch [1/2], Iter [2412/3125], train_loss:0.154762 Epoch [1/2], Iter [2413/3125], train_loss:0.154299 Epoch [1/2], Iter [2414/3125], train_loss:0.167414 Epoch [1/2], Iter [2415/3125], train_loss:0.178091 Epoch [1/2], Iter [2416/3125], train_loss:0.173954 Epoch [1/2], Iter [2417/3125], train_loss:0.166982 Epoch [1/2], Iter [2418/3125], train_loss:0.170560 Epoch [1/2], Iter [2419/3125], train_loss:0.170997 Epoch [1/2], Iter [2420/3125], train_loss:0.153861 Epoch [1/2], Iter [2421/3125], train_loss:0.177754 Epoch [1/2], Iter [2422/3125], train_loss:0.168872 Epoch [1/2], Iter [2423/3125], train_loss:0.144303 Epoch [1/2], Iter [2424/3125], train_loss:0.164720 Epoch [1/2], Iter [2425/3125], train_loss:0.186620 Epoch [1/2], Iter [2426/3125], train_loss:0.158638 Epoch [1/2], Iter [2427/3125], train_loss:0.172386 Epoch [1/2], Iter [2428/3125], train_loss:0.167100 Epoch [1/2], Iter 
[2429/3125], train_loss:0.167147 Epoch [1/2], Iter [2430/3125], train_loss:0.182128 Epoch [1/2], Iter [2431/3125], train_loss:0.165804 Epoch [1/2], Iter [2432/3125], train_loss:0.180088 Epoch [1/2], Iter [2433/3125], train_loss:0.165245 Epoch [1/2], Iter [2434/3125], train_loss:0.159391 Epoch [1/2], Iter [2435/3125], train_loss:0.152686 Epoch [1/2], Iter [2436/3125], train_loss:0.161874 Epoch [1/2], Iter [2437/3125], train_loss:0.165142 Epoch [1/2], Iter [2438/3125], train_loss:0.160963 Epoch [1/2], Iter [2439/3125], train_loss:0.166472 Epoch [1/2], Iter [2440/3125], train_loss:0.158173 Epoch [1/2], Iter [2441/3125], train_loss:0.173994 Epoch [1/2], Iter [2442/3125], train_loss:0.151297 Epoch [1/2], Iter [2443/3125], train_loss:0.152010 Epoch [1/2], Iter [2444/3125], train_loss:0.160982 Epoch [1/2], Iter [2445/3125], train_loss:0.182511 Epoch [1/2], Iter [2446/3125], train_loss:0.171740 Epoch [1/2], Iter [2447/3125], train_loss:0.169194 Epoch [1/2], Iter [2448/3125], train_loss:0.160217 Epoch [1/2], Iter [2449/3125], train_loss:0.170634 Epoch [1/2], Iter [2450/3125], train_loss:0.174725 Epoch [1/2], Iter [2451/3125], train_loss:0.162844 Epoch [1/2], Iter [2452/3125], train_loss:0.179684 Epoch [1/2], Iter [2453/3125], train_loss:0.165793 Epoch [1/2], Iter [2454/3125], train_loss:0.147170 Epoch [1/2], Iter [2455/3125], train_loss:0.167428 Epoch [1/2], Iter [2456/3125], train_loss:0.156832 Epoch [1/2], Iter [2457/3125], train_loss:0.163711 Epoch [1/2], Iter [2458/3125], train_loss:0.163635 Epoch [1/2], Iter [2459/3125], train_loss:0.169788 Epoch [1/2], Iter [2460/3125], train_loss:0.161291 Epoch [1/2], Iter [2461/3125], train_loss:0.176288 Epoch [1/2], Iter [2462/3125], train_loss:0.173527 Epoch [1/2], Iter [2463/3125], train_loss:0.198670 Epoch [1/2], Iter [2464/3125], train_loss:0.163765 Epoch [1/2], Iter [2465/3125], train_loss:0.155121 Epoch [1/2], Iter [2466/3125], train_loss:0.157210 Epoch [1/2], Iter [2467/3125], train_loss:0.158318 Epoch [1/2], Iter 
[2468/3125], train_loss:0.190069 Epoch [1/2], Iter [2469/3125], train_loss:0.157674 Epoch [1/2], Iter [2470/3125], train_loss:0.153022 Epoch [1/2], Iter [2471/3125], train_loss:0.178211 Epoch [1/2], Iter [2472/3125], train_loss:0.165668 Epoch [1/2], Iter [2473/3125], train_loss:0.170597 Epoch [1/2], Iter [2474/3125], train_loss:0.148514 Epoch [1/2], Iter [2475/3125], train_loss:0.161165 Epoch [1/2], Iter [2476/3125], train_loss:0.159940 Epoch [1/2], Iter [2477/3125], train_loss:0.163364 Epoch [1/2], Iter [2478/3125], train_loss:0.160939 Epoch [1/2], Iter [2479/3125], train_loss:0.188242 Epoch [1/2], Iter [2480/3125], train_loss:0.170161 Epoch [1/2], Iter [2481/3125], train_loss:0.166997 Epoch [1/2], Iter [2482/3125], train_loss:0.173182 Epoch [1/2], Iter [2483/3125], train_loss:0.156736 Epoch [1/2], Iter [2484/3125], train_loss:0.162785 Epoch [1/2], Iter [2485/3125], train_loss:0.159454 Epoch [1/2], Iter [2486/3125], train_loss:0.172418 Epoch [1/2], Iter [2487/3125], train_loss:0.166055 Epoch [1/2], Iter [2488/3125], train_loss:0.166522 Epoch [1/2], Iter [2489/3125], train_loss:0.157236 Epoch [1/2], Iter [2490/3125], train_loss:0.173360 Epoch [1/2], Iter [2491/3125], train_loss:0.147073 Epoch [1/2], Iter [2492/3125], train_loss:0.154806 Epoch [1/2], Iter [2493/3125], train_loss:0.159782 Epoch [1/2], Iter [2494/3125], train_loss:0.175359 Epoch [1/2], Iter [2495/3125], train_loss:0.152874 Epoch [1/2], Iter [2496/3125], train_loss:0.175603 Epoch [1/2], Iter [2497/3125], train_loss:0.151182 Epoch [1/2], Iter [2498/3125], train_loss:0.133273 Epoch [1/2], Iter [2499/3125], train_loss:0.162480 Epoch [1/2], Iter [2500/3125], train_loss:0.172038 Epoch [1/2], Iter [2501/3125], train_loss:0.163592 Epoch [1/2], Iter [2502/3125], train_loss:0.168842 Epoch [1/2], Iter [2503/3125], train_loss:0.167579 Epoch [1/2], Iter [2504/3125], train_loss:0.169892 Epoch [1/2], Iter [2505/3125], train_loss:0.184179 Epoch [1/2], Iter [2506/3125], train_loss:0.172049 Epoch [1/2], Iter 
[2507/3125], train_loss:0.181183 Epoch [1/2], Iter [2508/3125], train_loss:0.157703 Epoch [1/2], Iter [2509/3125], train_loss:0.156251 Epoch [1/2], Iter [2510/3125], train_loss:0.140083 Epoch [1/2], Iter [2511/3125], train_loss:0.155766 Epoch [1/2], Iter [2512/3125], train_loss:0.171320 Epoch [1/2], Iter [2513/3125], train_loss:0.165249 Epoch [1/2], Iter [2514/3125], train_loss:0.144336 Epoch [1/2], Iter [2515/3125], train_loss:0.169332 Epoch [1/2], Iter [2516/3125], train_loss:0.152470 Epoch [1/2], Iter [2517/3125], train_loss:0.161122 Epoch [1/2], Iter [2518/3125], train_loss:0.182971 Epoch [1/2], Iter [2519/3125], train_loss:0.164621 Epoch [1/2], Iter [2520/3125], train_loss:0.175796 Epoch [1/2], Iter [2521/3125], train_loss:0.176611 Epoch [1/2], Iter [2522/3125], train_loss:0.161589 Epoch [1/2], Iter [2523/3125], train_loss:0.153558 Epoch [1/2], Iter [2524/3125], train_loss:0.177934 Epoch [1/2], Iter [2525/3125], train_loss:0.140108 Epoch [1/2], Iter [2526/3125], train_loss:0.170537 Epoch [1/2], Iter [2527/3125], train_loss:0.190064 Epoch [1/2], Iter [2528/3125], train_loss:0.150987 Epoch [1/2], Iter [2529/3125], train_loss:0.153076 Epoch [1/2], Iter [2530/3125], train_loss:0.153231 Epoch [1/2], Iter [2531/3125], train_loss:0.151433 Epoch [1/2], Iter [2532/3125], train_loss:0.165380 Epoch [1/2], Iter [2533/3125], train_loss:0.154326 Epoch [1/2], Iter [2534/3125], train_loss:0.148860 Epoch [1/2], Iter [2535/3125], train_loss:0.182532 Epoch [1/2], Iter [2536/3125], train_loss:0.184858 Epoch [1/2], Iter [2537/3125], train_loss:0.144190 Epoch [1/2], Iter [2538/3125], train_loss:0.160582 Epoch [1/2], Iter [2539/3125], train_loss:0.150244 Epoch [1/2], Iter [2540/3125], train_loss:0.163084 Epoch [1/2], Iter [2541/3125], train_loss:0.173798 Epoch [1/2], Iter [2542/3125], train_loss:0.180224 Epoch [1/2], Iter [2543/3125], train_loss:0.171645 Epoch [1/2], Iter [2544/3125], train_loss:0.170542 Epoch [1/2], Iter [2545/3125], train_loss:0.150921 Epoch [1/2], Iter 
[2546/3125], train_loss:0.141499 Epoch [1/2], Iter [2547/3125], train_loss:0.154087 Epoch [1/2], Iter [2548/3125], train_loss:0.146057 Epoch [1/2], Iter [2549/3125], train_loss:0.179915 Epoch [1/2], Iter [2550/3125], train_loss:0.178421 Epoch [1/2], Iter [2551/3125], train_loss:0.162338 Epoch [1/2], Iter [2552/3125], train_loss:0.159943 Epoch [1/2], Iter [2553/3125], train_loss:0.166942 Epoch [1/2], Iter [2554/3125], train_loss:0.161777 Epoch [1/2], Iter [2555/3125], train_loss:0.173371 Epoch [1/2], Iter [2556/3125], train_loss:0.149645 Epoch [1/2], Iter [2557/3125], train_loss:0.150998 Epoch [1/2], Iter [2558/3125], train_loss:0.168478 Epoch [1/2], Iter [2559/3125], train_loss:0.161073 Epoch [1/2], Iter [2560/3125], train_loss:0.153746 Epoch [1/2], Iter [2561/3125], train_loss:0.156996 Epoch [1/2], Iter [2562/3125], train_loss:0.175018 Epoch [1/2], Iter [2563/3125], train_loss:0.161457 Epoch [1/2], Iter [2564/3125], train_loss:0.181512 Epoch [1/2], Iter [2565/3125], train_loss:0.159499 Epoch [1/2], Iter [2566/3125], train_loss:0.155685 Epoch [1/2], Iter [2567/3125], train_loss:0.160816 Epoch [1/2], Iter [2568/3125], train_loss:0.167257 Epoch [1/2], Iter [2569/3125], train_loss:0.168003 Epoch [1/2], Iter [2570/3125], train_loss:0.156276 Epoch [1/2], Iter [2571/3125], train_loss:0.166197 Epoch [1/2], Iter [2572/3125], train_loss:0.171228 Epoch [1/2], Iter [2573/3125], train_loss:0.169274 Epoch [1/2], Iter [2574/3125], train_loss:0.178607 Epoch [1/2], Iter [2575/3125], train_loss:0.180143 Epoch [1/2], Iter [2576/3125], train_loss:0.165496 Epoch [1/2], Iter [2577/3125], train_loss:0.164666 Epoch [1/2], Iter [2578/3125], train_loss:0.172761 Epoch [1/2], Iter [2579/3125], train_loss:0.142597 Epoch [1/2], Iter [2580/3125], train_loss:0.166856 Epoch [1/2], Iter [2581/3125], train_loss:0.180629 Epoch [1/2], Iter [2582/3125], train_loss:0.155988 Epoch [1/2], Iter [2583/3125], train_loss:0.190004 Epoch [1/2], Iter [2584/3125], train_loss:0.153131 Epoch [1/2], Iter 
[2585/3125], train_loss:0.149209 Epoch [1/2], Iter [2586/3125], train_loss:0.182763 Epoch [1/2], Iter [2587/3125], train_loss:0.163803 Epoch [1/2], Iter [2588/3125], train_loss:0.164377 Epoch [1/2], Iter [2589/3125], train_loss:0.165225 Epoch [1/2], Iter [2590/3125], train_loss:0.132286 Epoch [1/2], Iter [2591/3125], train_loss:0.157618 Epoch [1/2], Iter [2592/3125], train_loss:0.180062 Epoch [1/2], Iter [2593/3125], train_loss:0.149064 Epoch [1/2], Iter [2594/3125], train_loss:0.182419 Epoch [1/2], Iter [2595/3125], train_loss:0.152154 Epoch [1/2], Iter [2596/3125], train_loss:0.156817 Epoch [1/2], Iter [2597/3125], train_loss:0.158894 Epoch [1/2], Iter [2598/3125], train_loss:0.174006 Epoch [1/2], Iter [2599/3125], train_loss:0.170469 Epoch [1/2], Iter [2600/3125], train_loss:0.163272 Epoch [1/2], Iter [2601/3125], train_loss:0.165293 Epoch [1/2], Iter [2602/3125], train_loss:0.132606 Epoch [1/2], Iter [2603/3125], train_loss:0.181648 Epoch [1/2], Iter [2604/3125], train_loss:0.172091 Epoch [1/2], Iter [2605/3125], train_loss:0.145725 Epoch [1/2], Iter [2606/3125], train_loss:0.159542 Epoch [1/2], Iter [2607/3125], train_loss:0.166341 Epoch [1/2], Iter [2608/3125], train_loss:0.144378 Epoch [1/2], Iter [2609/3125], train_loss:0.174001 Epoch [1/2], Iter [2610/3125], train_loss:0.154200 Epoch [1/2], Iter [2611/3125], train_loss:0.168938 Epoch [1/2], Iter [2612/3125], train_loss:0.151330 Epoch [1/2], Iter [2613/3125], train_loss:0.158763 Epoch [1/2], Iter [2614/3125], train_loss:0.154259 Epoch [1/2], Iter [2615/3125], train_loss:0.155223 Epoch [1/2], Iter [2616/3125], train_loss:0.173738 Epoch [1/2], Iter [2617/3125], train_loss:0.164574 Epoch [1/2], Iter [2618/3125], train_loss:0.171280 Epoch [1/2], Iter [2619/3125], train_loss:0.167967 Epoch [1/2], Iter [2620/3125], train_loss:0.165825 Epoch [1/2], Iter [2621/3125], train_loss:0.163001 Epoch [1/2], Iter [2622/3125], train_loss:0.166808 Epoch [1/2], Iter [2623/3125], train_loss:0.158262 Epoch [1/2], Iter 
[2624/3125], train_loss:0.152927 Epoch [1/2], Iter [2625/3125], train_loss:0.151799 Epoch [1/2], Iter [2626/3125], train_loss:0.153348 Epoch [1/2], Iter [2627/3125], train_loss:0.145824 Epoch [1/2], Iter [2628/3125], train_loss:0.149315 Epoch [1/2], Iter [2629/3125], train_loss:0.183911 Epoch [1/2], Iter [2630/3125], train_loss:0.153068 Epoch [1/2], Iter [2631/3125], train_loss:0.163764 Epoch [1/2], Iter [2632/3125], train_loss:0.161556 Epoch [1/2], Iter [2633/3125], train_loss:0.177212 Epoch [1/2], Iter [2634/3125], train_loss:0.149619 Epoch [1/2], Iter [2635/3125], train_loss:0.160023 Epoch [1/2], Iter [2636/3125], train_loss:0.169547 Epoch [1/2], Iter [2637/3125], train_loss:0.147591 Epoch [1/2], Iter [2638/3125], train_loss:0.156738 Epoch [1/2], Iter [2639/3125], train_loss:0.148298 Epoch [1/2], Iter [2640/3125], train_loss:0.161786 Epoch [1/2], Iter [2641/3125], train_loss:0.162544 Epoch [1/2], Iter [2642/3125], train_loss:0.168581 Epoch [1/2], Iter [2643/3125], train_loss:0.167225 Epoch [1/2], Iter [2644/3125], train_loss:0.160467 Epoch [1/2], Iter [2645/3125], train_loss:0.166200 Epoch [1/2], Iter [2646/3125], train_loss:0.167931 Epoch [1/2], Iter [2647/3125], train_loss:0.157258 Epoch [1/2], Iter [2648/3125], train_loss:0.142979 Epoch [1/2], Iter [2649/3125], train_loss:0.169719 Epoch [1/2], Iter [2650/3125], train_loss:0.179859 Epoch [1/2], Iter [2651/3125], train_loss:0.154542 Epoch [1/2], Iter [2652/3125], train_loss:0.157200 Epoch [1/2], Iter [2653/3125], train_loss:0.178602 Epoch [1/2], Iter [2654/3125], train_loss:0.145348 Epoch [1/2], Iter [2655/3125], train_loss:0.156349 Epoch [1/2], Iter [2656/3125], train_loss:0.148944 Epoch [1/2], Iter [2657/3125], train_loss:0.157309 Epoch [1/2], Iter [2658/3125], train_loss:0.162670 Epoch [1/2], Iter [2659/3125], train_loss:0.150020 Epoch [1/2], Iter [2660/3125], train_loss:0.157252 Epoch [1/2], Iter [2661/3125], train_loss:0.166470 Epoch [1/2], Iter [2662/3125], train_loss:0.178597 Epoch [1/2], Iter 
[2663/3125], train_loss:0.145679 Epoch [1/2], Iter [2664/3125], train_loss:0.142497 Epoch [1/2], Iter [2665/3125], train_loss:0.153192 Epoch [1/2], Iter [2666/3125], train_loss:0.155716 Epoch [1/2], Iter [2667/3125], train_loss:0.174556 Epoch [1/2], Iter [2668/3125], train_loss:0.152721 Epoch [1/2], Iter [2669/3125], train_loss:0.169619 Epoch [1/2], Iter [2670/3125], train_loss:0.167028 Epoch [1/2], Iter [2671/3125], train_loss:0.154183 Epoch [1/2], Iter [2672/3125], train_loss:0.175002 Epoch [1/2], Iter [2673/3125], train_loss:0.139364 Epoch [1/2], Iter [2674/3125], train_loss:0.162451 Epoch [1/2], Iter [2675/3125], train_loss:0.157143 Epoch [1/2], Iter [2676/3125], train_loss:0.166282 Epoch [1/2], Iter [2677/3125], train_loss:0.150420 Epoch [1/2], Iter [2678/3125], train_loss:0.172134 Epoch [1/2], Iter [2679/3125], train_loss:0.170172 Epoch [1/2], Iter [2680/3125], train_loss:0.188591 Epoch [1/2], Iter [2681/3125], train_loss:0.133006 Epoch [1/2], Iter [2682/3125], train_loss:0.154428 Epoch [1/2], Iter [2683/3125], train_loss:0.146256 Epoch [1/2], Iter [2684/3125], train_loss:0.140180 Epoch [1/2], Iter [2685/3125], train_loss:0.150448 Epoch [1/2], Iter [2686/3125], train_loss:0.166966 Epoch [1/2], Iter [2687/3125], train_loss:0.163846 Epoch [1/2], Iter [2688/3125], train_loss:0.151998 Epoch [1/2], Iter [2689/3125], train_loss:0.177917 Epoch [1/2], Iter [2690/3125], train_loss:0.164405 Epoch [1/2], Iter [2691/3125], train_loss:0.149646 Epoch [1/2], Iter [2692/3125], train_loss:0.155895 Epoch [1/2], Iter [2693/3125], train_loss:0.133467 Epoch [1/2], Iter [2694/3125], train_loss:0.181978 Epoch [1/2], Iter [2695/3125], train_loss:0.178019 Epoch [1/2], Iter [2696/3125], train_loss:0.164970 Epoch [1/2], Iter [2697/3125], train_loss:0.153656 Epoch [1/2], Iter [2698/3125], train_loss:0.158283 Epoch [1/2], Iter [2699/3125], train_loss:0.166151 Epoch [1/2], Iter [2700/3125], train_loss:0.152899 Epoch [1/2], Iter [2701/3125], train_loss:0.150675 Epoch [1/2], Iter 
[2702/3125], train_loss:0.161370 Epoch [1/2], Iter [2703/3125], train_loss:0.162690 Epoch [1/2], Iter [2704/3125], train_loss:0.146854 Epoch [1/2], Iter [2705/3125], train_loss:0.168728 Epoch [1/2], Iter [2706/3125], train_loss:0.156361 Epoch [1/2], Iter [2707/3125], train_loss:0.162295 Epoch [1/2], Iter [2708/3125], train_loss:0.154698 Epoch [1/2], Iter [2709/3125], train_loss:0.162639 Epoch [1/2], Iter [2710/3125], train_loss:0.170419 Epoch [1/2], Iter [2711/3125], train_loss:0.182608 Epoch [1/2], Iter [2712/3125], train_loss:0.174881 Epoch [1/2], Iter [2713/3125], train_loss:0.163568 Epoch [1/2], Iter [2714/3125], train_loss:0.172464 Epoch [1/2], Iter [2715/3125], train_loss:0.152963 Epoch [1/2], Iter [2716/3125], train_loss:0.174935 Epoch [1/2], Iter [2717/3125], train_loss:0.163978 Epoch [1/2], Iter [2718/3125], train_loss:0.149811 Epoch [1/2], Iter [2719/3125], train_loss:0.168551 Epoch [1/2], Iter [2720/3125], train_loss:0.187687 Epoch [1/2], Iter [2721/3125], train_loss:0.170561 Epoch [1/2], Iter [2722/3125], train_loss:0.157643 Epoch [1/2], Iter [2723/3125], train_loss:0.183448 Epoch [1/2], Iter [2724/3125], train_loss:0.156940 Epoch [1/2], Iter [2725/3125], train_loss:0.176922 Epoch [1/2], Iter [2726/3125], train_loss:0.170941 Epoch [1/2], Iter [2727/3125], train_loss:0.161215 Epoch [1/2], Iter [2728/3125], train_loss:0.157638 Epoch [1/2], Iter [2729/3125], train_loss:0.146765 Epoch [1/2], Iter [2730/3125], train_loss:0.186415 Epoch [1/2], Iter [2731/3125], train_loss:0.179016 Epoch [1/2], Iter [2732/3125], train_loss:0.146862 Epoch [1/2], Iter [2733/3125], train_loss:0.160904 Epoch [1/2], Iter [2734/3125], train_loss:0.184066 Epoch [1/2], Iter [2735/3125], train_loss:0.170018 Epoch [1/2], Iter [2736/3125], train_loss:0.151466 Epoch [1/2], Iter [2737/3125], train_loss:0.155503 Epoch [1/2], Iter [2738/3125], train_loss:0.178504 Epoch [1/2], Iter [2739/3125], train_loss:0.182733 Epoch [1/2], Iter [2740/3125], train_loss:0.178885 Epoch [1/2], Iter 
[2741/3125], train_loss:0.158115 Epoch [1/2], Iter [2742/3125], train_loss:0.166074 Epoch [1/2], Iter [2743/3125], train_loss:0.175153 Epoch [1/2], Iter [2744/3125], train_loss:0.173695 Epoch [1/2], Iter [2745/3125], train_loss:0.140103 Epoch [1/2], Iter [2746/3125], train_loss:0.164165 Epoch [1/2], Iter [2747/3125], train_loss:0.195799 Epoch [1/2], Iter [2748/3125], train_loss:0.165051 Epoch [1/2], Iter [2749/3125], train_loss:0.168219 Epoch [1/2], Iter [2750/3125], train_loss:0.145761 Epoch [1/2], Iter [2751/3125], train_loss:0.184619 Epoch [1/2], Iter [2752/3125], train_loss:0.183593 Epoch [1/2], Iter [2753/3125], train_loss:0.161479 Epoch [1/2], Iter [2754/3125], train_loss:0.165525 Epoch [1/2], Iter [2755/3125], train_loss:0.152368 Epoch [1/2], Iter [2756/3125], train_loss:0.156252 Epoch [1/2], Iter [2757/3125], train_loss:0.160543 Epoch [1/2], Iter [2758/3125], train_loss:0.169057 Epoch [1/2], Iter [2759/3125], train_loss:0.185539 Epoch [1/2], Iter [2760/3125], train_loss:0.150664 Epoch [1/2], Iter [2761/3125], train_loss:0.168148 Epoch [1/2], Iter [2762/3125], train_loss:0.150886 Epoch [1/2], Iter [2763/3125], train_loss:0.153608 Epoch [1/2], Iter [2764/3125], train_loss:0.173608 Epoch [1/2], Iter [2765/3125], train_loss:0.156316 Epoch [1/2], Iter [2766/3125], train_loss:0.155580 Epoch [1/2], Iter [2767/3125], train_loss:0.170365 Epoch [1/2], Iter [2768/3125], train_loss:0.160952 Epoch [1/2], Iter [2769/3125], train_loss:0.178418 Epoch [1/2], Iter [2770/3125], train_loss:0.161754 Epoch [1/2], Iter [2771/3125], train_loss:0.175010 Epoch [1/2], Iter [2772/3125], train_loss:0.177170 Epoch [1/2], Iter [2773/3125], train_loss:0.156224 Epoch [1/2], Iter [2774/3125], train_loss:0.171853 Epoch [1/2], Iter [2775/3125], train_loss:0.175113 Epoch [1/2], Iter [2776/3125], train_loss:0.153226 Epoch [1/2], Iter [2777/3125], train_loss:0.167736 Epoch [1/2], Iter [2778/3125], train_loss:0.160811 Epoch [1/2], Iter [2779/3125], train_loss:0.174287 Epoch [1/2], Iter 
[2780/3125], train_loss:0.158126 Epoch [1/2], Iter [2781/3125], train_loss:0.170792 Epoch [1/2], Iter [2782/3125], train_loss:0.165518 Epoch [1/2], Iter [2783/3125], train_loss:0.162349 Epoch [1/2], Iter [2784/3125], train_loss:0.145470 Epoch [1/2], Iter [2785/3125], train_loss:0.159157 Epoch [1/2], Iter [2786/3125], train_loss:0.147954 Epoch [1/2], Iter [2787/3125], train_loss:0.170489 Epoch [1/2], Iter [2788/3125], train_loss:0.165043 Epoch [1/2], Iter [2789/3125], train_loss:0.163622 Epoch [1/2], Iter [2790/3125], train_loss:0.154899 Epoch [1/2], Iter [2791/3125], train_loss:0.160961 Epoch [1/2], Iter [2792/3125], train_loss:0.165133 Epoch [1/2], Iter [2793/3125], train_loss:0.183820 Epoch [1/2], Iter [2794/3125], train_loss:0.170000 Epoch [1/2], Iter [2795/3125], train_loss:0.164589 Epoch [1/2], Iter [2796/3125], train_loss:0.180219 Epoch [1/2], Iter [2797/3125], train_loss:0.144782 Epoch [1/2], Iter [2798/3125], train_loss:0.175786 Epoch [1/2], Iter [2799/3125], train_loss:0.128005 Epoch [1/2], Iter [2800/3125], train_loss:0.156003 Epoch [1/2], Iter [2801/3125], train_loss:0.151638 Epoch [1/2], Iter [2802/3125], train_loss:0.162846 Epoch [1/2], Iter [2803/3125], train_loss:0.162985 Epoch [1/2], Iter [2804/3125], train_loss:0.160361 Epoch [1/2], Iter [2805/3125], train_loss:0.151148 Epoch [1/2], Iter [2806/3125], train_loss:0.164542 Epoch [1/2], Iter [2807/3125], train_loss:0.142881 Epoch [1/2], Iter [2808/3125], train_loss:0.156098 Epoch [1/2], Iter [2809/3125], train_loss:0.133754 Epoch [1/2], Iter [2810/3125], train_loss:0.170719 Epoch [1/2], Iter [2811/3125], train_loss:0.149624 Epoch [1/2], Iter [2812/3125], train_loss:0.175666 Epoch [1/2], Iter [2813/3125], train_loss:0.178650 Epoch [1/2], Iter [2814/3125], train_loss:0.160231 Epoch [1/2], Iter [2815/3125], train_loss:0.181755 Epoch [1/2], Iter [2816/3125], train_loss:0.177022 Epoch [1/2], Iter [2817/3125], train_loss:0.143955 Epoch [1/2], Iter [2818/3125], train_loss:0.182202 Epoch [1/2], Iter 
[2819/3125], train_loss:0.156804 Epoch [1/2], Iter [2820/3125], train_loss:0.158852 Epoch [1/2], Iter [2821/3125], train_loss:0.159252 Epoch [1/2], Iter [2822/3125], train_loss:0.159138 Epoch [1/2], Iter [2823/3125], train_loss:0.158014 Epoch [1/2], Iter [2824/3125], train_loss:0.173861 Epoch [1/2], Iter [2825/3125], train_loss:0.163103 Epoch [1/2], Iter [2826/3125], train_loss:0.169961 Epoch [1/2], Iter [2827/3125], train_loss:0.160450 Epoch [1/2], Iter [2828/3125], train_loss:0.168754 Epoch [1/2], Iter [2829/3125], train_loss:0.145734 Epoch [1/2], Iter [2830/3125], train_loss:0.171105 Epoch [1/2], Iter [2831/3125], train_loss:0.149704 Epoch [1/2], Iter [2832/3125], train_loss:0.157235 Epoch [1/2], Iter [2833/3125], train_loss:0.168647 Epoch [1/2], Iter [2834/3125], train_loss:0.170278 Epoch [1/2], Iter [2835/3125], train_loss:0.164118 Epoch [1/2], Iter [2836/3125], train_loss:0.160487 Epoch [1/2], Iter [2837/3125], train_loss:0.170349 Epoch [1/2], Iter [2838/3125], train_loss:0.153062 Epoch [1/2], Iter [2839/3125], train_loss:0.179919 Epoch [1/2], Iter [2840/3125], train_loss:0.165033 Epoch [1/2], Iter [2841/3125], train_loss:0.159011 Epoch [1/2], Iter [2842/3125], train_loss:0.141699 Epoch [1/2], Iter [2843/3125], train_loss:0.155806 Epoch [1/2], Iter [2844/3125], train_loss:0.180037 Epoch [1/2], Iter [2845/3125], train_loss:0.172654 Epoch [1/2], Iter [2846/3125], train_loss:0.162126 Epoch [1/2], Iter [2847/3125], train_loss:0.174910 Epoch [1/2], Iter [2848/3125], train_loss:0.190180 Epoch [1/2], Iter [2849/3125], train_loss:0.167382 Epoch [1/2], Iter [2850/3125], train_loss:0.140893 Epoch [1/2], Iter [2851/3125], train_loss:0.169695 Epoch [1/2], Iter [2852/3125], train_loss:0.149698 Epoch [1/2], Iter [2853/3125], train_loss:0.150947 Epoch [1/2], Iter [2854/3125], train_loss:0.160250 Epoch [1/2], Iter [2855/3125], train_loss:0.167571 Epoch [1/2], Iter [2856/3125], train_loss:0.158384 Epoch [1/2], Iter [2857/3125], train_loss:0.137086 Epoch [1/2], Iter 
[2858/3125], train_loss:0.177784 Epoch [1/2], Iter [2859/3125], train_loss:0.172647 Epoch [1/2], Iter [2860/3125], train_loss:0.169255 Epoch [1/2], Iter [2861/3125], train_loss:0.169094 Epoch [1/2], Iter [2862/3125], train_loss:0.159690 Epoch [1/2], Iter [2863/3125], train_loss:0.162201 Epoch [1/2], Iter [2864/3125], train_loss:0.167594 Epoch [1/2], Iter [2865/3125], train_loss:0.167401 Epoch [1/2], Iter [2866/3125], train_loss:0.164989 Epoch [1/2], Iter [2867/3125], train_loss:0.138895 Epoch [1/2], Iter [2868/3125], train_loss:0.155665 Epoch [1/2], Iter [2869/3125], train_loss:0.178687 Epoch [1/2], Iter [2870/3125], train_loss:0.142473 Epoch [1/2], Iter [2871/3125], train_loss:0.167332 Epoch [1/2], Iter [2872/3125], train_loss:0.179365 Epoch [1/2], Iter [2873/3125], train_loss:0.167223 Epoch [1/2], Iter [2874/3125], train_loss:0.178953 Epoch [1/2], Iter [2875/3125], train_loss:0.157346 Epoch [1/2], Iter [2876/3125], train_loss:0.182048 Epoch [1/2], Iter [2877/3125], train_loss:0.172396 Epoch [1/2], Iter [2878/3125], train_loss:0.175423 Epoch [1/2], Iter [2879/3125], train_loss:0.161872 Epoch [1/2], Iter [2880/3125], train_loss:0.169045 Epoch [1/2], Iter [2881/3125], train_loss:0.169418 Epoch [1/2], Iter [2882/3125], train_loss:0.160182 Epoch [1/2], Iter [2883/3125], train_loss:0.186741 Epoch [1/2], Iter [2884/3125], train_loss:0.157193 Epoch [1/2], Iter [2885/3125], train_loss:0.138638 Epoch [1/2], Iter [2886/3125], train_loss:0.150510 Epoch [1/2], Iter [2887/3125], train_loss:0.176207 Epoch [1/2], Iter [2888/3125], train_loss:0.155249 Epoch [1/2], Iter [2889/3125], train_loss:0.159106 Epoch [1/2], Iter [2890/3125], train_loss:0.162412 Epoch [1/2], Iter [2891/3125], train_loss:0.152091 Epoch [1/2], Iter [2892/3125], train_loss:0.176883 Epoch [1/2], Iter [2893/3125], train_loss:0.146511 Epoch [1/2], Iter [2894/3125], train_loss:0.163757 Epoch [1/2], Iter [2895/3125], train_loss:0.160787 Epoch [1/2], Iter [2896/3125], train_loss:0.160858 Epoch [1/2], Iter 
[2897/3125], train_loss:0.155350 Epoch [1/2], Iter [2898/3125], train_loss:0.169348 Epoch [1/2], Iter [2899/3125], train_loss:0.144282 Epoch [1/2], Iter [2900/3125], train_loss:0.167706 Epoch [1/2], Iter [2901/3125], train_loss:0.182318 Epoch [1/2], Iter [2902/3125], train_loss:0.171248 Epoch [1/2], Iter [2903/3125], train_loss:0.165353 Epoch [1/2], Iter [2904/3125], train_loss:0.151637 Epoch [1/2], Iter [2905/3125], train_loss:0.161721 Epoch [1/2], Iter [2906/3125], train_loss:0.153006 Epoch [1/2], Iter [2907/3125], train_loss:0.161867 Epoch [1/2], Iter [2908/3125], train_loss:0.156607 Epoch [1/2], Iter [2909/3125], train_loss:0.178779 Epoch [1/2], Iter [2910/3125], train_loss:0.192463 Epoch [1/2], Iter [2911/3125], train_loss:0.148583 Epoch [1/2], Iter [2912/3125], train_loss:0.170696 Epoch [1/2], Iter [2913/3125], train_loss:0.168631 Epoch [1/2], Iter [2914/3125], train_loss:0.168608 Epoch [1/2], Iter [2915/3125], train_loss:0.166084 Epoch [1/2], Iter [2916/3125], train_loss:0.164468 Epoch [1/2], Iter [2917/3125], train_loss:0.154483 Epoch [1/2], Iter [2918/3125], train_loss:0.166607 Epoch [1/2], Iter [2919/3125], train_loss:0.175541 Epoch [1/2], Iter [2920/3125], train_loss:0.146106 Epoch [1/2], Iter [2921/3125], train_loss:0.186289 Epoch [1/2], Iter [2922/3125], train_loss:0.148206 Epoch [1/2], Iter [2923/3125], train_loss:0.180759 Epoch [1/2], Iter [2924/3125], train_loss:0.148458 Epoch [1/2], Iter [2925/3125], train_loss:0.153044 Epoch [1/2], Iter [2926/3125], train_loss:0.173843 Epoch [1/2], Iter [2927/3125], train_loss:0.173281 Epoch [1/2], Iter [2928/3125], train_loss:0.173701 Epoch [1/2], Iter [2929/3125], train_loss:0.165718 Epoch [1/2], Iter [2930/3125], train_loss:0.173092 Epoch [1/2], Iter [2931/3125], train_loss:0.171520 Epoch [1/2], Iter [2932/3125], train_loss:0.148433 Epoch [1/2], Iter [2933/3125], train_loss:0.149291 Epoch [1/2], Iter [2934/3125], train_loss:0.173039 Epoch [1/2], Iter [2935/3125], train_loss:0.167303 Epoch [1/2], Iter 
[2936/3125], train_loss:0.148045 Epoch [1/2], Iter [2937/3125], train_loss:0.160600 Epoch [1/2], Iter [2938/3125], train_loss:0.175791 Epoch [1/2], Iter [2939/3125], train_loss:0.170290 Epoch [1/2], Iter [2940/3125], train_loss:0.168750 Epoch [1/2], Iter [2941/3125], train_loss:0.174851 Epoch [1/2], Iter [2942/3125], train_loss:0.167067 Epoch [1/2], Iter [2943/3125], train_loss:0.147908 Epoch [1/2], Iter [2944/3125], train_loss:0.161702 Epoch [1/2], Iter [2945/3125], train_loss:0.166226 Epoch [1/2], Iter [2946/3125], train_loss:0.152965 Epoch [1/2], Iter [2947/3125], train_loss:0.151126 Epoch [1/2], Iter [2948/3125], train_loss:0.159228 Epoch [1/2], Iter [2949/3125], train_loss:0.147525 Epoch [1/2], Iter [2950/3125], train_loss:0.186010 Epoch [1/2], Iter [2951/3125], train_loss:0.144456 Epoch [1/2], Iter [2952/3125], train_loss:0.144571 Epoch [1/2], Iter [2953/3125], train_loss:0.149504 Epoch [1/2], Iter [2954/3125], train_loss:0.155754 Epoch [1/2], Iter [2955/3125], train_loss:0.157044 Epoch [1/2], Iter [2956/3125], train_loss:0.164638 Epoch [1/2], Iter [2957/3125], train_loss:0.161717 Epoch [1/2], Iter [2958/3125], train_loss:0.150048 Epoch [1/2], Iter [2959/3125], train_loss:0.161040 Epoch [1/2], Iter [2960/3125], train_loss:0.147002 Epoch [1/2], Iter [2961/3125], train_loss:0.168605 Epoch [1/2], Iter [2962/3125], train_loss:0.160989 Epoch [1/2], Iter [2963/3125], train_loss:0.179867 Epoch [1/2], Iter [2964/3125], train_loss:0.173219 Epoch [1/2], Iter [2965/3125], train_loss:0.166897 Epoch [1/2], Iter [2966/3125], train_loss:0.160661 Epoch [1/2], Iter [2967/3125], train_loss:0.161262 Epoch [1/2], Iter [2968/3125], train_loss:0.164723 Epoch [1/2], Iter [2969/3125], train_loss:0.142853 Epoch [1/2], Iter [2970/3125], train_loss:0.171715 Epoch [1/2], Iter [2971/3125], train_loss:0.158447 Epoch [1/2], Iter [2972/3125], train_loss:0.164181 Epoch [1/2], Iter [2973/3125], train_loss:0.177048 Epoch [1/2], Iter [2974/3125], train_loss:0.167190 Epoch [1/2], Iter 
[2975/3125], train_loss:0.158204 Epoch [1/2], Iter [2976/3125], train_loss:0.151028 Epoch [1/2], Iter [2977/3125], train_loss:0.162853 Epoch [1/2], Iter [2978/3125], train_loss:0.165735 Epoch [1/2], Iter [2979/3125], train_loss:0.173848 Epoch [1/2], Iter [2980/3125], train_loss:0.149452 Epoch [1/2], Iter [2981/3125], train_loss:0.152468 Epoch [1/2], Iter [2982/3125], train_loss:0.168138 Epoch [1/2], Iter [2983/3125], train_loss:0.163172 Epoch [1/2], Iter [2984/3125], train_loss:0.162576 Epoch [1/2], Iter [2985/3125], train_loss:0.188783 Epoch [1/2], Iter [2986/3125], train_loss:0.161452 Epoch [1/2], Iter [2987/3125], train_loss:0.136657 Epoch [1/2], Iter [2988/3125], train_loss:0.145196 Epoch [1/2], Iter [2989/3125], train_loss:0.183863 Epoch [1/2], Iter [2990/3125], train_loss:0.170865 Epoch [1/2], Iter [2991/3125], train_loss:0.155084 Epoch [1/2], Iter [2992/3125], train_loss:0.175260 Epoch [1/2], Iter [2993/3125], train_loss:0.177893 Epoch [1/2], Iter [2994/3125], train_loss:0.171074 Epoch [1/2], Iter [2995/3125], train_loss:0.166262 Epoch [1/2], Iter [2996/3125], train_loss:0.168631 Epoch [1/2], Iter [2997/3125], train_loss:0.142343 Epoch [1/2], Iter [2998/3125], train_loss:0.176656 Epoch [1/2], Iter [2999/3125], train_loss:0.181024 Epoch [1/2], Iter [3000/3125], train_loss:0.164563 Epoch [1/2], Iter [3001/3125], train_loss:0.181617 Epoch [1/2], Iter [3002/3125], train_loss:0.172865 Epoch [1/2], Iter [3003/3125], train_loss:0.179876 Epoch [1/2], Iter [3004/3125], train_loss:0.165719 Epoch [1/2], Iter [3005/3125], train_loss:0.177486 Epoch [1/2], Iter [3006/3125], train_loss:0.176950 Epoch [1/2], Iter [3007/3125], train_loss:0.178203 Epoch [1/2], Iter [3008/3125], train_loss:0.178196 Epoch [1/2], Iter [3009/3125], train_loss:0.171647 Epoch [1/2], Iter [3010/3125], train_loss:0.173414 Epoch [1/2], Iter [3011/3125], train_loss:0.164811 Epoch [1/2], Iter [3012/3125], train_loss:0.147020 Epoch [1/2], Iter [3013/3125], train_loss:0.166289 Epoch [1/2], Iter 
[3014/3125], train_loss:0.161090 Epoch [1/2], Iter [3015/3125], train_loss:0.162289 Epoch [1/2], Iter [3016/3125], train_loss:0.130393 Epoch [1/2], Iter [3017/3125], train_loss:0.132035 Epoch [1/2], Iter [3018/3125], train_loss:0.174404 Epoch [1/2], Iter [3019/3125], train_loss:0.157980 Epoch [1/2], Iter [3020/3125], train_loss:0.158861 Epoch [1/2], Iter [3021/3125], train_loss:0.182830 Epoch [1/2], Iter [3022/3125], train_loss:0.158150 Epoch [1/2], Iter [3023/3125], train_loss:0.156165 Epoch [1/2], Iter [3024/3125], train_loss:0.145425 Epoch [1/2], Iter [3025/3125], train_loss:0.176111 Epoch [1/2], Iter [3026/3125], train_loss:0.186718 Epoch [1/2], Iter [3027/3125], train_loss:0.150117 Epoch [1/2], Iter [3028/3125], train_loss:0.173456 Epoch [1/2], Iter [3029/3125], train_loss:0.156002 Epoch [1/2], Iter [3030/3125], train_loss:0.175069 Epoch [1/2], Iter [3031/3125], train_loss:0.150203 Epoch [1/2], Iter [3032/3125], train_loss:0.170119 Epoch [1/2], Iter [3033/3125], train_loss:0.161877 Epoch [1/2], Iter [3034/3125], train_loss:0.154505 Epoch [1/2], Iter [3035/3125], train_loss:0.170968 Epoch [1/2], Iter [3036/3125], train_loss:0.143941 Epoch [1/2], Iter [3037/3125], train_loss:0.171731 Epoch [1/2], Iter [3038/3125], train_loss:0.150052 Epoch [1/2], Iter [3039/3125], train_loss:0.155370 Epoch [1/2], Iter [3040/3125], train_loss:0.154070 Epoch [1/2], Iter [3041/3125], train_loss:0.169434 Epoch [1/2], Iter [3042/3125], train_loss:0.153931 Epoch [1/2], Iter [3043/3125], train_loss:0.167334 Epoch [1/2], Iter [3044/3125], train_loss:0.160416 Epoch [1/2], Iter [3045/3125], train_loss:0.161101 Epoch [1/2], Iter [3046/3125], train_loss:0.153652 Epoch [1/2], Iter [3047/3125], train_loss:0.166452 Epoch [1/2], Iter [3048/3125], train_loss:0.148719 Epoch [1/2], Iter [3049/3125], train_loss:0.153907 Epoch [1/2], Iter [3050/3125], train_loss:0.165748 Epoch [1/2], Iter [3051/3125], train_loss:0.177738 Epoch [1/2], Iter [3052/3125], train_loss:0.162658 Epoch [1/2], Iter 
[3053/3125], train_loss:0.157725 Epoch [1/2], Iter [3054/3125], train_loss:0.168763 Epoch [1/2], Iter [3055/3125], train_loss:0.169479 Epoch [1/2], Iter [3056/3125], train_loss:0.160464 Epoch [1/2], Iter [3057/3125], train_loss:0.165181 Epoch [1/2], Iter [3058/3125], train_loss:0.158833 Epoch [1/2], Iter [3059/3125], train_loss:0.174259 Epoch [1/2], Iter [3060/3125], train_loss:0.197122 Epoch [1/2], Iter [3061/3125], train_loss:0.157540 Epoch [1/2], Iter [3062/3125], train_loss:0.153574 Epoch [1/2], Iter [3063/3125], train_loss:0.158650 Epoch [1/2], Iter [3064/3125], train_loss:0.159368 Epoch [1/2], Iter [3065/3125], train_loss:0.126841 Epoch [1/2], Iter [3066/3125], train_loss:0.190723 Epoch [1/2], Iter [3067/3125], train_loss:0.161133 Epoch [1/2], Iter [3068/3125], train_loss:0.147794 Epoch [1/2], Iter [3069/3125], train_loss:0.154277 Epoch [1/2], Iter [3070/3125], train_loss:0.160044 Epoch [1/2], Iter [3071/3125], train_loss:0.157531 Epoch [1/2], Iter [3072/3125], train_loss:0.168389 Epoch [1/2], Iter [3073/3125], train_loss:0.172469 Epoch [1/2], Iter [3074/3125], train_loss:0.155994 Epoch [1/2], Iter [3075/3125], train_loss:0.147720 Epoch [1/2], Iter [3076/3125], train_loss:0.137509 Epoch [1/2], Iter [3077/3125], train_loss:0.181711 Epoch [1/2], Iter [3078/3125], train_loss:0.177348 Epoch [1/2], Iter [3079/3125], train_loss:0.148808 Epoch [1/2], Iter [3080/3125], train_loss:0.175595 Epoch [1/2], Iter [3081/3125], train_loss:0.165768 Epoch [1/2], Iter [3082/3125], train_loss:0.142488 Epoch [1/2], Iter [3083/3125], train_loss:0.147224 Epoch [1/2], Iter [3084/3125], train_loss:0.168570 Epoch [1/2], Iter [3085/3125], train_loss:0.155916 Epoch [1/2], Iter [3086/3125], train_loss:0.169448 Epoch [1/2], Iter [3087/3125], train_loss:0.148978 Epoch [1/2], Iter [3088/3125], train_loss:0.158718 Epoch [1/2], Iter [3089/3125], train_loss:0.139569 Epoch [1/2], Iter [3090/3125], train_loss:0.179602 Epoch [1/2], Iter [3091/3125], train_loss:0.172581 Epoch [1/2], Iter 
[3092/3125], train_loss:0.172989 Epoch [1/2], Iter [3093/3125], train_loss:0.174835 Epoch [1/2], Iter [3094/3125], train_loss:0.162024 Epoch [1/2], Iter [3095/3125], train_loss:0.149372 Epoch [1/2], Iter [3096/3125], train_loss:0.182143 Epoch [1/2], Iter [3097/3125], train_loss:0.173537 Epoch [1/2], Iter [3098/3125], train_loss:0.180467 Epoch [1/2], Iter [3099/3125], train_loss:0.138658 Epoch [1/2], Iter [3100/3125], train_loss:0.167943 Epoch [1/2], Iter [3101/3125], train_loss:0.179498 Epoch [1/2], Iter [3102/3125], train_loss:0.168319 Epoch [1/2], Iter [3103/3125], train_loss:0.159227 Epoch [1/2], Iter [3104/3125], train_loss:0.143851 Epoch [1/2], Iter [3105/3125], train_loss:0.162043 Epoch [1/2], Iter [3106/3125], train_loss:0.173713 Epoch [1/2], Iter [3107/3125], train_loss:0.160019 Epoch [1/2], Iter [3108/3125], train_loss:0.187196 Epoch [1/2], Iter [3109/3125], train_loss:0.178457 Epoch [1/2], Iter [3110/3125], train_loss:0.166758 Epoch [1/2], Iter [3111/3125], train_loss:0.162495 Epoch [1/2], Iter [3112/3125], train_loss:0.144868 Epoch [1/2], Iter [3113/3125], train_loss:0.170601 Epoch [1/2], Iter [3114/3125], train_loss:0.152794 Epoch [1/2], Iter [3115/3125], train_loss:0.166172 Epoch [1/2], Iter [3116/3125], train_loss:0.150413 Epoch [1/2], Iter [3117/3125], train_loss:0.146555 Epoch [1/2], Iter [3118/3125], train_loss:0.158817 Epoch [1/2], Iter [3119/3125], train_loss:0.179008 Epoch [1/2], Iter [3120/3125], train_loss:0.183372 Epoch [1/2], Iter [3121/3125], train_loss:0.165688 Epoch [1/2], Iter [3122/3125], train_loss:0.151766 Epoch [1/2], Iter [3123/3125], train_loss:0.147575 Epoch [1/2], Iter [3124/3125], train_loss:0.140461 Epoch [1/2], Iter [3125/3125], train_loss:0.166029
Epoch [1/2], train_loss:0.1625, train_acc:9.7080%, test_loss:0.1695, test_acc:10.6200%
Epoch [2/2], Iter [1/3125], train_loss:0.146202 Epoch [2/2], Iter [2/3125], train_loss:0.173672 Epoch [2/2], Iter [3/3125], train_loss:0.165151 Epoch [2/2], Iter [4/3125], train_loss:0.158770
Epoch [2/2], Iter [5/3125], train_loss:0.175999 Epoch [2/2], Iter [6/3125], train_loss:0.163998 Epoch [2/2], Iter [7/3125], train_loss:0.165410 Epoch [2/2], Iter [8/3125], train_loss:0.161637 Epoch [2/2], Iter [9/3125], train_loss:0.148239 Epoch [2/2], Iter [10/3125], train_loss:0.162426 Epoch [2/2], Iter [11/3125], train_loss:0.168900 Epoch [2/2], Iter [12/3125], train_loss:0.149848 Epoch [2/2], Iter [13/3125], train_loss:0.147608 Epoch [2/2], Iter [14/3125], train_loss:0.160673 Epoch [2/2], Iter [15/3125], train_loss:0.172021 Epoch [2/2], Iter [16/3125], train_loss:0.162101 Epoch [2/2], Iter [17/3125], train_loss:0.150480 Epoch [2/2], Iter [18/3125], train_loss:0.154776 Epoch [2/2], Iter [19/3125], train_loss:0.163754 Epoch [2/2], Iter [20/3125], train_loss:0.177525 Epoch [2/2], Iter [21/3125], train_loss:0.168097 Epoch [2/2], Iter [22/3125], train_loss:0.156192 Epoch [2/2], Iter [23/3125], train_loss:0.166126 Epoch [2/2], Iter [24/3125], train_loss:0.147863 Epoch [2/2], Iter [25/3125], train_loss:0.176202 Epoch [2/2], Iter [26/3125], train_loss:0.159570 Epoch [2/2], Iter [27/3125], train_loss:0.168702 Epoch [2/2], Iter [28/3125], train_loss:0.151392 Epoch [2/2], Iter [29/3125], train_loss:0.162362 Epoch [2/2], Iter [30/3125], train_loss:0.147167 Epoch [2/2], Iter [31/3125], train_loss:0.155992 Epoch [2/2], Iter [32/3125], train_loss:0.143932 Epoch [2/2], Iter [33/3125], train_loss:0.167568 Epoch [2/2], Iter [34/3125], train_loss:0.156876 Epoch [2/2], Iter [35/3125], train_loss:0.149783 Epoch [2/2], Iter [36/3125], train_loss:0.184439 Epoch [2/2], Iter [37/3125], train_loss:0.162946 Epoch [2/2], Iter [38/3125], train_loss:0.148541 Epoch [2/2], Iter [39/3125], train_loss:0.165627 Epoch [2/2], Iter [40/3125], train_loss:0.169342 Epoch [2/2], Iter [41/3125], train_loss:0.165507 Epoch [2/2], Iter [42/3125], train_loss:0.166825 Epoch [2/2], Iter [43/3125], train_loss:0.180178 Epoch [2/2], Iter [44/3125], train_loss:0.174066 Epoch [2/2], Iter [45/3125], 
train_loss:0.175319 Epoch [2/2], Iter [46/3125], train_loss:0.159672 Epoch [2/2], Iter [47/3125], train_loss:0.155855 Epoch [2/2], Iter [48/3125], train_loss:0.166862 Epoch [2/2], Iter [49/3125], train_loss:0.157197 Epoch [2/2], Iter [50/3125], train_loss:0.154708 Epoch [2/2], Iter [51/3125], train_loss:0.169141 Epoch [2/2], Iter [52/3125], train_loss:0.189146 Epoch [2/2], Iter [53/3125], train_loss:0.147940 Epoch [2/2], Iter [54/3125], train_loss:0.173229 Epoch [2/2], Iter [55/3125], train_loss:0.147851 Epoch [2/2], Iter [56/3125], train_loss:0.166568 Epoch [2/2], Iter [57/3125], train_loss:0.157517 Epoch [2/2], Iter [58/3125], train_loss:0.157088 Epoch [2/2], Iter [59/3125], train_loss:0.170904 Epoch [2/2], Iter [60/3125], train_loss:0.130077 Epoch [2/2], Iter [61/3125], train_loss:0.162462 Epoch [2/2], Iter [62/3125], train_loss:0.167202 Epoch [2/2], Iter [63/3125], train_loss:0.144449 Epoch [2/2], Iter [64/3125], train_loss:0.147543 Epoch [2/2], Iter [65/3125], train_loss:0.178345 Epoch [2/2], Iter [66/3125], train_loss:0.171756 Epoch [2/2], Iter [67/3125], train_loss:0.182125 Epoch [2/2], Iter [68/3125], train_loss:0.163568 Epoch [2/2], Iter [69/3125], train_loss:0.168720 Epoch [2/2], Iter [70/3125], train_loss:0.166233 Epoch [2/2], Iter [71/3125], train_loss:0.165497 Epoch [2/2], Iter [72/3125], train_loss:0.158568 Epoch [2/2], Iter [73/3125], train_loss:0.158017 Epoch [2/2], Iter [74/3125], train_loss:0.146704 Epoch [2/2], Iter [75/3125], train_loss:0.168960 Epoch [2/2], Iter [76/3125], train_loss:0.176339 Epoch [2/2], Iter [77/3125], train_loss:0.157601 Epoch [2/2], Iter [78/3125], train_loss:0.150234 Epoch [2/2], Iter [79/3125], train_loss:0.171131 Epoch [2/2], Iter [80/3125], train_loss:0.168470 Epoch [2/2], Iter [81/3125], train_loss:0.165504 Epoch [2/2], Iter [82/3125], train_loss:0.182929 Epoch [2/2], Iter [83/3125], train_loss:0.149121 Epoch [2/2], Iter [84/3125], train_loss:0.170251 Epoch [2/2], Iter [85/3125], train_loss:0.176452 Epoch [2/2], Iter 
[86/3125], train_loss:0.163143 Epoch [2/2], Iter [87/3125], train_loss:0.149888 Epoch [2/2], Iter [88/3125], train_loss:0.158223 Epoch [2/2], Iter [89/3125], train_loss:0.165219 Epoch [2/2], Iter [90/3125], train_loss:0.175566 Epoch [2/2], Iter [91/3125], train_loss:0.172680 Epoch [2/2], Iter [92/3125], train_loss:0.157610 Epoch [2/2], Iter [93/3125], train_loss:0.149683 Epoch [2/2], Iter [94/3125], train_loss:0.150491 Epoch [2/2], Iter [95/3125], train_loss:0.143823 Epoch [2/2], Iter [96/3125], train_loss:0.147380 Epoch [2/2], Iter [97/3125], train_loss:0.162991 Epoch [2/2], Iter [98/3125], train_loss:0.142088 Epoch [2/2], Iter [99/3125], train_loss:0.165098 Epoch [2/2], Iter [100/3125], train_loss:0.142414 Epoch [2/2], Iter [101/3125], train_loss:0.171030 Epoch [2/2], Iter [102/3125], train_loss:0.164070 Epoch [2/2], Iter [103/3125], train_loss:0.155812 Epoch [2/2], Iter [104/3125], train_loss:0.166394 Epoch [2/2], Iter [105/3125], train_loss:0.162388 Epoch [2/2], Iter [106/3125], train_loss:0.156700 Epoch [2/2], Iter [107/3125], train_loss:0.153787 Epoch [2/2], Iter [108/3125], train_loss:0.146724 Epoch [2/2], Iter [109/3125], train_loss:0.146993 Epoch [2/2], Iter [110/3125], train_loss:0.161078 Epoch [2/2], Iter [111/3125], train_loss:0.141862 Epoch [2/2], Iter [112/3125], train_loss:0.164413 Epoch [2/2], Iter [113/3125], train_loss:0.172509 Epoch [2/2], Iter [114/3125], train_loss:0.133704 Epoch [2/2], Iter [115/3125], train_loss:0.156570 Epoch [2/2], Iter [116/3125], train_loss:0.149274 Epoch [2/2], Iter [117/3125], train_loss:0.172428 Epoch [2/2], Iter [118/3125], train_loss:0.158011 Epoch [2/2], Iter [119/3125], train_loss:0.180269 Epoch [2/2], Iter [120/3125], train_loss:0.133947 Epoch [2/2], Iter [121/3125], train_loss:0.160919 Epoch [2/2], Iter [122/3125], train_loss:0.160910 Epoch [2/2], Iter [123/3125], train_loss:0.156073 Epoch [2/2], Iter [124/3125], train_loss:0.170647 Epoch [2/2], Iter [125/3125], train_loss:0.168909 Epoch [2/2], Iter [126/3125], 
train_loss:0.163942 Epoch [2/2], Iter [127/3125], train_loss:0.185147 Epoch [2/2], Iter [128/3125], train_loss:0.147694 Epoch [2/2], Iter [129/3125], train_loss:0.154867 Epoch [2/2], Iter [130/3125], train_loss:0.156400 Epoch [2/2], Iter [131/3125], train_loss:0.159859 Epoch [2/2], Iter [132/3125], train_loss:0.163676 Epoch [2/2], Iter [133/3125], train_loss:0.164885 Epoch [2/2], Iter [134/3125], train_loss:0.157290 Epoch [2/2], Iter [135/3125], train_loss:0.153076 Epoch [2/2], Iter [136/3125], train_loss:0.170953 Epoch [2/2], Iter [137/3125], train_loss:0.161285 Epoch [2/2], Iter [138/3125], train_loss:0.176708 Epoch [2/2], Iter [139/3125], train_loss:0.164216 Epoch [2/2], Iter [140/3125], train_loss:0.157998 Epoch [2/2], Iter [141/3125], train_loss:0.161874 Epoch [2/2], Iter [142/3125], train_loss:0.165788 Epoch [2/2], Iter [143/3125], train_loss:0.147918 Epoch [2/2], Iter [144/3125], train_loss:0.168310 Epoch [2/2], Iter [145/3125], train_loss:0.157749 Epoch [2/2], Iter [146/3125], train_loss:0.170075 Epoch [2/2], Iter [147/3125], train_loss:0.162752 Epoch [2/2], Iter [148/3125], train_loss:0.170934 Epoch [2/2], Iter [149/3125], train_loss:0.184253 Epoch [2/2], Iter [150/3125], train_loss:0.178670 Epoch [2/2], Iter [151/3125], train_loss:0.168679 Epoch [2/2], Iter [152/3125], train_loss:0.175516 Epoch [2/2], Iter [153/3125], train_loss:0.155538 Epoch [2/2], Iter [154/3125], train_loss:0.161324 Epoch [2/2], Iter [155/3125], train_loss:0.156795 Epoch [2/2], Iter [156/3125], train_loss:0.154852 Epoch [2/2], Iter [157/3125], train_loss:0.156921 Epoch [2/2], Iter [158/3125], train_loss:0.163482 Epoch [2/2], Iter [159/3125], train_loss:0.173362 Epoch [2/2], Iter [160/3125], train_loss:0.167319 Epoch [2/2], Iter [161/3125], train_loss:0.173615 Epoch [2/2], Iter [162/3125], train_loss:0.160354 Epoch [2/2], Iter [163/3125], train_loss:0.167696 Epoch [2/2], Iter [164/3125], train_loss:0.161250 Epoch [2/2], Iter [165/3125], train_loss:0.160384 Epoch [2/2], Iter [166/3125], 
train_loss:0.164563 Epoch [2/2], Iter [167/3125], train_loss:0.161137 Epoch [2/2], Iter [168/3125], train_loss:0.169574 Epoch [2/2], Iter [169/3125], train_loss:0.175531 Epoch [2/2], Iter [170/3125], train_loss:0.169590 Epoch [2/2], Iter [171/3125], train_loss:0.157394 Epoch [2/2], Iter [172/3125], train_loss:0.156446 Epoch [2/2], Iter [173/3125], train_loss:0.176099 Epoch [2/2], Iter [174/3125], train_loss:0.169188 Epoch [2/2], Iter [175/3125], train_loss:0.181089 Epoch [2/2], Iter [176/3125], train_loss:0.157710 Epoch [2/2], Iter [177/3125], train_loss:0.154907 Epoch [2/2], Iter [178/3125], train_loss:0.139118 Epoch [2/2], Iter [179/3125], train_loss:0.148639 Epoch [2/2], Iter [180/3125], train_loss:0.149552 Epoch [2/2], Iter [181/3125], train_loss:0.181338 Epoch [2/2], Iter [182/3125], train_loss:0.162902 Epoch [2/2], Iter [183/3125], train_loss:0.173415 Epoch [2/2], Iter [184/3125], train_loss:0.163751 Epoch [2/2], Iter [185/3125], train_loss:0.148597 Epoch [2/2], Iter [186/3125], train_loss:0.174917 Epoch [2/2], Iter [187/3125], train_loss:0.182508 Epoch [2/2], Iter [188/3125], train_loss:0.152830 Epoch [2/2], Iter [189/3125], train_loss:0.153870 Epoch [2/2], Iter [190/3125], train_loss:0.163149 Epoch [2/2], Iter [191/3125], train_loss:0.148616 Epoch [2/2], Iter [192/3125], train_loss:0.148913 Epoch [2/2], Iter [193/3125], train_loss:0.187292 Epoch [2/2], Iter [194/3125], train_loss:0.163163 Epoch [2/2], Iter [195/3125], train_loss:0.157831 Epoch [2/2], Iter [196/3125], train_loss:0.183797 Epoch [2/2], Iter [197/3125], train_loss:0.171313 Epoch [2/2], Iter [198/3125], train_loss:0.157854 Epoch [2/2], Iter [199/3125], train_loss:0.162880 Epoch [2/2], Iter [200/3125], train_loss:0.176139 Epoch [2/2], Iter [201/3125], train_loss:0.170941 Epoch [2/2], Iter [202/3125], train_loss:0.177162 Epoch [2/2], Iter [203/3125], train_loss:0.150648 Epoch [2/2], Iter [204/3125], train_loss:0.171486 Epoch [2/2], Iter [205/3125], train_loss:0.150289 Epoch [2/2], Iter [206/3125], 
train_loss:0.168230 Epoch [2/2], Iter [207/3125], train_loss:0.163843 Epoch [2/2], Iter [208/3125], train_loss:0.162255 Epoch [2/2], Iter [209/3125], train_loss:0.162224 Epoch [2/2], Iter [210/3125], train_loss:0.147608 Epoch [2/2], Iter [211/3125], train_loss:0.153870 Epoch [2/2], Iter [212/3125], train_loss:0.141862 Epoch [2/2], Iter [213/3125], train_loss:0.148429 Epoch [2/2], Iter [214/3125], train_loss:0.156956 Epoch [2/2], Iter [215/3125], train_loss:0.160064 Epoch [2/2], Iter [216/3125], train_loss:0.155396 Epoch [2/2], Iter [217/3125], train_loss:0.158974 Epoch [2/2], Iter [218/3125], train_loss:0.164166 Epoch [2/2], Iter [219/3125], train_loss:0.150157 Epoch [2/2], Iter [220/3125], train_loss:0.159278 Epoch [2/2], Iter [221/3125], train_loss:0.145524 Epoch [2/2], Iter [222/3125], train_loss:0.153799 Epoch [2/2], Iter [223/3125], train_loss:0.156198 Epoch [2/2], Iter [224/3125], train_loss:0.161148 Epoch [2/2], Iter [225/3125], train_loss:0.142585 Epoch [2/2], Iter [226/3125], train_loss:0.146489 Epoch [2/2], Iter [227/3125], train_loss:0.172975 Epoch [2/2], Iter [228/3125], train_loss:0.194386 Epoch [2/2], Iter [229/3125], train_loss:0.172534 Epoch [2/2], Iter [230/3125], train_loss:0.147119 Epoch [2/2], Iter [231/3125], train_loss:0.153974 Epoch [2/2], Iter [232/3125], train_loss:0.156483 Epoch [2/2], Iter [233/3125], train_loss:0.153530 Epoch [2/2], Iter [234/3125], train_loss:0.164038 Epoch [2/2], Iter [235/3125], train_loss:0.173976 Epoch [2/2], Iter [236/3125], train_loss:0.174818 Epoch [2/2], Iter [237/3125], train_loss:0.156790 Epoch [2/2], Iter [238/3125], train_loss:0.164833 Epoch [2/2], Iter [239/3125], train_loss:0.142041 Epoch [2/2], Iter [240/3125], train_loss:0.151814 Epoch [2/2], Iter [241/3125], train_loss:0.178047 Epoch [2/2], Iter [242/3125], train_loss:0.177161 Epoch [2/2], Iter [243/3125], train_loss:0.183264 Epoch [2/2], Iter [244/3125], train_loss:0.149528 Epoch [2/2], Iter [245/3125], train_loss:0.148756 Epoch [2/2], Iter [246/3125], 
train_loss:0.190471 Epoch [2/2], Iter [247/3125], train_loss:0.176104 Epoch [2/2], Iter [248/3125], train_loss:0.156350 Epoch [2/2], Iter [249/3125], train_loss:0.142632 Epoch [2/2], Iter [250/3125], train_loss:0.174584 Epoch [2/2], Iter [251/3125], train_loss:0.154501 Epoch [2/2], Iter [252/3125], train_loss:0.163151 Epoch [2/2], Iter [253/3125], train_loss:0.166830 Epoch [2/2], Iter [254/3125], train_loss:0.151940 Epoch [2/2], Iter [255/3125], train_loss:0.172570 Epoch [2/2], Iter [256/3125], train_loss:0.149426 Epoch [2/2], Iter [257/3125], train_loss:0.167744 Epoch [2/2], Iter [258/3125], train_loss:0.167243 Epoch [2/2], Iter [259/3125], train_loss:0.150426 Epoch [2/2], Iter [260/3125], train_loss:0.143742 Epoch [2/2], Iter [261/3125], train_loss:0.154619 Epoch [2/2], Iter [262/3125], train_loss:0.177493 Epoch [2/2], Iter [263/3125], train_loss:0.149127 Epoch [2/2], Iter [264/3125], train_loss:0.145748 Epoch [2/2], Iter [265/3125], train_loss:0.159908 Epoch [2/2], Iter [266/3125], train_loss:0.173237 Epoch [2/2], Iter [267/3125], train_loss:0.148302 Epoch [2/2], Iter [268/3125], train_loss:0.153039 Epoch [2/2], Iter [269/3125], train_loss:0.153943 Epoch [2/2], Iter [270/3125], train_loss:0.159962 Epoch [2/2], Iter [271/3125], train_loss:0.168486 Epoch [2/2], Iter [272/3125], train_loss:0.174194 Epoch [2/2], Iter [273/3125], train_loss:0.177417 Epoch [2/2], Iter [274/3125], train_loss:0.169610 Epoch [2/2], Iter [275/3125], train_loss:0.153916 Epoch [2/2], Iter [276/3125], train_loss:0.162009 Epoch [2/2], Iter [277/3125], train_loss:0.173930 Epoch [2/2], Iter [278/3125], train_loss:0.154844 Epoch [2/2], Iter [279/3125], train_loss:0.144510 Epoch [2/2], Iter [280/3125], train_loss:0.174670 Epoch [2/2], Iter [281/3125], train_loss:0.147663 Epoch [2/2], Iter [282/3125], train_loss:0.161231 Epoch [2/2], Iter [283/3125], train_loss:0.164567 Epoch [2/2], Iter [284/3125], train_loss:0.148298 Epoch [2/2], Iter [285/3125], train_loss:0.174240 Epoch [2/2], Iter [286/3125], 
train_loss:0.151915 Epoch [2/2], Iter [287/3125], train_loss:0.164254 Epoch [2/2], Iter [288/3125], train_loss:0.174495 Epoch [2/2], Iter [289/3125], train_loss:0.142919 Epoch [2/2], Iter [290/3125], train_loss:0.164818 Epoch [2/2], Iter [291/3125], train_loss:0.148046 Epoch [2/2], Iter [292/3125], train_loss:0.133363 Epoch [2/2], Iter [293/3125], train_loss:0.160022 Epoch [2/2], Iter [294/3125], train_loss:0.155773 Epoch [2/2], Iter [295/3125], train_loss:0.176180 Epoch [2/2], Iter [296/3125], train_loss:0.164451 Epoch [2/2], Iter [297/3125], train_loss:0.167795 Epoch [2/2], Iter [298/3125], train_loss:0.165779 Epoch [2/2], Iter [299/3125], train_loss:0.176171 Epoch [2/2], Iter [300/3125], train_loss:0.171345 Epoch [2/2], Iter [301/3125], train_loss:0.184329 Epoch [2/2], Iter [302/3125], train_loss:0.172903 Epoch [2/2], Iter [303/3125], train_loss:0.178375 Epoch [2/2], Iter [304/3125], train_loss:0.155158 Epoch [2/2], Iter [305/3125], train_loss:0.171172 Epoch [2/2], Iter [306/3125], train_loss:0.154146 Epoch [2/2], Iter [307/3125], train_loss:0.162431 Epoch [2/2], Iter [308/3125], train_loss:0.163887 Epoch [2/2], Iter [309/3125], train_loss:0.174687 Epoch [2/2], Iter [310/3125], train_loss:0.165460 Epoch [2/2], Iter [311/3125], train_loss:0.181555 Epoch [2/2], Iter [312/3125], train_loss:0.150162 Epoch [2/2], Iter [313/3125], train_loss:0.153412 Epoch [2/2], Iter [314/3125], train_loss:0.149629 Epoch [2/2], Iter [315/3125], train_loss:0.158892 Epoch [2/2], Iter [316/3125], train_loss:0.156130 Epoch [2/2], Iter [317/3125], train_loss:0.187546 Epoch [2/2], Iter [318/3125], train_loss:0.153912 Epoch [2/2], Iter [319/3125], train_loss:0.151770 Epoch [2/2], Iter [320/3125], train_loss:0.176303 Epoch [2/2], Iter [321/3125], train_loss:0.167846 Epoch [2/2], Iter [322/3125], train_loss:0.150853 Epoch [2/2], Iter [323/3125], train_loss:0.174334 Epoch [2/2], Iter [324/3125], train_loss:0.152363 Epoch [2/2], Iter [325/3125], train_loss:0.182887 Epoch [2/2], Iter [326/3125], 
train_loss:0.149897 Epoch [2/2], Iter [327/3125], train_loss:0.170501 Epoch [2/2], Iter [328/3125], train_loss:0.186834 Epoch [2/2], Iter [329/3125], train_loss:0.163417 Epoch [2/2], Iter [330/3125], train_loss:0.182607 Epoch [2/2], Iter [331/3125], train_loss:0.167527 Epoch [2/2], Iter [332/3125], train_loss:0.171005 Epoch [2/2], Iter [333/3125], train_loss:0.162520 Epoch [2/2], Iter [334/3125], train_loss:0.160567 Epoch [2/2], Iter [335/3125], train_loss:0.165600 Epoch [2/2], Iter [336/3125], train_loss:0.155164 Epoch [2/2], Iter [337/3125], train_loss:0.175315 Epoch [2/2], Iter [338/3125], train_loss:0.171219 Epoch [2/2], Iter [339/3125], train_loss:0.162644 Epoch [2/2], Iter [340/3125], train_loss:0.159048 Epoch [2/2], Iter [341/3125], train_loss:0.162782 Epoch [2/2], Iter [342/3125], train_loss:0.165438 Epoch [2/2], Iter [343/3125], train_loss:0.153910 Epoch [2/2], Iter [344/3125], train_loss:0.174372 Epoch [2/2], Iter [345/3125], train_loss:0.177340 Epoch [2/2], Iter [346/3125], train_loss:0.177186 Epoch [2/2], Iter [347/3125], train_loss:0.163347 Epoch [2/2], Iter [348/3125], train_loss:0.164975 Epoch [2/2], Iter [349/3125], train_loss:0.202241 Epoch [2/2], Iter [350/3125], train_loss:0.176461 Epoch [2/2], Iter [351/3125], train_loss:0.155909 Epoch [2/2], Iter [352/3125], train_loss:0.161746 Epoch [2/2], Iter [353/3125], train_loss:0.161433 Epoch [2/2], Iter [354/3125], train_loss:0.161199 Epoch [2/2], Iter [355/3125], train_loss:0.176037 Epoch [2/2], Iter [356/3125], train_loss:0.165718 Epoch [2/2], Iter [357/3125], train_loss:0.144140 Epoch [2/2], Iter [358/3125], train_loss:0.142182 Epoch [2/2], Iter [359/3125], train_loss:0.151589 Epoch [2/2], Iter [360/3125], train_loss:0.170065 Epoch [2/2], Iter [361/3125], train_loss:0.155288 Epoch [2/2], Iter [362/3125], train_loss:0.153488 Epoch [2/2], Iter [363/3125], train_loss:0.156576 Epoch [2/2], Iter [364/3125], train_loss:0.161076 Epoch [2/2], Iter [365/3125], train_loss:0.161203 Epoch [2/2], Iter [366/3125], 
train_loss:0.164802 Epoch [2/2], Iter [367/3125], train_loss:0.166324 Epoch [2/2], Iter [368/3125], train_loss:0.178081 Epoch [2/2], Iter [369/3125], train_loss:0.144357 Epoch [2/2], Iter [370/3125], train_loss:0.174453 Epoch [2/2], Iter [371/3125], train_loss:0.168766 Epoch [2/2], Iter [372/3125], train_loss:0.147773 Epoch [2/2], Iter [373/3125], train_loss:0.143407 Epoch [2/2], Iter [374/3125], train_loss:0.154440 Epoch [2/2], Iter [375/3125], train_loss:0.144308 Epoch [2/2], Iter [376/3125], train_loss:0.146517 Epoch [2/2], Iter [377/3125], train_loss:0.168994 Epoch [2/2], Iter [378/3125], train_loss:0.155020 Epoch [2/2], Iter [379/3125], train_loss:0.136322 Epoch [2/2], Iter [380/3125], train_loss:0.165164 Epoch [2/2], Iter [381/3125], train_loss:0.165966 Epoch [2/2], Iter [382/3125], train_loss:0.149831 Epoch [2/2], Iter [383/3125], train_loss:0.153939 Epoch [2/2], Iter [384/3125], train_loss:0.150713 Epoch [2/2], Iter [385/3125], train_loss:0.149525 Epoch [2/2], Iter [386/3125], train_loss:0.186537 Epoch [2/2], Iter [387/3125], train_loss:0.155550 Epoch [2/2], Iter [388/3125], train_loss:0.130376 Epoch [2/2], Iter [389/3125], train_loss:0.168143 Epoch [2/2], Iter [390/3125], train_loss:0.153200 Epoch [2/2], Iter [391/3125], train_loss:0.156268 Epoch [2/2], Iter [392/3125], train_loss:0.138514 Epoch [2/2], Iter [393/3125], train_loss:0.186347 Epoch [2/2], Iter [394/3125], train_loss:0.167708 Epoch [2/2], Iter [395/3125], train_loss:0.156236 Epoch [2/2], Iter [396/3125], train_loss:0.161214 Epoch [2/2], Iter [397/3125], train_loss:0.164827 Epoch [2/2], Iter [398/3125], train_loss:0.162833 Epoch [2/2], Iter [399/3125], train_loss:0.148330 Epoch [2/2], Iter [400/3125], train_loss:0.151244 Epoch [2/2], Iter [401/3125], train_loss:0.175030 Epoch [2/2], Iter [402/3125], train_loss:0.167966 Epoch [2/2], Iter [403/3125], train_loss:0.174009 Epoch [2/2], Iter [404/3125], train_loss:0.133342 Epoch [2/2], Iter [405/3125], train_loss:0.160444 Epoch [2/2], Iter [406/3125], 
train_loss:0.173592 Epoch [2/2], Iter [407/3125], train_loss:0.176786 Epoch [2/2], Iter [408/3125], train_loss:0.161026 Epoch [2/2], Iter [409/3125], train_loss:0.173543 Epoch [2/2], Iter [410/3125], train_loss:0.147663 Epoch [2/2], Iter [411/3125], train_loss:0.170319 Epoch [2/2], Iter [412/3125], train_loss:0.185939 Epoch [2/2], Iter [413/3125], train_loss:0.165246 Epoch [2/2], Iter [414/3125], train_loss:0.185627 Epoch [2/2], Iter [415/3125], train_loss:0.131443 Epoch [2/2], Iter [416/3125], train_loss:0.150096 Epoch [2/2], Iter [417/3125], train_loss:0.143120 Epoch [2/2], Iter [418/3125], train_loss:0.193534 Epoch [2/2], Iter [419/3125], train_loss:0.152602 Epoch [2/2], Iter [420/3125], train_loss:0.170830 Epoch [2/2], Iter [421/3125], train_loss:0.174716 Epoch [2/2], Iter [422/3125], train_loss:0.174463 Epoch [2/2], Iter [423/3125], train_loss:0.169858 Epoch [2/2], Iter [424/3125], train_loss:0.142388 Epoch [2/2], Iter [425/3125], train_loss:0.172615 Epoch [2/2], Iter [426/3125], train_loss:0.177286 Epoch [2/2], Iter [427/3125], train_loss:0.167227 Epoch [2/2], Iter [428/3125], train_loss:0.157363 Epoch [2/2], Iter [429/3125], train_loss:0.166516 Epoch [2/2], Iter [430/3125], train_loss:0.172313 Epoch [2/2], Iter [431/3125], train_loss:0.161539 Epoch [2/2], Iter [432/3125], train_loss:0.163159 Epoch [2/2], Iter [433/3125], train_loss:0.144575 Epoch [2/2], Iter [434/3125], train_loss:0.168186 Epoch [2/2], Iter [435/3125], train_loss:0.157193 Epoch [2/2], Iter [436/3125], train_loss:0.161615 Epoch [2/2], Iter [437/3125], train_loss:0.169740 Epoch [2/2], Iter [438/3125], train_loss:0.165369 Epoch [2/2], Iter [439/3125], train_loss:0.171216 Epoch [2/2], Iter [440/3125], train_loss:0.162590 Epoch [2/2], Iter [441/3125], train_loss:0.185242 Epoch [2/2], Iter [442/3125], train_loss:0.161350 Epoch [2/2], Iter [443/3125], train_loss:0.160137 Epoch [2/2], Iter [444/3125], train_loss:0.151255 Epoch [2/2], Iter [445/3125], train_loss:0.174243 Epoch [2/2], Iter [446/3125], 
train_loss:0.163636 Epoch [2/2], Iter [447/3125], train_loss:0.155706 Epoch [2/2], Iter [448/3125], train_loss:0.165992 Epoch [2/2], Iter [449/3125], train_loss:0.157281 Epoch [2/2], Iter [450/3125], train_loss:0.180386 Epoch [2/2], Iter [451/3125], train_loss:0.180637 Epoch [2/2], Iter [452/3125], train_loss:0.159181 Epoch [2/2], Iter [453/3125], train_loss:0.167303 Epoch [2/2], Iter [454/3125], train_loss:0.161755 Epoch [2/2], Iter [455/3125], train_loss:0.154677 Epoch [2/2], Iter [456/3125], train_loss:0.167636 Epoch [2/2], Iter [457/3125], train_loss:0.180807 Epoch [2/2], Iter [458/3125], train_loss:0.139945 Epoch [2/2], Iter [459/3125], train_loss:0.165975 Epoch [2/2], Iter [460/3125], train_loss:0.153326 Epoch [2/2], Iter [461/3125], train_loss:0.187807 Epoch [2/2], Iter [462/3125], train_loss:0.166080 Epoch [2/2], Iter [463/3125], train_loss:0.164084 Epoch [2/2], Iter [464/3125], train_loss:0.178732 Epoch [2/2], Iter [465/3125], train_loss:0.139112 Epoch [2/2], Iter [466/3125], train_loss:0.154262 Epoch [2/2], Iter [467/3125], train_loss:0.156984 Epoch [2/2], Iter [468/3125], train_loss:0.153696 Epoch [2/2], Iter [469/3125], train_loss:0.167890 Epoch [2/2], Iter [470/3125], train_loss:0.146530 Epoch [2/2], Iter [471/3125], train_loss:0.173568 Epoch [2/2], Iter [472/3125], train_loss:0.172920 Epoch [2/2], Iter [473/3125], train_loss:0.172191 Epoch [2/2], Iter [474/3125], train_loss:0.177066 Epoch [2/2], Iter [475/3125], train_loss:0.166096 Epoch [2/2], Iter [476/3125], train_loss:0.145177 Epoch [2/2], Iter [477/3125], train_loss:0.154965 Epoch [2/2], Iter [478/3125], train_loss:0.154901 Epoch [2/2], Iter [479/3125], train_loss:0.161373 Epoch [2/2], Iter [480/3125], train_loss:0.164218 Epoch [2/2], Iter [481/3125], train_loss:0.163394 Epoch [2/2], Iter [482/3125], train_loss:0.179960 Epoch [2/2], Iter [483/3125], train_loss:0.152761 Epoch [2/2], Iter [484/3125], train_loss:0.148085 Epoch [2/2], Iter [485/3125], train_loss:0.158129 Epoch [2/2], Iter [486/3125], 
train_loss:0.164468 Epoch [2/2], Iter [487/3125], train_loss:0.140875 Epoch [2/2], Iter [488/3125], train_loss:0.153898 Epoch [2/2], Iter [489/3125], train_loss:0.183179 Epoch [2/2], Iter [490/3125], train_loss:0.159658 Epoch [2/2], Iter [491/3125], train_loss:0.149543 Epoch [2/2], Iter [492/3125], train_loss:0.162857 Epoch [2/2], Iter [493/3125], train_loss:0.146819 Epoch [2/2], Iter [494/3125], train_loss:0.162568 Epoch [2/2], Iter [495/3125], train_loss:0.186698 Epoch [2/2], Iter [496/3125], train_loss:0.152870 Epoch [2/2], Iter [497/3125], train_loss:0.160796 Epoch [2/2], Iter [498/3125], train_loss:0.150789 Epoch [2/2], Iter [499/3125], train_loss:0.143901 Epoch [2/2], Iter [500/3125], train_loss:0.146307 Epoch [2/2], Iter [501/3125], train_loss:0.156505 Epoch [2/2], Iter [502/3125], train_loss:0.170537 Epoch [2/2], Iter [503/3125], train_loss:0.165219 Epoch [2/2], Iter [504/3125], train_loss:0.131376 Epoch [2/2], Iter [505/3125], train_loss:0.150592 Epoch [2/2], Iter [506/3125], train_loss:0.154510 Epoch [2/2], Iter [507/3125], train_loss:0.185317 Epoch [2/2], Iter [508/3125], train_loss:0.155880 Epoch [2/2], Iter [509/3125], train_loss:0.166343 Epoch [2/2], Iter [510/3125], train_loss:0.170775 Epoch [2/2], Iter [511/3125], train_loss:0.158124 Epoch [2/2], Iter [512/3125], train_loss:0.162436 Epoch [2/2], Iter [513/3125], train_loss:0.171975 Epoch [2/2], Iter [514/3125], train_loss:0.158008 Epoch [2/2], Iter [515/3125], train_loss:0.180108 Epoch [2/2], Iter [516/3125], train_loss:0.166079 Epoch [2/2], Iter [517/3125], train_loss:0.187777 Epoch [2/2], Iter [518/3125], train_loss:0.179959 Epoch [2/2], Iter [519/3125], train_loss:0.174720 Epoch [2/2], Iter [520/3125], train_loss:0.159333 Epoch [2/2], Iter [521/3125], train_loss:0.170574 Epoch [2/2], Iter [522/3125], train_loss:0.162373 Epoch [2/2], Iter [523/3125], train_loss:0.165549 Epoch [2/2], Iter [524/3125], train_loss:0.171584 Epoch [2/2], Iter [525/3125], train_loss:0.174756 Epoch [2/2], Iter [526/3125], 
train_loss:0.161434 Epoch [2/2], Iter [527/3125], train_loss:0.168083 Epoch [2/2], Iter [528/3125], train_loss:0.167138 Epoch [2/2], Iter [529/3125], train_loss:0.140973 Epoch [2/2], Iter [530/3125], train_loss:0.159618 Epoch [2/2], Iter [531/3125], train_loss:0.176200 Epoch [2/2], Iter [532/3125], train_loss:0.162572 Epoch [2/2], Iter [533/3125], train_loss:0.168972 Epoch [2/2], Iter [534/3125], train_loss:0.173325 Epoch [2/2], Iter [535/3125], train_loss:0.163866 Epoch [2/2], Iter [536/3125], train_loss:0.163720 Epoch [2/2], Iter [537/3125], train_loss:0.168137 Epoch [2/2], Iter [538/3125], train_loss:0.175345 Epoch [2/2], Iter [539/3125], train_loss:0.158390 Epoch [2/2], Iter [540/3125], train_loss:0.159162 Epoch [2/2], Iter [541/3125], train_loss:0.144704 Epoch [2/2], Iter [542/3125], train_loss:0.149428 Epoch [2/2], Iter [543/3125], train_loss:0.158572 Epoch [2/2], Iter [544/3125], train_loss:0.172126 Epoch [2/2], Iter [545/3125], train_loss:0.176276 Epoch [2/2], Iter [546/3125], train_loss:0.177032 Epoch [2/2], Iter [547/3125], train_loss:0.173978 Epoch [2/2], Iter [548/3125], train_loss:0.164149 Epoch [2/2], Iter [549/3125], train_loss:0.160977 Epoch [2/2], Iter [550/3125], train_loss:0.141250 Epoch [2/2], Iter [551/3125], train_loss:0.167351 Epoch [2/2], Iter [552/3125], train_loss:0.154863 Epoch [2/2], Iter [553/3125], train_loss:0.176878 Epoch [2/2], Iter [554/3125], train_loss:0.152597 Epoch [2/2], Iter [555/3125], train_loss:0.173390 Epoch [2/2], Iter [556/3125], train_loss:0.163720 Epoch [2/2], Iter [557/3125], train_loss:0.160260 Epoch [2/2], Iter [558/3125], train_loss:0.178257 Epoch [2/2], Iter [559/3125], train_loss:0.175589 Epoch [2/2], Iter [560/3125], train_loss:0.148475 Epoch [2/2], Iter [561/3125], train_loss:0.173594 Epoch [2/2], Iter [562/3125], train_loss:0.165406 Epoch [2/2], Iter [563/3125], train_loss:0.171584 Epoch [2/2], Iter [564/3125], train_loss:0.167694 Epoch [2/2], Iter [565/3125], train_loss:0.163094 Epoch [2/2], Iter [566/3125], 
train_loss:0.157451 Epoch [2/2], Iter [567/3125], train_loss:0.163195 Epoch [2/2], Iter [568/3125], train_loss:0.145743 Epoch [2/2], Iter [569/3125], train_loss:0.165041 Epoch [2/2], Iter [570/3125], train_loss:0.155912 Epoch [2/2], Iter [571/3125], train_loss:0.150290 Epoch [2/2], Iter [572/3125], train_loss:0.162542 Epoch [2/2], Iter [573/3125], train_loss:0.147671 Epoch [2/2], Iter [574/3125], train_loss:0.153121 Epoch [2/2], Iter [575/3125], train_loss:0.151718 Epoch [2/2], Iter [576/3125], train_loss:0.167825 Epoch [2/2], Iter [577/3125], train_loss:0.148835 Epoch [2/2], Iter [578/3125], train_loss:0.151512 Epoch [2/2], Iter [579/3125], train_loss:0.187779 Epoch [2/2], Iter [580/3125], train_loss:0.157333 Epoch [2/2], Iter [581/3125], train_loss:0.165742 Epoch [2/2], Iter [582/3125], train_loss:0.167597 Epoch [2/2], Iter [583/3125], train_loss:0.163270 Epoch [2/2], Iter [584/3125], train_loss:0.144670 Epoch [2/2], Iter [585/3125], train_loss:0.149435 Epoch [2/2], Iter [586/3125], train_loss:0.170580 Epoch [2/2], Iter [587/3125], train_loss:0.160914 Epoch [2/2], Iter [588/3125], train_loss:0.151355 Epoch [2/2], Iter [589/3125], train_loss:0.167059 Epoch [2/2], Iter [590/3125], train_loss:0.151443 Epoch [2/2], Iter [591/3125], train_loss:0.147637 Epoch [2/2], Iter [592/3125], train_loss:0.173933 Epoch [2/2], Iter [593/3125], train_loss:0.157407 Epoch [2/2], Iter [594/3125], train_loss:0.169269 Epoch [2/2], Iter [595/3125], train_loss:0.155772 Epoch [2/2], Iter [596/3125], train_loss:0.189058 Epoch [2/2], Iter [597/3125], train_loss:0.147937 Epoch [2/2], Iter [598/3125], train_loss:0.179247 Epoch [2/2], Iter [599/3125], train_loss:0.167485 Epoch [2/2], Iter [600/3125], train_loss:0.153575 Epoch [2/2], Iter [601/3125], train_loss:0.143053 Epoch [2/2], Iter [602/3125], train_loss:0.150471 Epoch [2/2], Iter [603/3125], train_loss:0.143764 Epoch [2/2], Iter [604/3125], train_loss:0.161357 Epoch [2/2], Iter [605/3125], train_loss:0.177912 Epoch [2/2], Iter [606/3125], 
train_loss:0.193015 Epoch [2/2], Iter [607/3125], train_loss:0.165355 Epoch [2/2], Iter [608/3125], train_loss:0.160645 Epoch [2/2], Iter [609/3125], train_loss:0.153148 Epoch [2/2], Iter [610/3125], train_loss:0.161745 Epoch [2/2], Iter [611/3125], train_loss:0.177804 Epoch [2/2], Iter [612/3125], train_loss:0.169567 Epoch [2/2], Iter [613/3125], train_loss:0.163330 Epoch [2/2], Iter [614/3125], train_loss:0.156796 Epoch [2/2], Iter [615/3125], train_loss:0.176123 Epoch [2/2], Iter [616/3125], train_loss:0.154425 Epoch [2/2], Iter [617/3125], train_loss:0.152680 Epoch [2/2], Iter [618/3125], train_loss:0.150936 Epoch [2/2], Iter [619/3125], train_loss:0.174734 Epoch [2/2], Iter [620/3125], train_loss:0.164248 Epoch [2/2], Iter [621/3125], train_loss:0.154376 Epoch [2/2], Iter [622/3125], train_loss:0.181289 Epoch [2/2], Iter [623/3125], train_loss:0.154710 Epoch [2/2], Iter [624/3125], train_loss:0.173619 Epoch [2/2], Iter [625/3125], train_loss:0.160207 Epoch [2/2], Iter [626/3125], train_loss:0.164651 Epoch [2/2], Iter [627/3125], train_loss:0.168672 Epoch [2/2], Iter [628/3125], train_loss:0.152033 Epoch [2/2], Iter [629/3125], train_loss:0.145318 Epoch [2/2], Iter [630/3125], train_loss:0.153201 Epoch [2/2], Iter [631/3125], train_loss:0.136641 Epoch [2/2], Iter [632/3125], train_loss:0.165298 Epoch [2/2], Iter [633/3125], train_loss:0.146980 Epoch [2/2], Iter [634/3125], train_loss:0.157089 Epoch [2/2], Iter [635/3125], train_loss:0.153481 Epoch [2/2], Iter [636/3125], train_loss:0.180023 Epoch [2/2], Iter [637/3125], train_loss:0.177965 Epoch [2/2], Iter [638/3125], train_loss:0.168382 Epoch [2/2], Iter [639/3125], train_loss:0.170590 Epoch [2/2], Iter [640/3125], train_loss:0.146684 Epoch [2/2], Iter [641/3125], train_loss:0.154656 Epoch [2/2], Iter [642/3125], train_loss:0.148962 Epoch [2/2], Iter [643/3125], train_loss:0.162826 Epoch [2/2], Iter [644/3125], train_loss:0.154299 Epoch [2/2], Iter [645/3125], train_loss:0.140432 Epoch [2/2], Iter [646/3125], 
train_loss:0.169591 Epoch [2/2], Iter [647/3125], train_loss:0.160964 Epoch [2/2], Iter [648/3125], train_loss:0.163820 Epoch [2/2], Iter [649/3125], train_loss:0.180686 Epoch [2/2], Iter [650/3125], train_loss:0.149200 Epoch [2/2], Iter [651/3125], train_loss:0.165878 Epoch [2/2], Iter [652/3125], train_loss:0.153168 Epoch [2/2], Iter [653/3125], train_loss:0.158429 Epoch [2/2], Iter [654/3125], train_loss:0.164462 Epoch [2/2], Iter [655/3125], train_loss:0.173659 Epoch [2/2], Iter [656/3125], train_loss:0.158212 Epoch [2/2], Iter [657/3125], train_loss:0.147685 Epoch [2/2], Iter [658/3125], train_loss:0.165053 Epoch [2/2], Iter [659/3125], train_loss:0.147815 Epoch [2/2], Iter [660/3125], train_loss:0.156994 Epoch [2/2], Iter [661/3125], train_loss:0.166037 Epoch [2/2], Iter [662/3125], train_loss:0.172137 Epoch [2/2], Iter [663/3125], train_loss:0.164935 Epoch [2/2], Iter [664/3125], train_loss:0.135215 Epoch [2/2], Iter [665/3125], train_loss:0.158562 Epoch [2/2], Iter [666/3125], train_loss:0.160104 Epoch [2/2], Iter [667/3125], train_loss:0.151053 Epoch [2/2], Iter [668/3125], train_loss:0.170116 Epoch [2/2], Iter [669/3125], train_loss:0.137139 Epoch [2/2], Iter [670/3125], train_loss:0.157071 Epoch [2/2], Iter [671/3125], train_loss:0.188446 Epoch [2/2], Iter [672/3125], train_loss:0.161760 Epoch [2/2], Iter [673/3125], train_loss:0.155279 Epoch [2/2], Iter [674/3125], train_loss:0.179824 Epoch [2/2], Iter [675/3125], train_loss:0.167790 Epoch [2/2], Iter [676/3125], train_loss:0.146095 Epoch [2/2], Iter [677/3125], train_loss:0.177003 Epoch [2/2], Iter [678/3125], train_loss:0.148537 Epoch [2/2], Iter [679/3125], train_loss:0.152893 Epoch [2/2], Iter [680/3125], train_loss:0.159080 Epoch [2/2], Iter [681/3125], train_loss:0.156266 Epoch [2/2], Iter [682/3125], train_loss:0.166901 Epoch [2/2], Iter [683/3125], train_loss:0.168217 Epoch [2/2], Iter [684/3125], train_loss:0.169070 Epoch [2/2], Iter [685/3125], train_loss:0.162491 Epoch [2/2], Iter [686/3125], 
train_loss:0.168951 Epoch [2/2], Iter [687/3125], train_loss:0.125869 Epoch [2/2], Iter [688/3125], train_loss:0.181195 Epoch [2/2], Iter [689/3125], train_loss:0.177369 Epoch [2/2], Iter [690/3125], train_loss:0.161117 Epoch [2/2], Iter [691/3125], train_loss:0.157555 Epoch [2/2], Iter [692/3125], train_loss:0.159016 Epoch [2/2], Iter [693/3125], train_loss:0.157256 Epoch [2/2], Iter [694/3125], train_loss:0.164547 Epoch [2/2], Iter [695/3125], train_loss:0.165163 Epoch [2/2], Iter [696/3125], train_loss:0.168598 Epoch [2/2], Iter [697/3125], train_loss:0.167152 Epoch [2/2], Iter [698/3125], train_loss:0.174982 Epoch [2/2], Iter [699/3125], train_loss:0.150731 Epoch [2/2], Iter [700/3125], train_loss:0.144726 Epoch [2/2], Iter [701/3125], train_loss:0.161515 Epoch [2/2], Iter [702/3125], train_loss:0.168019 Epoch [2/2], Iter [703/3125], train_loss:0.151221 Epoch [2/2], Iter [704/3125], train_loss:0.155330 Epoch [2/2], Iter [705/3125], train_loss:0.162497 Epoch [2/2], Iter [706/3125], train_loss:0.146891 Epoch [2/2], Iter [707/3125], train_loss:0.144152 Epoch [2/2], Iter [708/3125], train_loss:0.169863 Epoch [2/2], Iter [709/3125], train_loss:0.151497 Epoch [2/2], Iter [710/3125], train_loss:0.171949 Epoch [2/2], Iter [711/3125], train_loss:0.144536 Epoch [2/2], Iter [712/3125], train_loss:0.174258 Epoch [2/2], Iter [713/3125], train_loss:0.156956 Epoch [2/2], Iter [714/3125], train_loss:0.143885 Epoch [2/2], Iter [715/3125], train_loss:0.154764 Epoch [2/2], Iter [716/3125], train_loss:0.158947 Epoch [2/2], Iter [717/3125], train_loss:0.169612 Epoch [2/2], Iter [718/3125], train_loss:0.183921 Epoch [2/2], Iter [719/3125], train_loss:0.164853 Epoch [2/2], Iter [720/3125], train_loss:0.152667 Epoch [2/2], Iter [721/3125], train_loss:0.164879 Epoch [2/2], Iter [722/3125], train_loss:0.162339 Epoch [2/2], Iter [723/3125], train_loss:0.155902 Epoch [2/2], Iter [724/3125], train_loss:0.166309 Epoch [2/2], Iter [725/3125], train_loss:0.169535 Epoch [2/2], Iter [726/3125], 
train_loss:0.157821 Epoch [2/2], Iter [727/3125], train_loss:0.177206 Epoch [2/2], Iter [728/3125], train_loss:0.161878 Epoch [2/2], Iter [729/3125], train_loss:0.165634 Epoch [2/2], Iter [730/3125], train_loss:0.162080 Epoch [2/2], Iter [731/3125], train_loss:0.149615 Epoch [2/2], Iter [732/3125], train_loss:0.157824 Epoch [2/2], Iter [733/3125], train_loss:0.160058 Epoch [2/2], Iter [734/3125], train_loss:0.164464 Epoch [2/2], Iter [735/3125], train_loss:0.173593 Epoch [2/2], Iter [736/3125], train_loss:0.177152 Epoch [2/2], Iter [737/3125], train_loss:0.185746 Epoch [2/2], Iter [738/3125], train_loss:0.161387 Epoch [2/2], Iter [739/3125], train_loss:0.163264 Epoch [2/2], Iter [740/3125], train_loss:0.165813 Epoch [2/2], Iter [741/3125], train_loss:0.172456 Epoch [2/2], Iter [742/3125], train_loss:0.173366 Epoch [2/2], Iter [743/3125], train_loss:0.167722 Epoch [2/2], Iter [744/3125], train_loss:0.152204 Epoch [2/2], Iter [745/3125], train_loss:0.162796 Epoch [2/2], Iter [746/3125], train_loss:0.148085 Epoch [2/2], Iter [747/3125], train_loss:0.138988 Epoch [2/2], Iter [748/3125], train_loss:0.165154 Epoch [2/2], Iter [749/3125], train_loss:0.163704 Epoch [2/2], Iter [750/3125], train_loss:0.139482 Epoch [2/2], Iter [751/3125], train_loss:0.146638 Epoch [2/2], Iter [752/3125], train_loss:0.179230 Epoch [2/2], Iter [753/3125], train_loss:0.168096 Epoch [2/2], Iter [754/3125], train_loss:0.157946 Epoch [2/2], Iter [755/3125], train_loss:0.121326 Epoch [2/2], Iter [756/3125], train_loss:0.160800 Epoch [2/2], Iter [757/3125], train_loss:0.143741 Epoch [2/2], Iter [758/3125], train_loss:0.164546 Epoch [2/2], Iter [759/3125], train_loss:0.153188 Epoch [2/2], Iter [760/3125], train_loss:0.153755 Epoch [2/2], Iter [761/3125], train_loss:0.156617 Epoch [2/2], Iter [762/3125], train_loss:0.165343 Epoch [2/2], Iter [763/3125], train_loss:0.152439 Epoch [2/2], Iter [764/3125], train_loss:0.150895 Epoch [2/2], Iter [765/3125], train_loss:0.171088 Epoch [2/2], Iter [766/3125], 
train_loss:0.152008 Epoch [2/2], Iter [767/3125], train_loss:0.159565 Epoch [2/2], Iter [768/3125], train_loss:0.141178 Epoch [2/2], Iter [769/3125], train_loss:0.151271 Epoch [2/2], Iter [770/3125], train_loss:0.141239 Epoch [2/2], Iter [771/3125], train_loss:0.178049 Epoch [2/2], Iter [772/3125], train_loss:0.181188 Epoch [2/2], Iter [773/3125], train_loss:0.173826 Epoch [2/2], Iter [774/3125], train_loss:0.175326 Epoch [2/2], Iter [775/3125], train_loss:0.167236 Epoch [2/2], Iter [776/3125], train_loss:0.149285 Epoch [2/2], Iter [777/3125], train_loss:0.153321 Epoch [2/2], Iter [778/3125],
