Preface
A few days ago I published the PyTorch implementation (the TensorFlow version is here: 基于RNN模型的心脏病预测(tensorflow实现)). That version contained one needlessly cumbersome step and one error. This post fixes both, and the results improve noticeably. The original article is: 基于RNN模型的心脏病预测提供tensorflow和pytorch实现.

Error 1

This one is not really an error: previously I standardized the data and split the dataset in a very roundabout way (see the screenshot in the original post). It is much simpler to standardize the data first and then split it (see the code below).

Error 2

The model's input should have 13 feature dimensions. In addition, the old version used nn.BCELoss, which made the downstream handling awkward because a final sigmoid activation would have been required. This revision instead treats the task as a (two-class) multi-class problem and uses CrossEntropyLoss.

References on BCELoss and CrossEntropyLoss:
https://blog.csdn.net/qq_36803941/article/details/138673111
https://zhuanlan.zhihu.com/p/98785902
https://www.cnblogs.com/zhangxianrong/p/14773075.html
https://zhuanlan.zhihu.com/p/59800597
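To make the difference concrete, here is a minimal sketch (illustrative, not from the original post): nn.CrossEntropyLoss takes the raw 2-class logits and int64 class labels directly, while nn.BCELoss expects one probability per sample (i.e. a sigmoid output) and float labels.

import torch
import torch.nn as nn

logits = torch.randn(4, 2)              # raw model outputs for 4 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0])     # int64 class indices

# CrossEntropyLoss: raw logits + integer labels, softmax is applied internally
print(nn.CrossEntropyLoss()(logits, labels))

# BCELoss: needs one probability per sample (sigmoid output) and float labels
probs = torch.sigmoid(torch.randn(4))
print(nn.BCELoss()(probs, labels.float()))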
Revised Code

1. Data Processing
1. Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

2. Load the Data
data = pd.read_csv('./heart.csv')
data.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2       1
age - age
sex - sex (1 = male; 0 = female)
cp - chest pain type (1: typical angina; 2: atypical angina; 3: non-anginal pain; 4: asymptomatic)
trestbps - resting blood pressure (in mm Hg on admission to the hospital)
chol - serum cholesterol in mg/dl
fbs - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg - resting electrocardiogram results (0: normal; 1: ST-T wave abnormality; 2: probable left ventricular hypertrophy)
thalach - maximum heart rate achieved
exang - exercise-induced angina (1 = yes; 0 = no)
oldpeak - ST depression induced by exercise relative to rest
slope - slope of the peak exercise ST segment (1: upsloping; 2: flat; 3: downsloping)
ca - number of major vessels (0-4)
thal - thalassemia, a blood disorder (3 = normal; 6 = fixed defect; 7 = reversible defect)
target - presence of heart disease (1 = yes, 0 = no); a quick class-balance check follows below
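Since target is the label, a quick look at the class balance can be helpful before modeling (an illustrative check, not in the original post):

# illustrative: how many samples fall into each class
data['target'].value_counts()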
3. Data Analysis
Preliminary data analysis
data.info()  # check the data types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       303 non-null    int64
 1   sex       303 non-null    int64
 2   cp        303 non-null    int64
 3   trestbps  303 non-null    int64
 4   chol      303 non-null    int64
 5   fbs       303 non-null    int64
 6   restecg   303 non-null    int64
 7   thalach   303 non-null    int64
 8   exang     303 non-null    int64
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64
 11  ca        303 non-null    int64
 12  thal      303 non-null    int64
 13  target    303 non-null    int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB

Categorical variables: sex, cp, fbs, restecg, exang, slope, ca, thal, target
Numerical variables: age, trestbps, chol, thalach, oldpeak
data.describe()  # descriptive statistics

              age         sex          cp    trestbps        chol         fbs     restecg     thalach       exang     oldpeak       slope          ca        thal      target
count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000
mean    54.366337    0.683168    0.966997  131.623762  246.264026    0.148515    0.528053  149.646865    0.326733    1.039604    1.399340    0.729373    2.313531    0.544554
std      9.082101    0.466011    1.032052   17.538143   51.830751    0.356198    0.525860   22.905161    0.469794    1.161075    0.616226    1.022606    0.612277    0.498835
min     29.000000    0.000000    0.000000   94.000000  126.000000    0.000000    0.000000   71.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
25%     47.500000    0.000000    0.000000  120.000000  211.000000    0.000000    0.000000  133.500000    0.000000    0.000000    1.000000    0.000000    2.000000    0.000000
50%     55.000000    1.000000    1.000000  130.000000  240.000000    0.000000    1.000000  153.000000    0.000000    0.800000    1.000000    0.000000    2.000000    1.000000
75%     61.000000    1.000000    2.000000  140.000000  274.500000    0.000000    1.000000  166.000000    1.000000    1.600000    2.000000    1.000000    3.000000    1.000000
max     77.000000    1.000000    3.000000  200.000000  564.000000    1.000000    2.000000  202.000000    1.000000    6.200000    2.000000    4.000000    3.000000    1.000000
The mean age is 54 (median 55, standard deviation 9), so the patients skew older. The mean resting blood pressure is 131.62, which is on the high side (a normal adult systolic pressure is around 120 mmHg). The mean cholesterol is 246.26, also high (the ideal level is below 200 mg/dL). The mean maximum heart rate is 149.64, which is elevated (a normal resting heart rate is typically 60 to 100 beats per minute).
The minimum and maximum values are all physiologically plausible, so there are no obvious outliers (a quick visual check is sketched below).
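One quick way to eyeball this (an illustrative sketch, not in the original post) is to draw box plots of the numerical columns:

# illustrative: box plots of the numerical variables to look for extreme values
data[['age', 'trestbps', 'chol', 'thalach', 'oldpeak']].plot(kind='box', subplots=True, layout=(1, 5), figsize=(15, 3))
plt.show()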
Missing values
data.isnull().sum()

age         0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

Correlation analysis
import seaborn as sns

plt.figure(figsize=(20, 15))
sns.heatmap(data.corr(), annot=True, cmap='Greens')
plt.show()
Correlation strength is commonly graded as follows:
Very weak: 0.00 to 0.19 (or -0.00 to -0.19) - essentially no linear relationship.
Weak: 0.20 to 0.39 (or -0.20 to -0.39) - some linear relationship, but a weak one.
Moderate: 0.40 to 0.59 (or -0.40 to -0.59) - a clear linear relationship, though not especially strong.
Strong: 0.60 to 0.79 (or -0.60 to -0.79) - a fairly strong linear relationship.
Very strong: 0.80 to 1.00 (or -0.80 to -1.00) - an almost perfectly linear relationship; the two variables move together.
target shows little correlation with chol or fbs (fbs is a categorical variable, chol (cholesterol) is numerical), but from a practical standpoint both still influence heart disease, so no features are dropped. A numeric view of the same correlations is shown below.
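As a numeric companion to the heatmap (a small sketch, assuming data has been loaded as above), the correlation of every feature with target can be listed directly:

# illustrative: correlation of each feature with the target, sorted
data.corr()['target'].drop('target').sort_values()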
4. Data Standardization
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Only X needs to be standardized
X = scaler.fit_transform(X)
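As a quick sanity check (illustrative, not part of the original post), after StandardScaler every column of X should have a mean close to 0 and a standard deviation close to 1:

# illustrative: verify the standardization
X.mean(axis=0).round(3), X.std(axis=0).round(3)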
5. Splitting the Data

The data is first split into training and test sets at a 9:1 ratio.
from sklearn.model_selection import train_test_split

# PyTorch will be used, so convert the data to tensors first
X = torch.tensor(np.array(X), dtype=torch.float32)
y = torch.tensor(np.array(y), dtype=torch.int64)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# check the shapes
X_train.shape, y_train.shape

(torch.Size([272, 13]), torch.Size([272]))

6. Loading the Data in Batches
from torch.utils.data import TensorDataset, DataLoader

train_dl = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)
test_dl = DataLoader(TensorDataset(X_test, y_test), batch_size=64, shuffle=False)
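A small sanity check (illustrative, not in the original post): pulling one batch from train_dl should give 64 rows of 13 features and 64 labels.

# illustrative: inspect one batch from the training loader
X_batch, y_batch = next(iter(train_dl))
X_batch.shape, y_batch.shape   # expected: (torch.Size([64, 13]), torch.Size([64]))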
2. Building the Model

Defining an RNN layer:

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, nonlinearity='tanh', bias=True, batch_first=False, dropout=0, bidirectional=False)

input_size: dimensionality of the input features
hidden_size: dimensionality of the hidden state
num_layers: number of stacked RNN layers
nonlinearity: non-linearity to use ('tanh' or 'relu')
bias: if False, the layer uses no bias terms; default True
batch_first: if True, input and output tensors are provided as (batch, seq, feature); default False, i.e. (seq, batch, feature)
dropout: if non-zero, adds a Dropout layer on the outputs of every RNN layer except the last; default 0
bidirectional: if True, the RNN is bidirectional; default False
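To make these shape conventions concrete, here is a small illustrative example with batch_first=True (the sizes are arbitrary and are not the ones used in the model below):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.rand(32, 5, 10)    # (batch, seq, feature)
output, h_n = rnn(x)
print(output.shape)          # torch.Size([32, 5, 20]) - last layer's hidden state at every time step
print(h_n.shape)             # torch.Size([2, 32, 20]) - final hidden state of each layer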
import torch
import torch.nn as nn

# Build the model. The task is a binary classification problem; since CrossEntropyLoss is used,
# the final fully connected layer outputs 2 raw scores (no sigmoid needed).
# Model structure: RNN (hidden size 200) -> Linear(200, 50) -> Linear(50, 2)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=13, hidden_size=200, num_layers=1, batch_first=True)
        self.fc1 = nn.Linear(200, 50)
        # self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        x, hidden1 = self.rnn(x)
        x = self.fc1(x)
        # x = self.fc2(x)
        x = self.fc3(x)
        return x

model = Model().to(device)
model

Model(
  (rnn): RNN(13, 200, batch_first=True)
  (fc1): Linear(in_features=200, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

# check the model's output shape
model(torch.rand(30, 13).to(device)).shape

torch.Size([30, 2])

3. Model Training
1. Hyperparameters
loss_fn = nn.CrossEntropyLoss()
lr = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

2. Training Function
def train(dataloader, model, loss_fn, optimizer):
    # total number of samples
    size = len(dataloader.dataset)
    # number of batches
    num_batches = len(dataloader)
    # accuracy and loss accumulators
    train_acc, train_loss = 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        # forward pass and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        # zero the gradients
        optimizer.zero_grad()
        # backpropagation
        loss.backward()
        # update the parameters
        optimizer.step()
        # accumulate loss and number of correct predictions
        train_loss += loss.item()
        train_acc += (pred.argmax(1) == y).type(torch.float64).sum().item()
    # accuracy over all samples, average loss per batch
    train_acc /= size
    train_loss /= num_batches
    return train_acc, train_loss

3. Test Function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_acc, test_loss = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            loss = loss_fn(pred, y)
            test_loss += loss.item()
            test_acc += (pred.argmax(1) == y).type(torch.float64).sum().item()
    test_acc /= size
    test_loss /= num_batches
    return test_acc, test_loss

4. Training the Model
train_acc = []
train_loss = []
test_acc = []
test_loss = []

# number of epochs
epoches = 50

for epoch in range(epoches):
    # train
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    # evaluate
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    # record the metrics
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(epoch + 1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))

Epoch: 1, Train_acc:49.6%, Train_loss:0.686, Test_acc:58.1%, Test_loss:0.684
Epoch: 2, Train_acc:62.1%, Train_loss:0.682, Test_acc:64.5%, Test_loss:0.671
Epoch: 3, Train_acc:68.0%, Train_loss:0.662, Test_acc:71.0%, Test_loss:0.658
Epoch: 4, Train_acc:69.1%, Train_loss:0.655, Test_acc:77.4%, Test_loss:0.645
Epoch: 5, Train_acc:73.9%, Train_loss:0.643, Test_acc:80.6%, Test_loss:0.632
Epoch: 6, Train_acc:74.3%, Train_loss:0.637, Test_acc:80.6%, Test_loss:0.620
Epoch: 7, Train_acc:75.7%, Train_loss:0.620, Test_acc:80.6%, Test_loss:0.608
Epoch: 8, Train_acc:78.3%, Train_loss:0.612, Test_acc:80.6%, Test_loss:0.596
Epoch: 9, Train_acc:79.8%, Train_loss:0.591, Test_acc:83.9%, Test_loss:0.586
Epoch:10, Train_acc:79.0%, Train_loss:0.590, Test_acc:83.9%, Test_loss:0.575
Epoch:11, Train_acc:81.2%, Train_loss:0.584, Test_acc:83.9%, Test_loss:0.563
Epoch:12, Train_acc:79.8%, Train_loss:0.562, Test_acc:83.9%, Test_loss:0.553
Epoch:13, Train_acc:80.5%, Train_loss:0.546, Test_acc:83.9%, Test_loss:0.542
Epoch:14, Train_acc:80.1%, Train_loss:0.546, Test_acc:83.9%, Test_loss:0.531
Epoch:15, Train_acc:81.2%, Train_loss:0.517, Test_acc:83.9%, Test_loss:0.521
Epoch:16, Train_acc:81.6%, Train_loss:0.521, Test_acc:83.9%, Test_loss:0.509
Epoch:17, Train_acc:82.4%, Train_loss:0.508, Test_acc:83.9%, Test_loss:0.497
Epoch:18, Train_acc:82.7%, Train_loss:0.494, Test_acc:83.9%, Test_loss:0.487
Epoch:19, Train_acc:83.1%, Train_loss:0.496, Test_acc:83.9%, Test_loss:0.477
Epoch:20, Train_acc:82.4%, Train_loss:0.469, Test_acc:83.9%, Test_loss:0.469
Epoch:21, Train_acc:83.1%, Train_loss:0.472, Test_acc:83.9%, Test_loss:0.463
Epoch:22, Train_acc:82.4%, Train_loss:0.451, Test_acc:83.9%, Test_loss:0.458
Epoch:23, Train_acc:83.5%, Train_loss:0.456, Test_acc:83.9%, Test_loss:0.455
Epoch:24, Train_acc:83.1%, Train_loss:0.438, Test_acc:83.9%, Test_loss:0.453
Epoch:25, Train_acc:83.5%, Train_loss:0.431, Test_acc:80.6%, Test_loss:0.451
Epoch:26, Train_acc:84.2%, Train_loss:0.444, Test_acc:80.6%, Test_loss:0.449
Epoch:27, Train_acc:83.1%, Train_loss:0.427, Test_acc:80.6%, Test_loss:0.449
Epoch:28, Train_acc:84.2%, Train_loss:0.409, Test_acc:80.6%, Test_loss:0.449
Epoch:29, Train_acc:83.8%, Train_loss:0.405, Test_acc:80.6%, Test_loss:0.448
Epoch:30, Train_acc:83.8%, Train_loss:0.411, Test_acc:80.6%, Test_loss:0.448
Epoch:31, Train_acc:83.8%, Train_loss:0.378, Test_acc:80.6%, Test_loss:0.446
Epoch:32, Train_acc:84.6%, Train_loss:0.421, Test_acc:80.6%, Test_loss:0.444
Epoch:33, Train_acc:84.6%, Train_loss:0.391, Test_acc:80.6%, Test_loss:0.443
Epoch:34, Train_acc:85.7%, Train_loss:0.388, Test_acc:80.6%, Test_loss:0.446
Epoch:35, Train_acc:84.2%, Train_loss:0.396, Test_acc:80.6%, Test_loss:0.449
Epoch:36, Train_acc:84.2%, Train_loss:0.346, Test_acc:80.6%, Test_loss:0.451
Epoch:37, Train_acc:84.9%, Train_loss:0.379, Test_acc:80.6%, Test_loss:0.453
Epoch:38, Train_acc:84.9%, Train_loss:0.389, Test_acc:80.6%, Test_loss:0.453
Epoch:39, Train_acc:83.1%, Train_loss:0.386, Test_acc:80.6%, Test_loss:0.453
Epoch:40, Train_acc:84.9%, Train_loss:0.350, Test_acc:80.6%, Test_loss:0.452
Epoch:41, Train_acc:83.5%, Train_loss:0.353, Test_acc:80.6%, Test_loss:0.455
Epoch:42, Train_acc:85.7%, Train_loss:0.373, Test_acc:80.6%, Test_loss:0.458
Epoch:43, Train_acc:84.6%, Train_loss:0.345, Test_acc:80.6%, Test_loss:0.459
Epoch:44, Train_acc:85.3%, Train_loss:0.377, Test_acc:80.6%, Test_loss:0.461
Epoch:45, Train_acc:85.7%, Train_loss:0.354, Test_acc:80.6%, Test_loss:0.462
Epoch:46, Train_acc:84.9%, Train_loss:0.327, Test_acc:80.6%, Test_loss:0.467
Epoch:47, Train_acc:82.7%, Train_loss:0.347, Test_acc:80.6%, Test_loss:0.470
Epoch:48, Train_acc:84.6%, Train_loss:0.350, Test_acc:80.6%, Test_loss:0.470
Epoch:49, Train_acc:84.9%, Train_loss:0.344, Test_acc:80.6%, Test_loss:0.470
Epoch:50, Train_acc:85.3%, Train_loss:0.375, Test_acc:80.6%, Test_loss:0.472

5. Plotting the Results
import matplotlib.pyplot as plt
# hide warnings
import warnings
warnings.filterwarnings("ignore")              # ignore warning messages
plt.rcParams['font.sans-serif'] = ['SimHei']   # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False     # display minus signs correctly
plt.rcParams['figure.dpi'] = 100               # resolution

epoch_length = range(epoches)

plt.figure(figsize=(12, 3))

plt.subplot(1, 2, 1)
plt.plot(epoch_length, train_acc, label='Train Accuracy')
plt.plot(epoch_length, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epoch_length, train_loss, label='Train Loss')
plt.plot(epoch_length, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Loss')

plt.show()
The curves level off toward the end (they are still changing, just only slightly); overall the model performs reasonably well.
6. Model Evaluation
# Evaluate the final model on the test set; the test function returns accuracy and loss
test_acc, test_loss = test(test_dl, model, loss_fn)
print("score[accuracy, loss]: ", test_acc, test_loss)  # returns two values: accuracy and loss

score[accuracy, loss]:  0.8064516129032258 0.47150832414627075
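As a possible extension (an illustrative sketch, not part of the original post), the test-set predictions can also be summarized with scikit-learn's confusion matrix and classification report:

from sklearn.metrics import confusion_matrix, classification_report

# illustrative: per-class breakdown of the predictions on the test set
model.eval()
with torch.no_grad():
    preds = model(X_test.to(device)).argmax(1).cpu().numpy()

print(confusion_matrix(y_test.numpy(), preds))
print(classification_report(y_test.numpy(), preds))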