
Titanic: Machine Learning from Disaster

Link: GitHub source code

Question

The task asks you to build a predictive model that answers the question "What sorts of people were more likely to survive?" using passenger data (name, age, sex, socio-economic class, etc.).

1. Importing packages and datasets

```python
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
```

Important: when running on a Kaggle notebook, remove the first `.` inside the quotes of `pd.read_csv('./kaggle/input/titanic/train.csv')`, i.e. use `/kaggle/input/titanic/train.csv`.

Both the training set and the test set need to be read in:

```python
train = pd.read_csv('./kaggle/input/titanic/train.csv')
test = pd.read_csv('./kaggle/input/titanic/test.csv')
# Concatenate train and test so every feature-engineering step is applied to both
allData = pd.concat([train, test], ignore_index=True)
# dataNum = train.shape[0]
# featureNum = train.shape[1]
train.info()
```

2. Data overview

Summary

Running `train.info()` shows the overall structure of the dataset:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6 KB
```

Running `train.head()` shows sample rows.

Features

Variable | Definition | Key
:-:|:-:|:-:
survival | Survival | 0 = No, 1 = Yes
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd
sex | Sex |
Age | Age in years |
sibsp | # of siblings / spouses aboard the Titanic |
parch | # of parents / children aboard the Titanic |
ticket | Ticket number |
fare | Passenger fare |
cabin | Cabin number |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

3. Visual data analysis

Sex feature (Sex)

Women had a far higher survival rate than men.

```python
# Sex (seaborn >= 0.12 requires x/y as keyword arguments)
sns.countplot(x='Sex', hue='Survived', data=train)
plt.show()
```

Class feature (Pclass)

The higher the passenger class, the higher the survival rate.

```python
# Pclass
sns.barplot(x='Pclass', y='Survived', data=train)
plt.show()
```

Family size feature (FamilySize = Parch + SibSp + 1)

Passengers with a moderate number of family members aboard had a higher survival rate.

```python
# FamilySize = SibSp + Parch + 1 (the passenger themself)
allData['FamilySize'] = allData['SibSp'] + allData['Parch'] + 1
sns.barplot(x='FamilySize', y='Survived', data=allData)
plt.show()
```

Port of embarkation feature (Embarked)

Survival rates differ by port of embarkation.

```python
# Embarked
sns.countplot(x='Embarked', hue='Survived', data=train)
plt.show()
```

Age feature (Age)

Young children and passengers in the prime of life had higher survival rates.

```python
# Age
sns.stripplot(x='Survived', y='Age', data=train, jitter=True)
plt.show()
```

Survival density by age:

```python
facet = sns.FacetGrid(train, hue='Survived', aspect=2)
facet.map(sns.kdeplot, 'Age', fill=True)  # fill=True replaces the deprecated shade=True
facet.set(xlim=(0, train['Age'].max()))
facet.add_legend()
plt.xlabel('Age')
plt.ylabel('density')
plt.show()
```

Children have a distinctive survival rate compared with passengers of all ages, so the author treats passengers aged 10 and under as children and gives them a separate label.

Fare feature (Fare)

The higher the fare, the higher the survival rate.

```python
# Fare
sns.stripplot(x='Survived', y='Fare', data=train, jitter=True)
plt.show()
```

Name feature (Name)

Title feature (Title)

Titles are classified by the honorific that precedes the given name ("Surname, Title. Given names").

```python
# Name: "Braund, Mr. Owen Harris" -> "Mr"
allData['Title'] = allData['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())
pd.crosstab(allData['Title'], allData['Sex'])
```

Counting by group

```python
TitleClassification = {
    'Officer': ['Capt', 'Col', 'Major', 'Dr', 'Rev'],
    'Royalty': ['Don', 'Sir', 'the Countess', 'Dona', 'Lady'],
    'Mrs': ['Mme', 'Ms', 'Mrs'],
    'Miss': ['Mlle', 'Miss'],
    'Mr': ['Mr'],
    'Master': ['Master', 'Jonkheer'],
}
titleCounts = allData.groupby(['Title']).size()
for title in TitleClassification.keys():
    cnt = 0
    for name in TitleClassification[title]:
        cnt += titleCounts[name]
    print(title, ':', cnt)
```

Assigning labels

```python
TitleMap = {}
for title in TitleClassification.keys():
    # map every raw title in the group to the group name
    TitleMap.update(dict.fromkeys(TitleClassification[title], title))
allData['Title'] = allData['Title'].map(TitleMap)
```

Survival rates differ by title.

```python
sns.barplot(x='Title', y='Survived', data=allData)
plt.show()
```

Ticket feature (Ticket)

Groups of passengers with consecutive seats share a ticket number, and passengers who share a ticket tend to have higher survival rates.

```python
# Ticket: number of passengers (train + test) sharing each ticket number
TicketCnt = allData.groupby(['Ticket']).size()
allData['SameTicketNum'] = allData['Ticket'].apply(lambda x: TicketCnt[x])
sns.barplot(x='SameTicketNum', y='Survived', data=allData)
plt.show()
# allData['SameTicketNum']
```

Two-/multi-dimensional analysis

Any two or more features can be analyzed together.

Two-dimensional analysis: Pclass and Age

```python
# Pclass & Age
sns.violinplot(x='Pclass', y='Age', hue='Survived', data=train, split=True)
plt.show()
```

Two-dimensional analysis: Age and Sex

```python
# Age & Sex
sns.swarmplot(x='Age', y='Sex', data=train, hue='Survived')
plt.show()
```

4. Data cleaning

Handling anomalies

Discrete data

Usable labels available → One-Hot encoding

Sex, Pclass, and Embarked already have ready-made labels (int, float, string, etc.), so `get_dummies` can split each of them into a multi-dimensional indicator vector, adding feature dimensions. Embarked has a few missing values, which are filled with an estimate derived from the whole dataset.

```python
# Sex
allData = allData.join(pd.get_dummies(allData['Sex'], prefix='Sex'))
# Pclass
allData = allData.join(pd.get_dummies(allData['Pclass'], prefix='Pclass'))
# Embarked
allData[allData['Embarked'].isnull()]  # inspect the rows with missing values (both are Pclass 1)
allData.groupby(by=['Pclass', 'Embarked']).Fare.median()  # Pclass=1, Embarked=C: median fare ~76, closest to theirs
allData['Embarked'] = allData['Embarked'].fillna('C')
allData = allData.join(pd.get_dummies(allData['Embarked'], prefix='Embarked'))
```

No usable labels → design labels → One-Hot

FamilySize, Name, and Ticket must be processed over the whole dataset before they can be labeled.

```python
# FamilySize: bin family sizes into labels by observed survival rate
def FamilyLabel(s):
    if s == 4:
        return 4
    elif s == 2 or s == 3:
        return 3
    elif s == 1 or s == 7:
        return 2
    elif s == 5 or s == 6:
        return 1
    else:  # s < 1 or s > 7
        return 0

allData['FamilyLabel'] = allData['FamilySize'].apply(FamilyLabel)
allData = allData.join(pd.get_dummies(allData['FamilyLabel'], prefix='Fam'))

# Name
TitleLabelMap = {'Mr': 1.0, 'Mrs': 5.0, 'Miss': 4.5, 'Master': 2.5, 'Royalty': 3.5, 'Officer': 2.0}

def TitleLabel(s):
    return TitleLabelMap[s]

# allData['TitleLabel'] = allData['Title'].apply(TitleLabel)
allData = allData.join(pd.get_dummies(allData['Title'], prefix='Title'))

# Ticket: bin the number of passengers sharing a ticket
def TicketLabel(s):
    if s == 3 or s == 4:
        return 3
    elif s == 2 or s == 8:
        return 2
    elif s == 1 or s == 5 or s == 6 or s == 7:
        return 1
    else:  # s < 1 or s > 8
        return 0

allData['TicketLabel'] = allData['SameTicketNum'].apply(TicketLabel)
allData = allData.join(pd.get_dummies(allData['TicketLabel'], prefix='TicNum'))
```

Continuous data

Age and Fare are standardized to narrow their value range and speed up gradient descent. This matters for gradient-based models; tree-based models such as the random forest used below are insensitive to feature scaling.

```python
# Age
allData['Child'] = allData['Age'].apply(lambda x: 1 if x <= 10 else 0)  # child label (missing ages get 0)
allData['Age'] = (allData['Age'] - allData['Age'].mean()) / allData['Age'].std()  # standardize
allData['Age'] = allData['Age'].fillna(0)  # fill missing values with 0, i.e. the mean after standardization
# Fare
allData['Fare'] = allData['Fare'].fillna(25)  # fill missing values
# Cap training-set fares above 500 at 300. Use .loc: the chained form
# allData[mask]['Fare'] = ... assigns to a copy and leaves allData unchanged
allData.loc[allData['Survived'].notnull() & (allData['Fare'] > 500), 'Fare'] = 300.0
allData['Fare'] = (allData['Fare'] - allData['Fare'].mean()) / allData['Fare'].std()
```

Dropping unused features

Dropping features that are no longer needed reduces the algorithm's complexity.

```python
# Drop raw columns that have already been encoded, plus unused ones
allData.drop(['Cabin', 'PassengerId', 'Ticket', 'Name', 'Title', 'Sex', 'SibSp', 'Parch', 'FamilySize',
              'Embarked', 'Pclass', 'FamilyLabel', 'SameTicketNum', 'TicketLabel'], axis=1, inplace=True)
```

Re-splitting the training/test sets

To make processing easier, the author merged the training and test sets at the start. They are now separated again according to whether Survived is missing.

```python
# Re-split the dataset
train_data = allData[allData['Survived'].notnull()]
test_data = allData[allData['Survived'].isnull()]
test_data = test_data.reset_index(drop=True)

xTrain = train_data.drop(['Survived'], axis=1)
yTrain = train_data['Survived']
xTest = test_data.drop(['Survived'], axis=1)
```

Feature correlation analysis

After the features are selected, this step shows whether each feature is useful and whether any features overlap. If there are problems, go back and revise the earlier feature scheme.

```python
# Correlation between features
Correlation = pd.DataFrame(allData[allData.columns.to_list()])
colormap = plt.cm.viridis
plt.figure(figsize=(24, 22))
sns.heatmap(Correlation.astype(float).corr(), linewidths=0.1, vmax=1.0, cmap=colormap,
            linecolor='white', annot=True, square=True)
plt.show()
```

5. Model building and parameter tuning

Importing the model packages

```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.feature_selection import SelectKBest
```

The author chose a random forest classifier.

Tuning parameters with grid search

```python
pipe = Pipeline([('select', SelectKBest(k=10)),
                 ('classify', RandomForestClassifier(random_state=10, max_features='sqrt'))])
param_test = {'classify__n_estimators': list(range(20, 100, 5)),
              'classify__max_depth': list(range(3, 10, 1))}
gsearch = GridSearchCV(estimator=pipe, param_grid=param_test, scoring='roc_auc', cv=10)
gsearch.fit(xTrain, yTrain)
print(gsearch.best_params_, gsearch.best_score_)
```

This takes a long time to run. When it finishes, it prints:

```
{'classify__max_depth': 6, 'classify__n_estimators': 70} 0.8790924679681529
```

Building the model

Train the model with the parameters found above:

```python
rfc = RandomForestClassifier(n_estimators=70, max_depth=6, random_state=10, max_features='sqrt')
rfc.fit(xTrain, yTrain)
```

Note that the grid search tuned these parameters inside a pipeline that first keeps the 10 best features via `SelectKBest(k=10)`, whereas `rfc` here is trained on all features. To predict with exactly the tuned pipeline, use `gsearch.best_estimator_` instead.

Exporting the results

```python
predictions = rfc.predict(xTest)
output = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': predictions.astype('int64')})
output.to_csv('my_submission.csv', index=False)
```

6. Submission and scoring

Official recommended tutorial

Appendix: full code

Exported from Jupyter Notebook as a Python script. For the .ipynb format, see the GitHub source code.

```python
# To add a new cell, type '# %%'
# To add a new markdown cell, type '# %% [markdown]'

# %%
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

# %% [markdown]
# # Features
# Variable | Definition | Key
# :-:|:-:|:-:
# survival | Survival | 0 = No, 1 = Yes
# pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd
# sex | Sex
# Age | Age in years
# sibsp | # of siblings / spouses aboard the Titanic
# parch | # of parents / children aboard the Titanic
# ticket | Ticket number
# fare | Passenger fare
# cabin | Cabin number
# embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

# %%
train = pd.read_csv('./kaggle/input/titanic/train.csv')
test = pd.read_csv('./kaggle/input/titanic/test.csv')
allData = pd.concat([train, test], ignore_index=True)
# dataNum = train.shape[0]
# featureNum = train.shape[1]
train.head()

# %%
# Sex
sns.countplot(x='Sex', hue='Survived', data=train)
plt.show()

# %%
# Pclass
sns.barplot(x='Pclass', y='Survived', data=train)
plt.show()
# Pclass & Age
sns.violinplot(x='Pclass', y='Age', hue='Survived', data=train, split=True)
plt.show()

# %%
# FamilySize = SibSp + Parch + 1
allData['FamilySize'] = allData['SibSp'] + allData['Parch'] + 1
sns.barplot(x='FamilySize', y='Survived', data=allData)
plt.show()

# %%
# Embarked
sns.countplot(x='Embarked', hue='Survived', data=train)
plt.show()

# %%
# Age
sns.stripplot(x='Survived', y='Age', data=train, jitter=True)
plt.show()
facet = sns.FacetGrid(train, hue='Survived', aspect=2)
facet.map(sns.kdeplot, 'Age', fill=True)
facet.set(xlim=(0, train['Age'].max()))
facet.add_legend()
plt.xlabel('Age')
plt.ylabel('density')
plt.show()
# Age & Sex
sns.swarmplot(x='Age', y='Sex', data=train, hue='Survived')
plt.show()

# %%
# Fare
sns.stripplot(x='Survived', y='Fare', data=train, jitter=True)
plt.show()

# %%
# Name
# Alternative: allData['Title'] = allData['Name'].str.extract(r'([A-Za-z]+)\.', expand=False)
# (captures the first run of letters followed by '.', e.g. 'Mr')
allData['Title'] = allData['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())
# pd.crosstab(allData['Title'], allData['Sex'])
TitleClassification = {
    'Officer': ['Capt', 'Col', 'Major', 'Dr', 'Rev'],
    'Royalty': ['Don', 'Sir', 'the Countess', 'Dona', 'Lady'],
    'Mrs': ['Mme', 'Ms', 'Mrs'],
    'Miss': ['Mlle', 'Miss'],
    'Mr': ['Mr'],
    'Master': ['Master', 'Jonkheer'],
}
TitleMap = {}
for title in TitleClassification.keys():
    TitleMap.update(dict.fromkeys(TitleClassification[title], title))
    # cnt = 0
    # for name in TitleClassification[title]:
    #     cnt += allData.groupby(['Title']).size()[name]
    # print(title, ':', cnt)
allData['Title'] = allData['Title'].map(TitleMap)
sns.barplot(x='Title', y='Survived', data=allData)
plt.show()

# %%
# Ticket
TicketCnt = allData.groupby(['Ticket']).size()
allData['SameTicketNum'] = allData['Ticket'].apply(lambda x: TicketCnt[x])
sns.barplot(x='SameTicketNum', y='Survived', data=allData)
plt.show()
# allData['SameTicketNum']

# %% [markdown]
# # Data cleaning
# - Sex Pclass Embarked -> One-Hot
# - Age Fare -> Standardize
# - FamilySize Name Ticket -> ints -> One-Hot

# %%
# Sex
allData = allData.join(pd.get_dummies(allData['Sex'], prefix='Sex'))
# Pclass
allData = allData.join(pd.get_dummies(allData['Pclass'], prefix='Pclass'))
# Embarked
allData[allData['Embarked'].isnull()]  # inspect missing values
allData.groupby(by=['Pclass', 'Embarked']).Fare.median()  # Pclass=1, Embarked=C: median ~76
allData['Embarked'] = allData['Embarked'].fillna('C')
allData = allData.join(pd.get_dummies(allData['Embarked'], prefix='Embarked'))

# %%
# Age
allData['Child'] = allData['Age'].apply(lambda x: 1 if x <= 10 else 0)  # child label
allData['Age'] = (allData['Age'] - allData['Age'].mean()) / allData['Age'].std()  # standardize
allData['Age'] = allData['Age'].fillna(0)  # fill missing values
# Fare
allData['Fare'] = allData['Fare'].fillna(25)  # fill missing values
allData.loc[allData['Survived'].notnull() & (allData['Fare'] > 500), 'Fare'] = 300.0  # cap training outliers
allData['Fare'] = (allData['Fare'] - allData['Fare'].mean()) / allData['Fare'].std()

# %%
# FamilySize
def FamilyLabel(s):
    if s == 4:
        return 4
    elif s == 2 or s == 3:
        return 3
    elif s == 1 or s == 7:
        return 2
    elif s == 5 or s == 6:
        return 1
    else:  # s < 1 or s > 7
        return 0

allData['FamilyLabel'] = allData['FamilySize'].apply(FamilyLabel)
allData = allData.join(pd.get_dummies(allData['FamilyLabel'], prefix='Fam'))

# Name
TitleLabelMap = {'Mr': 1.0, 'Mrs': 5.0, 'Miss': 4.5, 'Master': 2.5, 'Royalty': 3.5, 'Officer': 2.0}

def TitleLabel(s):
    return TitleLabelMap[s]

# allData['TitleLabel'] = allData['Title'].apply(TitleLabel)
allData = allData.join(pd.get_dummies(allData['Title'], prefix='Title'))

# Ticket
def TicketLabel(s):
    if s == 3 or s == 4:
        return 3
    elif s == 2 or s == 8:
        return 2
    elif s == 1 or s == 5 or s == 6 or s == 7:
        return 1
    else:  # s < 1 or s > 8
        return 0

allData['TicketLabel'] = allData['SameTicketNum'].apply(TicketLabel)
allData = allData.join(pd.get_dummies(allData['TicketLabel'], prefix='TicNum'))

# %%
# Drop features that are no longer needed
allData.drop(['Cabin', 'PassengerId', 'Ticket', 'Name', 'Title', 'Sex', 'SibSp', 'Parch', 'FamilySize',
              'Embarked', 'Pclass', 'FamilyLabel', 'SameTicketNum', 'TicketLabel'], axis=1, inplace=True)
# Re-split the dataset
train_data = allData[allData['Survived'].notnull()]
test_data = allData[allData['Survived'].isnull()]
test_data = test_data.reset_index(drop=True)

xTrain = train_data.drop(['Survived'], axis=1)
yTrain = train_data['Survived']
xTest = test_data.drop(['Survived'], axis=1)
# allData.columns.to_list()

# %%
# Correlation between features
Correlation = pd.DataFrame(allData[allData.columns.to_list()])
colormap = plt.cm.viridis
plt.figure(figsize=(24, 22))
sns.heatmap(Correlation.astype(float).corr(), linewidths=0.1, vmax=1.0, cmap=colormap,
            linecolor='white', annot=True, square=True)
plt.show()

# %% [markdown]
# # Grid search for random forest parameters
# - n_estimators
# - max_depth

# %%
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.feature_selection import SelectKBest

# %%
pipe = Pipeline([('select', SelectKBest(k=10)),
                 ('classify', RandomForestClassifier(random_state=10, max_features='sqrt'))])
param_test = {'classify__n_estimators': list(range(20, 100, 5)),
              'classify__max_depth': list(range(3, 10, 1))}
gsearch = GridSearchCV(estimator=pipe, param_grid=param_test, scoring='roc_auc', cv=10)
gsearch.fit(xTrain, yTrain)
print(gsearch.best_params_, gsearch.best_score_)

# %%
rfc = RandomForestClassifier(n_estimators=70, max_depth=6, random_state=10, max_features='sqrt')
rfc.fit(xTrain, yTrain)
predictions = rfc.predict(xTest)
output = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': predictions.astype('int64')})
output.to_csv('my_submission.csv', index=False)
```
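The title extraction and grouping from the Name feature step can be tried without pandas. The plain-Python sketch below assumes every name follows the "Surname, Title. Given names" pattern. `extract_title`, `group_title` and `TITLE_MAP` are hypothetical helper names, not part of the original code.

```python
# Title groups copied from TitleClassification in the article
TITLE_CLASSIFICATION = {
    "Officer": ["Capt", "Col", "Major", "Dr", "Rev"],
    "Royalty": ["Don", "Sir", "the Countess", "Dona", "Lady"],
    "Mrs": ["Mme", "Ms", "Mrs"],
    "Miss": ["Mlle", "Miss"],
    "Mr": ["Mr"],
    "Master": ["Master", "Jonkheer"],
}

# Invert the grouping: raw title -> group name (same idea as TitleMap)
TITLE_MAP = {raw: group for group, raws in TITLE_CLASSIFICATION.items() for raw in raws}


def extract_title(name: str) -> str:
    """Return the honorific between the first ',' and the following '.'."""
    return name.split(",")[1].split(".")[0].strip()


def group_title(name: str) -> str:
    """Map a full passenger name to its title group (KeyError for unknown titles)."""
    return TITLE_MAP[extract_title(name)]


if __name__ == "__main__":
    for n in ["Braund, Mr. Owen Harris",
              "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)"]:
        print(n, "->", extract_title(n), "->", group_title(n))
```

Raising `KeyError` for an unseen title is deliberate in this sketch. In pandas, `.map(TitleMap)` would instead silently produce NaN.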
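The SameTicketNum feature in the Ticket step is the size of each ticket's group, counted over train and test together. A stdlib sketch of the same idea with `collections.Counter` is shown below. The function name `same_ticket_counts` and the ticket strings are illustrative assumptions.

```python
from collections import Counter
from typing import List


def same_ticket_counts(tickets: List[str]) -> List[int]:
    """For each passenger, return how many passengers share that ticket number."""
    counts = Counter(tickets)  # ticket -> group size, like groupby(['Ticket']).size()
    return [counts[t] for t in tickets]


if __name__ == "__main__":
    tickets = ["A/5 21171", "113803", "113803", "PC 17599", "113803"]
    print(same_ticket_counts(tickets))  # [1, 3, 3, 1, 3]
```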
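The data-cleaning step relies on `pd.get_dummies(column, prefix=...)` to turn one categorical column into one indicator column per category. The sketch below is a minimal plain-Python analogue for intuition; `one_hot` is a hypothetical name. Like pandas, it orders categories by sorting and gives missing values a 0 in every column (the `dummy_na=False` default). Recent pandas versions return booleans rather than 0/1.

```python
from typing import Dict, List, Optional


def one_hot(values: List[Optional[str]], prefix: str) -> Dict[str, List[int]]:
    """Return {f"{prefix}_{category}": 0/1 indicator list}, categories in sorted order.

    Missing values (None) get 0 in every column.
    """
    categories = sorted({v for v in values if v is not None})
    return {
        f"{prefix}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


if __name__ == "__main__":
    print(one_hot(["S", "C", "S", None, "Q"], "Embarked"))
    # {'Embarked_C': [0, 1, 0, 0, 0], 'Embarked_Q': [0, 0, 0, 0, 1], 'Embarked_S': [1, 0, 1, 0, 0]}
```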
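The article fills missing Embarked values with 'C' by reading the per-(Pclass, Embarked) fare table by eye. The same rule can be written as: within the passenger's class, pick the port whose median fare is closest to the passenger's fare. `closest_port` and the toy records below are hypothetical and exist only to illustrate that reasoning.

```python
from statistics import median
from typing import Dict, List, Tuple

# (pclass, embarked, fare) records; toy values for illustration only
Record = Tuple[int, str, float]


def closest_port(records: List[Record], pclass: int, fare: float) -> str:
    """Return the port whose median fare within `pclass` is closest to `fare`."""
    fares_by_port: Dict[str, List[float]] = {}
    for cls, port, f in records:
        if cls == pclass:
            fares_by_port.setdefault(port, []).append(f)
    medians = {port: median(fs) for port, fs in fares_by_port.items()}
    return min(medians, key=lambda port: abs(medians[port] - fare))


if __name__ == "__main__":
    toy = [(1, "C", 70.0), (1, "C", 80.0), (1, "S", 50.0), (1, "S", 55.0), (3, "C", 80.0)]
    print(closest_port(toy, pclass=1, fare=80.0))  # C
```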
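The if/elif chains in `FamilyLabel` and `TicketLabel` behave as lookup tables with a default label of 0. The dictionary-based sketch below is equivalent and makes the bins easy to read and check. It assumes the comparison operators lost from the scraped original were `==` for the listed sizes, with every other size falling into label 0. The dictionary and function names are hypothetical.

```python
# size -> label; any size not listed gets label 0
FAMILY_LABELS = {4: 4, 2: 3, 3: 3, 1: 2, 7: 2, 5: 1, 6: 1}
TICKET_LABELS = {3: 3, 4: 3, 2: 2, 8: 2, 1: 1, 5: 1, 6: 1, 7: 1}


def family_label(size: int) -> int:
    """Same bins as FamilyLabel in the article."""
    return FAMILY_LABELS.get(size, 0)


def ticket_label(size: int) -> int:
    """Same bins as TicketLabel in the article."""
    return TICKET_LABELS.get(size, 0)


if __name__ == "__main__":
    print([family_label(s) for s in range(1, 12)])  # [2, 3, 3, 4, 1, 1, 2, 0, 0, 0, 0]
    print([ticket_label(s) for s in range(1, 12)])  # [1, 2, 3, 3, 1, 1, 1, 2, 0, 0, 0]
```

In pandas, either function can be applied with `allData['FamilySize'].map(family_label)`.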
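In the continuous-data step, standardizing Age and then filling missing values with 0 is the same as imputing the mean. The z-score z = (x - mean) / std maps the mean to exactly 0. A plain-Python sketch follows; `standardize` is a hypothetical name. It uses `statistics.stdev`, the sample standard deviation, which matches pandas' default `Series.std()` (ddof=1).

```python
from statistics import mean, stdev
from typing import List, Optional


def standardize(values: List[Optional[float]]) -> List[float]:
    """Z-score the known values (sample std, like pandas .std()); missing values become 0.0."""
    known = [v for v in values if v is not None]
    mu, sigma = mean(known), stdev(known)
    return [0.0 if v is None else (v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(standardize([1.0, 2.0, 3.0, None]))  # [-1.0, 0.0, 1.0, 0.0]
```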
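The grid search takes long because `GridSearchCV` fits one model per parameter combination per fold. The grid has 16 values of n_estimators (20 to 95 in steps of 5) and 7 values of max_depth (3 to 9), so 10-fold CV gives 112 × 10 = 1,120 fits. With the default `refit=True`, one more fit follows on the full training set. The enumeration below mimics how the grid expands using `itertools.product`; it is not scikit-learn's own `ParameterGrid`.

```python
from itertools import product

# Same grid as param_test in the article
param_test = {
    "classify__n_estimators": list(range(20, 100, 5)),
    "classify__max_depth": list(range(3, 10, 1)),
}
CV_FOLDS = 10

# Every combination of parameter values, one dict per candidate
candidates = [dict(zip(param_test, combo)) for combo in product(*param_test.values())]
n_fits = len(candidates) * CV_FOLDS  # excludes the final refit

if __name__ == "__main__":
    print(len(candidates), "candidates,", n_fits, "cross-validation fits")
```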
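The grid search ranks candidates with `scoring='roc_auc'`. ROC AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting one half. The O(n²) sketch below implements that definition for intuition only; `roc_auc` is a hypothetical name. scikit-learn's `roc_auc_score` is what the grid search actually uses.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```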