The 【阿旭机器学习实战】 series introduces machine learning algorithms and models through hands-on case studies. Likes and follows are welcome; let's learn together.

Note: the model in this article performs poorly. It is meant only as a learning reference that walks through the data-processing approach, model training, and prediction.

Contents
1. Read the data
   Candlestick (K-line) chart
2. Build the regression model
3. Plot the predictions

1. Read the data
import numpy as np  # numerical computing
import pandas as pd  # data handling
import matplotlib.pyplot as plt
from datetime import datetime as dt

The dataset, source code, and project documentation for this article are available by following the WeChat public account 阿旭算法与机器学习 and replying "ML31".

df = pd.read_csv('./000001.csv')
print(np.shape(df))
df.head()

(611, 14)

         date   open   high  close    low      volume  price_change  p_change     ma5    ma10    ma20      v_ma5     v_ma10      v_ma20
0  2019-05-30  12.32  12.38  12.22  12.11   646284.62         -0.18     -1.45  12.366  12.390  12.579  747470.29  739308.42   953969.39
1  2019-05-29  12.36  12.59  12.40  12.26   666411.50         -0.09     -0.72  12.380  12.453  12.673  751584.45  738170.10   973189.95
2  2019-05-28  12.31  12.55  12.49  12.26   880703.12          0.12      0.97  12.380  12.505  12.742  719548.29  781927.80   990340.43
3  2019-05-27  12.21  12.42  12.37  11.93  1048426.00          0.02      0.16  12.394  12.505  12.824  689649.77  812117.30  1001879.10
4  2019-05-24  12.35  12.45  12.35  12.31   495526.19          0.06      0.49  12.396  12.498  12.928  637251.61  781466.47  1046943.98
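Features in the stock data:
date: trading date
open: opening price
high: highest price
close: closing price
low: lowest price
volume: trading volume
price_change: price change
p_change: percentage change
ma5: 5-day average price
ma10: 10-day average price
ma20: 20-day average price
v_ma5: 5-day average volume
v_ma10: 10-day average volume
v_ma20: 20-day average volume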
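To make the derived columns concrete, here is a minimal sketch of how price_change, p_change, and ma5 could be recomputed from the closing price. It assumes they are the day-over-day difference, the day-over-day percentage change, and a 5-row rolling mean; the data provider's exact definitions are not stated in the article. The five closes are copied from the table above.

```python
import pandas as pd

# Closes for the last five trading days in the dataset (copied from the article's table)
close = pd.Series(
    [12.35, 12.37, 12.49, 12.40, 12.22],
    index=pd.to_datetime(
        ["2019-05-24", "2019-05-27", "2019-05-28", "2019-05-29", "2019-05-30"]
    ),
)

features = pd.DataFrame({"close": close})
# Assumed definitions, not confirmed by the data source:
features["price_change"] = features["close"].diff()           # close minus previous close
features["p_change"] = features["close"].pct_change() * 100   # percentage change vs previous close
features["ma5"] = features["close"].rolling(window=5).mean()  # mean of the last 5 closes

print(features.round(3))
```

Under these assumptions the last row matches the values shown for 2019-05-30 (price_change -0.18, p_change -1.45, ma5 12.366). The first rows are NaN because a 5-row window needs five observations.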
# Convert the date column from strings to datetime
df['date'] = pd.to_datetime(df['date'])
# Use the date as the index
df = df.set_index('date')
# Sort in ascending time order
df.sort_values(by=['date'], inplace=True, ascending=True)
df.tail()

             open   high  close    low      volume  price_change  p_change     ma5    ma10    ma20      v_ma5     v_ma10      v_ma20
date
2019-05-24  12.35  12.45  12.35  12.31   495526.19          0.06      0.49  12.396  12.498  12.928  637251.61  781466.47  1046943.98
2019-05-27  12.21  12.42  12.37  11.93  1048426.00          0.02      0.16  12.394  12.505  12.824  689649.77  812117.30  1001879.10
2019-05-28  12.31  12.55  12.49  12.26   880703.12          0.12      0.97  12.380  12.505  12.742  719548.29  781927.80   990340.43
2019-05-29  12.36  12.59  12.40  12.26   666411.50         -0.09     -0.72  12.380  12.453  12.673  751584.45  738170.10   973189.95
2019-05-30  12.32  12.38  12.22  12.11   646284.62         -0.18     -1.45  12.366  12.390  12.579  747470.29  739308.42   953969.39
# Drop rows with missing values (NaN), then confirm that none remain
df.dropna(axis=0, inplace=True)
df.isna().sum()

open 0
high 0
close 0
low 0
volume 0
price_change 0
p_change 0
ma5 0
ma10 0
ma20 0
v_ma5 0
v_ma10 0
v_ma20 0
dtype: int64

Candlestick (K-line) chart

Min_date = df.index.min()
Max_date = df.index.max()
print('First date is', Min_date)
print('Last date is', Max_date)
print(Max_date - Min_date)

First date is 2016-11-29 00:00:00
Last date is 2019-05-30 00:00:00
912 days 00:00:00

# Only the imports the chart actually uses are kept
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode()

# OHLC trace: one bar per trading day
trace = go.Ohlc(x=df.index, open=df['open'], high=df['high'], low=df['low'], close=df['close'])
data = [trace]
iplot(data, filename='simple_ohlc')

2. Build the regression model
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing

# Create the label (the value to predict): use the current row to predict the closing price 5 trading days (rows) later
num = 5  # predict 5 days ahead
df['label'] = df['close'].shift(-num)  # the close 5 rows later becomes the label of the current sample
print(df.shape)

(611, 14)

# Drop label, price_change, and p_change; they are not used as features
Data = df.drop(['label', 'price_change', 'p_change'], axis=1)
Data.tail()

             open   high  close    low      volume     ma5    ma10    ma20      v_ma5     v_ma10      v_ma20
date
2019-05-24  12.35  12.45  12.35  12.31   495526.19  12.396  12.498  12.928  637251.61  781466.47  1046943.98
2019-05-27  12.21  12.42  12.37  11.93  1048426.00  12.394  12.505  12.824  689649.77  812117.30  1001879.10
2019-05-28  12.31  12.55  12.49  12.26   880703.12  12.380  12.505  12.742  719548.29  781927.80   990340.43
2019-05-29  12.36  12.59  12.40  12.26   666411.50  12.380  12.453  12.673  751584.45  738170.10   973189.95
2019-05-30  12.32  12.38  12.22  12.11   646284.62  12.366  12.390  12.579  747470.29  739308.42   953969.39
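The labeling step relies on shift with a negative offset, which is easy to misread. A toy sketch (hypothetical five-row frame, with num set to 2 for brevity) shows how each row receives the close from num rows later and why the last num rows end up without a label:

```python
import pandas as pd

num = 2  # look-ahead in rows; the article uses 5
toy = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})  # made-up prices

# shift(-num) moves values up by num rows: row i gets the close of row i + num
toy["label"] = toy["close"].shift(-num)
print(toy)

# The last num rows have no future close, so their label is NaN and they are dropped
labeled = toy.dropna()
print(len(labeled))
```

This is why the article later removes the last 5 rows from both the feature matrix and the label vector before training.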
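X = Data.values
# Drop the last 5 rows, which have no y value
X = X[:-num]
# Standardize the features (zero mean, unit variance per column)
X = preprocessing.scale(X)
# Drop the last 5 rows, whose label is null
df.dropna(inplace=True)
Target = df.label
y = Target.values
print(np.shape(X), np.shape(y))

(606, 11) (606,)

# Split into training and test data, keeping chronological order
X_train, y_train = X[0:550, :], y[0:550]
X_test, y_test = X[550:, :], y[550:606]  # keep all 11 feature columns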
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
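print(y_test.shape)

(550, 11)
(550,)
(56, 11)
(56,)

lr = LinearRegression()
lr.fit(X_train, y_train)
lr.score(X_test, y_test)  # evaluate with the coefficient of determination R^2

0.04930040648385525

An R^2 of about 0.05 on the test rows means the model explains almost none of the variance in the future closing price.

# Predict: take the last 5 rows of X and predict the closing price 5 days later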
X_Predict = X[-num:]
Forecast = lr.predict(X_Predict)
print(Forecast)
print(y[-num:])

[12.5019651 12.45069629 12.56248765 12.3172638 12.27070154]
[12.35 12.37 12.49 12.4 12.22]

# Inspect the coefficient fitted for each feature
for idx, col_name in enumerate(['open', 'high', 'close', 'low', 'volume', 'ma5', 'ma10', 'ma20', 'v_ma5', 'v_ma10', 'v_ma20']):
    print('The coefficient for {} is {}'.format(col_name, lr.coef_[idx]))

The coefficient for open is -0.7623399996475224
The coefficient for high is 0.8321435171405448
The coefficient for close is 0.24463705375238926
The coefficient for low is 1.091415550493547
The coefficient for volume is 0.0043807937569128675
The coefficient for ma5 is -0.30717535019465575
The coefficient for ma10 is 0.1935431079947582
The coefficient for ma20 is 0.24902077484698157
The coefficient for v_ma5 is 0.17472336466033722
The coefficient for v_ma10 is 0.08873934447969857
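The coefficient for v_ma20 is -0.27910702694420775

3. Plot the predictions

# Label the 5 forecasts with the dates 2019-05-13 to 2019-05-17.
# Caution: the targets of X[-num:] are y[-num:], which are the closes of
# 2019-05-24 to 2019-05-30, so these dates do not match the days being forecast.
trange = pd.date_range('2019-05-13', periods=num, freq='D')
trange

DatetimeIndex(['2019-05-13', '2019-05-14', '2019-05-15', '2019-05-16',
               '2019-05-17'],
              dtype='datetime64[ns]', freq='D')

# Build a DataFrame of the forecasts
Predict_df = pd.DataFrame(Forecast, index=trange)
Predict_df.columns = ['forecast']
Predict_df

             forecast
2019-05-13  12.501965
2019-05-14  12.450696
2019-05-15  12.562488
2019-05-16  12.317264
2019-05-17  12.270702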
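A hard-coded daily date_range counts calendar days, whereas the dataset contains only trading days, and the labels of the last num labeled rows fall on the last num dates of the index. The sketch below shows one way to derive forecast dates from the index itself. The business-day calendar is a hypothetical stand-in for the real trading calendar (which also skips holidays), and the names idx, input_dates, and target_dates are illustrative.

```python
import pandas as pd

num = 5
# Hypothetical trading calendar: business days only, standing in for df.index
idx = pd.bdate_range("2019-05-06", "2019-05-30")

labeled = idx[:-num]          # rows that still have a label after shift(-num)
input_dates = labeled[-num:]  # rows fed to the model, the counterpart of X[-num:]
target_dates = idx[-num:]     # dates whose closes are being predicted

for src, dst in zip(input_dates, target_dates):
    print(src.date(), "->", dst.date())
```

With the real data, `df.index[-num:]` taken before the unlabeled rows are dropped would play the role of target_dates and could be passed as the index of the forecast DataFrame.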
# Add the forecasts to the original DataFrame
df = pd.read_csv('./000001.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
# Sort in ascending time order
df.sort_values(by=['date'], inplace=True, ascending=True)
df_concat = pd.concat([df, Predict_df], axis=1)
# Keep only the rows that have a forecast
df_concat = df_concat[df_concat.index.isin(Predict_df.index)]
df_concat.tail(num)

             open   high  close    low      volume  price_change  p_change     ma5    ma10    ma20       v_ma5      v_ma10      v_ma20   forecast
2019-05-13  12.33  12.54  12.30  12.23   741917.75         -0.38     -3.00  12.538  13.143  13.637  1107915.51  1191640.89  1211461.61  12.501965
2019-05-14  12.20  12.75  12.49  12.16  1182598.12          0.19      1.54  12.446  12.979  13.585  1129903.46  1198753.07  1237823.69  12.450696
2019-05-15  12.58  13.11  12.92  12.57  1103988.50          0.43      3.44  12.510  12.892  13.560  1155611.00  1208209.79  1254306.88  12.562488
2019-05-16  12.93  12.99  12.85  12.78   634901.44         -0.07     -0.54  12.648  12.767  13.518   971160.96  1168630.36  1209357.42  12.317264
2019-05-17  12.92  12.93  12.44  12.36   965000.88         -0.41     -3.19  12.600  12.626  13.411   925681.34  1153473.43  1138638.70  12.270702
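For a numeric check alongside the table, the five forecasts can be compared with their actual targets, the y[-num:] values printed in section 2. This is a small sketch using those printed numbers; the choice of MAE and RMSE as metrics is an addition here, not something the article computes.

```python
import numpy as np

# Values copied from the printed output of Forecast and y[-num:] in section 2
forecast = np.array([12.5019651, 12.45069629, 12.56248765, 12.3172638, 12.27070154])
actual = np.array([12.35, 12.37, 12.49, 12.4, 12.22])

errors = forecast - actual
mae = np.abs(errors).mean()          # mean absolute error
rmse = np.sqrt((errors ** 2).mean())  # root mean squared error

print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
```

The mean absolute error comes out at roughly 0.09, on a price of about 12.4. Five points are far too few to judge the model; the R^2 on the 56 test rows is the more informative figure.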
# Plot the forecast against the actual closing price
df_concat['close'].plot(color='green', linewidth=1)
df_concat['forecast'].plot(color='orange', linewidth=3)
plt.xlabel('Time')
plt.ylabel('Price')
plt.show()
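One caveat about the pipeline in section 2: preprocessing.scale(X) standardizes with statistics computed over all rows, including the later test rows. A possible variant, sketched here on synthetic data rather than the article's dataset, fits a StandardScaler on the training rows only and reuses it for the test rows. The names X, y, and split, and the data-generating coefficients, are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix and labels (not the stock data)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Chronological split: earlier rows for training, later rows for testing
split = 160
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit the scaler on training rows only, then apply the same transform to both sets
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

model = LinearRegression().fit(X_train_std, y_train)
r2 = model.score(X_test_std, y_test)  # coefficient of determination on held-out rows
print(r2)
```

On this synthetic data the score is high because the labels are nearly linear in the features by construction; it says nothing about how the stock model would score after the same change.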