Stock Market Prediction Using Machine Learning

The Data

  1. Which column is the target vector? We will need to create it!
  2. How will features in the sample help predict the target vector? We will need to chain together several days of trading into one observation so that the model can learn from patterns across consecutive days.
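To make the second point concrete, here is a minimal sketch of what "chaining days into one observation" can look like. The prices are made-up toy values and the frame is ordered oldest-first, so `shift(1)` reads the previous trading day; that ordering is an assumption of this sketch, not necessarily the ordering of the real dataset at every step.

```python
import pandas as pd

# Toy prices, oldest day first (hypothetical values for illustration only).
toy = pd.DataFrame({
    "open":  [100.0, 102.0, 101.0, 103.0],
    "close": [102.0, 101.0, 103.0, 103.0],
})

# Target: the open-to-close percent change for each day.
toy["percentChange"] = toy["close"] / toy["open"] - 1

# Features: the same quantity from one and two trading days earlier.
# With oldest-first ordering, shift(1) moves each value down one row.
toy["percentchange-1"] = toy["percentChange"].shift(1)
toy["percentchange-2"] = toy["percentChange"].shift(2)

# The first two rows lack a full two-day history, so they are dropped.
observations = toy.dropna().reset_index(drop=True)
print(observations)
```

Each remaining row now carries its own target plus the two preceding days as features, which is the shape the model needs.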

Clean Up

# keep only relevant columns 
data = data[['date', 'open', 'high', 'low', 'close', 'volume', 'adjclose']]

Technical Indicators

# add feature for percent change between open and close
data['percentChange'] = data['close'] / data['open'] - 1
# create new feature for the percent change of the previous trading day
# (this assumes the rows are still newest-first here, so shift(-1) reads the earlier day)
data['percentchange-1'] = data['percentChange'].shift(-1)
data.head()
# let's repeat again to get the percent change from two trading days back
data['percentchange-2'] = data['percentChange'].shift(-2)
data.head()
# reorder to have oldest date at top
# this is useful for creating rolling calculations
data.sort_values('date', ascending=True, inplace=True)
# reset the index since sorting will sort the index as well
data.reset_index(inplace=True, drop=True)
# create a 5-day simple moving average (SMA) using rolling calculations
data['shortSma'] = data['close'].rolling(5).mean()
data.head(10)
# shift the SMA so that a particular day knows the SMA of the previous day
data['shortSma-1'] = data['shortSma'].shift(1)
data.head(10)
# add a feature for the SMA from two days ago
data['shortSma-2'] = data['shortSma'].shift(2)
data.head(10)
# trim the observations with missing calculated data
data.dropna(inplace=True)
data.head(10)
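Why do rows go missing after the rolling calculation? Here is a small sketch on a made-up price series (the values are hypothetical) showing where the NaNs come from. A 5-day window has nothing to report until five closes exist, and shifting the result pushes the gap one row further.

```python
import pandas as pd

# Hypothetical closing prices, oldest first.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

# 5-day simple moving average: the first four rows have too little history.
sma = close.rolling(5).mean()

# Yesterday's SMA as seen from today: one more leading NaN.
sma_prev = sma.shift(1)

print(pd.DataFrame({"close": close, "shortSma": sma, "shortSma-1": sma_prev}))
```

That is why the `dropna` step is needed before modeling: the earliest rows simply have no complete feature set.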
# separate features 
features = ['percentchange-1', 'percentchange-2', 'shortSma-1', 'shortSma-2']
X = data[features]
# separate the target vector
y = data['percentChange']
from sklearn import preprocessing
# standardize the features to zero mean and unit standard deviation
X = pd.DataFrame(preprocessing.scale(X), index = X.index, columns = X.columns)
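As a rough picture of what the scaling step does to each column, here is a sketch in plain NumPy on a made-up matrix. It subtracts each column's mean and divides by its population standard deviation, which is the default behavior of `preprocessing.scale`; the array values are hypothetical.

```python
import numpy as np

# Two hypothetical feature columns on very different scales.
X_toy = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 600.0],
])

# Standardize column by column: subtract the mean, divide by the std.
X_scaled = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(X_scaled.mean(axis=0))  # each column mean is numerically zero
print(X_scaled.std(axis=0))   # each column std is one
```

Putting the features on a common scale is what makes the size of the regression coefficients comparable across features later on.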
from sklearn.model_selection import train_test_split
# create training and testing sets by splitting the full dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
from sklearn.linear_model import LinearRegression
# create a linear regression object
my_linreg = LinearRegression()
# fit the linear regression object to the training data
my_linreg.fit(X_train, y_train)
# print out the linear regression coefficients
print(my_linreg.coef_)
Out: [-1.83861458e-03 3.81734250e-06 -1.14631594e-01 1.14308136e-01]
import numpy as np
# put coefficients in a dataframe, take absolute values and sort
coffDf = pd.DataFrame(list(zip(X.columns,np.absolute(my_linreg.coef_))), columns=['Feature','Coefficient'])
coffDf.sort_values('Coefficient', ascending=False)
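For readers following along without the data, here is a self-contained sketch of the same ranking. It uses the four coefficients printed above, paired with the feature names in the order of the `features` list. The variable names `coef` and `ranked` are just for this illustration.

```python
import numpy as np
import pandas as pd

# Feature names and coefficients copied from the article's printed output.
features = ['percentchange-1', 'percentchange-2', 'shortSma-1', 'shortSma-2']
coef = np.array([-1.83861458e-03, 3.81734250e-06, -1.14631594e-01, 1.14308136e-01])

# Rank features by the magnitude of their coefficient, largest first.
ranked = (
    pd.DataFrame({'Feature': features, 'Coefficient': np.absolute(coef)})
    .sort_values('Coefficient', ascending=False)
    .reset_index(drop=True)
)
print(ranked)
```

The two SMA features come out on top, with coefficients of nearly equal size and opposite sign. One possible reading is that the model leans mostly on the change in the SMA between the two days, but that is an interpretation of the numbers, not something tested here.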
# make predictions on the testing set
y_prediction = my_linreg.predict(X_test)
print(y_prediction)
[-1.29455651e-03 -3.96322055e-03 6.59419710e-04 -2.59123916e-03 1.95729587e-03 3.82718757e-03 -4.94597329e-03 2.88532953e-03 -2.18723623e-03 -2.78564265e-03 2.24264894e-03 -2.14085641e-03 5.18296801e-03 -5.71483878e-03 -5.88216743e-04 3.16728605e-04 -2.61084259e-04 2.01959059e-03 -2.34961277e-03 -2.43163486e-03 1.79225243e-03 2.64776225e-03 3.02945196e-03 -1.45447217e-04 3.71679868e-03 2.23369595e-03 -1.06319715e-03 2.11801614e-03 -7.81162087e-03 8.04337714e-03 4.67753488e-04 2.96275248e-04 5.88536554e-03 6.41958589e-03 -5.19879732e-05 1.28886416e-03 2.00956319e-03 -2.68709458e-03 -6.09952132e-03 -3.79141647e-03 3.58422779e-03 2.88474576e-03 1.53894669e-03 2.13184595e-04 2.46513006e-04 4.40403505e-03 -1.72738960e-03 1.23677597e-03 -2.92647785e-04 -2.72097674e-03 5.93572234e-03]
from sklearn import metrics
# calculate the mean squared error of the predictions
mse = metrics.mean_squared_error(y_test, y_prediction)
# take the square root of MSE to get Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)
print(rmse)
Out: 0.044721137880460754
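To see what that number measures, here is a sketch that computes RMSE by hand on a few made-up values (the arrays are hypothetical, not taken from the dataset). It squares each prediction error, averages the squares, then takes the square root.

```python
import numpy as np

# Hypothetical actual and predicted percent changes (as fractions).
y_true = np.array([0.010, -0.020, 0.005, 0.000])
y_pred = np.array([0.000, -0.010, 0.010, 0.005])

# Mean squared error: average of the squared differences.
errors = y_true - y_pred
mse_manual = np.mean(errors ** 2)

# Root mean squared error: back in the same units as the target.
rmse_manual = np.sqrt(mse_manual)
print(rmse_manual)
```

Because RMSE is in the same units as the target, and `percentChange` is a fraction, the article's RMSE of about 0.0447 corresponds to a typical error of roughly 4.5 percentage points.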


