## Stock Market Prediction

In this example, we’ll explore how to use Linear Regression to predict stock prices thirty days into the future.

You probably won’t get rich with this algorithm, but I still think it is super cool to watch your computer predict the price of your favorite stocks.

### Getting Started
Create a new forecast_stock.py file.

In our project, we’ll need to import a few dependencies. If you don’t have them installed, run pip install [dependency] on the command line.

* Note: students should sign up for a personal (free) registration and use their own credentials. That way they won't trigger the API limit.

In [1]:
import quandl
import pandas as pd
import numpy as np
import datetime

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import preprocessing, svm
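
If you registered for a free account as suggested in the note above, you can pass your key to the quandl package before requesting any data. This is a minimal sketch that assumes the key is stored in an environment variable called QUANDL_API_KEY (a name chosen just for this example). quandl.ApiConfig.api_key is the attribute the package reads for credentials.

```python
import os

# QUANDL_API_KEY is a hypothetical environment variable name used for this sketch.
api_key = os.environ.get("QUANDL_API_KEY")

if api_key:
    import quandl
    # Authenticated requests are made with this key from now on.
    quandl.ApiConfig.api_key = api_key
else:
    print("No API key found; requests will be anonymous and rate-limited.")
```

Reading the key from the environment keeps it out of the notebook, which helps if you share your project later.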

### Stock Data & Dataframe
To get our stock data, we can set our dataframe to quandl.get("WIKI/[NAME OF STOCK]"). In this notebook, I will use Facebook, but you can choose any stock you wish.

In [2]:
df = quandl.get("WIKI/FB")

If we print(df.tail()) and run our Python program, we see that we get several columns of data for each trading day:

In [3]:
print(df.tail())

              Open    High     Low   Close       Volume  Ex-Dividend  \
Date                                                                   
2018-03-21  164.80  173.40  163.30  169.39  105350867.0          0.0   
2018-03-22  166.13  170.27  163.72  164.89   73389988.0          0.0   
2018-03-23  165.44  167.10  159.02  159.39   52306891.0          0.0   
2018-03-26  160.82  161.10  149.02  160.06  125438294.0          0.0   
2018-03-27  156.31  162.85  150.75  152.19   76787884.0          0.0   

            Split Ratio  Adj. Open  Adj. High  Adj. Low  Adj. Close  \
Date                                                                  
2018-03-21          1.0     164.80     173.40    163.30      169.39   
2018-03-22          1.0     166.13     170.27    163.72      164.89   
2018-03-23          1.0     165.44     167.10    159.02      159.39   
2018-03-26          1.0     160.82     161.10    149.02      160.06   
2018-03-27          1.0     156.31     162.85    150.75      152.19  

However, in our case, we only need the Adj. Close column for our predictions.

In [4]:
df = df[['Adj. Close']].copy() # copy avoids a possible SettingWithCopyWarning when we add a column

Now, let’s set up our forecasting. We want to predict 30 days into the future, so we’ll set a variable forecast_out equal to 30. Then we need a new column in our dataframe to serve as our label, which in machine learning is the output the model learns to predict. To fill it, we set the Prediction column equal to the Adj. Close column shifted up by 30 rows, so each row’s label is the adjusted close 30 trading days later.

In [5]:
forecast_out = 30 # predicting 30 days into the future
df['Prediction'] = df['Adj. Close'].shift(-forecast_out) # label column: Adj. Close shifted 30 rows up

You can see the new dataframe by printing it: print(df.tail())

In [6]:
print(df.tail())

            Adj. Close  Prediction
Date                              
2018-03-21      169.39         NaN
2018-03-22      164.89         NaN
2018-03-23      159.39         NaN
2018-03-26      160.06         NaN
2018-03-27      152.19         NaN
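
The NaN values at the bottom of the Prediction column come from the shift itself: the last rows have no later price to pull up. Here is a small sketch on made-up prices, with a horizon of 2 instead of 30, that makes the effect easy to see. The toy dataframe and the horizon variable are illustration only.

```python
import pandas as pd

# Made-up prices, not real market data
toy = pd.DataFrame({"Adj. Close": [10.0, 11.0, 12.0, 13.0, 14.0]})
horizon = 2  # stands in for forecast_out

# A negative shift moves values up, so each row sees the price 2 rows later
toy["Prediction"] = toy["Adj. Close"].shift(-horizon)
print(toy)
```

The first three rows get the labels 12.0, 13.0 and 14.0, and the last two rows are NaN because there is no data beyond the end of the series.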


### Defining Features & Labels
Our X will be an array consisting of our Adj. Close values, so we drop the Prediction column. We also want to scale our input values: preprocessing.scale standardizes each feature to zero mean and unit variance.

In [7]:
X = np.array(df.drop(columns=['Prediction'])) # keyword form; recent pandas versions reject a positional axis
X = preprocessing.scale(X)
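
To see what the scaling step does, here is a short sketch on made-up prices. It compares preprocessing.scale with the manual formula (value minus column mean, divided by column standard deviation). The array values and variable names are illustration only.

```python
import numpy as np
from sklearn import preprocessing

# Made-up prices in a single feature column
prices = np.array([[150.0], [160.0], [170.0], [180.0]])

scaled = preprocessing.scale(prices)

# The same standardization written out by hand
manual = (prices - prices.mean(axis=0)) / prices.std(axis=0)

print(scaled.ravel())
print(np.allclose(scaled, manual))
```

After scaling, the column has a mean of 0 and a standard deviation of 1, whatever the original price level was.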

Now, if you printed the dataframe after we created the Prediction column, you saw that the last 30 rows had NaNs, meaning no label data. We’ll assign these rows to a new input variable, X_forecast, and remove them from the X array.

In [8]:
X_forecast = X[-forecast_out:] # set X_forecast equal to last 30
X = X[:-forecast_out] # remove last 30 from X

To define our y, or output, we set it equal to the array of Prediction values and remove the last 30 rows, where the label is NaN.

In [9]:
y = np.array(df['Prediction'])
y = y[:-forecast_out]
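
It is worth checking that the slicing leaves X and y the same length, with each input paired with the price the given number of rows later. This sketch repeats the steps on made-up prices with a horizon of 2. The names X_all, X_future, X_known and y_known are chosen for this example.

```python
import numpy as np
import pandas as pd

horizon = 2  # stands in for forecast_out
toy = pd.DataFrame({"Adj. Close": [10.0, 11.0, 12.0, 13.0, 14.0]})
toy["Prediction"] = toy["Adj. Close"].shift(-horizon)

X_all = np.array(toy.drop(columns=["Prediction"]))
X_future = X_all[-horizon:]   # inputs that have no label yet
X_known = X_all[:-horizon]    # inputs that do have a label
y_known = np.array(toy["Prediction"])[:-horizon]

# Each known input lines up with the price two rows later
for x_value, y_value in zip(X_known.ravel(), y_known):
    print(x_value, "->", y_value)
```

Here X_known and y_known both have three rows, and X_future holds the two most recent prices that the model will later predict from.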

### Linear Regression
Finally, prediction time! First, we’ll split our data into training and test sets, with test_size set to 20% of the data. The training set contains known outputs (prices) that our model learns from, and the test set lets us check the model’s predictions against prices it has not seen.

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
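
Because train_test_split shuffles the rows at random, the score will change a little on every run. Here is a small sketch on made-up data showing the 80/20 split sizes. The random_state argument is an addition not used in this notebook, which makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with one feature each
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

# random_state fixes the shuffle so the same rows land in each set every run
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (8, 1) (2, 1)
```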

Now, we can instantiate our Linear Regression model and fit it to the training data. After training, to test the accuracy of the model, we “score” it using the testing data. The score is the R² (coefficient of determination), which measures how closely the predicted prices match the actual prices in the test set; 1.0 is a perfect fit. When I ran the algorithm, I usually got a value above 0.9.

In [11]:
# Training
clf = LinearRegression()
clf.fit(X_train, y_train)
# Testing: score() returns R^2 on the held-out test set
confidence = clf.score(X_test, y_test)
print("confidence: ", confidence)

confidence:  0.9774550590140216
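
The number printed by score() is the coefficient of determination, R² = 1 - SS_res / SS_tot. As a sketch, we can compute it by hand on synthetic data and compare it with what scikit-learn returns. The data and variable names are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a straight line plus some noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 1))
y_demo = 3.0 * X_demo.ravel() + 2.0 + rng.normal(0, 1, size=50)

# Fit on the first 40 rows, evaluate on the last 10
model = LinearRegression().fit(X_demo[:40], y_demo[:40])
y_true = y_demo[40:]
y_pred = model.predict(X_demo[40:])

ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the actual values
r2_manual = 1 - ss_res / ss_tot
r2_sklearn = model.score(X_demo[40:], y_true)

print(r2_manual, r2_sklearn)
```

Both values agree. A score near 1 means the errors are small compared to how much the actual prices vary.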


Lastly, we can predict prices for our X_forecast values:

In [12]:
forecast_prediction = clf.predict(X_forecast)
print(forecast_prediction)

[176.65957997 183.06758243 183.5102074  180.89469619 179.5366423
 181.44797741 182.53442053 186.86007368 188.50985767 185.01915618
 181.86042341 179.46622469 180.15028147 183.95283238 183.32913355
 187.28257934 185.90440612 188.81164743 188.33884348 185.44166183
 187.76544294 187.43347421 188.67081221 176.06606012 171.62975072
 172.87714838 168.35030206 162.81748989 163.49148701 155.57453577]
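
The array above has no dates attached. One way to label it is to index the predictions by the next 30 business days after the last date in the data. This sketch assumes that every weekday is a trading day, so market holidays are ignored. The prediction values below are placeholders, not real model output.

```python
import numpy as np
import pandas as pd

last_date = pd.Timestamp("2018-03-27")  # last row of the data shown earlier
horizon = 30                            # stands in for forecast_out

# Placeholder values standing in for clf.predict(X_forecast)
placeholder_prediction = np.linspace(150.0, 160.0, horizon)

# Next 30 weekdays after the last known date (holidays are not excluded)
future_dates = pd.bdate_range(start=last_date + pd.offsets.BDay(1), periods=horizon)

forecast = pd.Series(placeholder_prediction, index=future_dates, name="Forecast")
print(forecast.head())
```

A date-indexed series like this is also convenient if you want to plot the forecast next to the historical prices.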


Here’s what I got for FB stock.

### What’s Next?
Try plotting your data using matplotlib, and make your predictions more advanced by including more features. When completed, feel free to share your projects in the comments! I’d love to check them out :)

