A machine that generates money with pandas-datareader and Prophet

What is this?

This isn’t really a money machine, I’m just kidding about that, sorry.

This is just a quick exploration of two awesome Python packages that I wanted to play with for a while

Prophet seems like an awesome project by Facebook to make state-of-the-art time series forecasting really easy and simple. I’ve been hoping to give it a try for a while.

I’ve also been itching to play aound with pandas-datareader, which makes it really grab data from multiple remote datasources, including historic stock prices from Yahoo! Finance

I decided to see how Prophet does at forecasting future stock prices based on historic data. Now, before you scold me about the fact that what I’m doing is silly, I know, you can’t really use this to predict stock prices, but time series forecasting can be useful for many other things.

Plus it never hurts to prove well known hypothesis to yourself, no matter how much it might seem like common knowledge.

So I don’t really expect to become a millionaire quite yet, but it’s a fun little project to learn more about these awesome packages.

So let’s get started!

Enviroment

Docker, docker-compose and Jupyter is my preferred way of setting up a reproducible enviroment for analysis.

The notebook for this analysis runs with Dockerfile I set up using Jupyter’s opinionated docker image, along with some tweaks to install the two custom packages.

Once I got everything up and running, I could fire up a Notebook and get cracking.

#The usual suspects
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#Plus some new interesting characters
from fbprophet import Prophet


Our data source

pandas_datareader makes it incredibly easy to get historic stock market data

For today we’ll just get the daily closing prices of the S&P 500 (SPY)

We’ll train on data from the start of 2000 to 2017-03-14 We’ll use the last week’s data as a holdout set to evaluate our model (up until 2017-03-21)

start = datetime.datetime(2000, 1,1)
end = datetime.datetime(2017, 3, 21)

train = web.DataReader("SPY", 'yahoo', start, end)
compare = train.copy()
train = train['2000-01-03': '2017-03-14']

Open High Low Close Volume Adj Close
Date
2000-01-03 148.250000 148.250000 143.875000 145.4375 8164300 105.366938
2000-01-04 143.531204 144.062500 139.640594 139.7500 8089800 101.246443
2000-01-05 139.937500 141.531204 137.250000 140.0000 12177900 101.427563
2000-01-06 139.625000 141.500000 137.750000 137.7500 6227200 99.797478
2000-01-07 140.312500 145.750000 140.062500 145.7500 8066500 105.593338

Can we forecast future prices?

This is where we get to play with Prophet. First we’ll prepare our dataset in a way that Prophet likes it.

df = pd.DataFrame({'ds': train.index, 'y': train['Adj Close']}).reset_index()

df = df.drop('Date', axis=1)

ds y
0 2000-01-03 105.366938
1 2000-01-04 101.246443
2 2000-01-05 101.427563
3 2000-01-06 99.797478
4 2000-01-07 105.593338

Now we can train our model. Since we don’t expect stock prices to really follow a weekly seasonality we’ll switch that off.

m = Prophet(weekly_seasonality=False)
m.fit(df);


To be able to make forecasts on future dates we can use a handy helper method to generate our expected input

future = m.make_future_dataframe(periods=365)


Finally we can make our prediction, using probably the coolest line of code I’ve ever written

forecast = m.predict(future)


Let’s see how that did!

We can use our compare dataframe from earlier to see how our forecasted value for yesterday’s price compares to the actual price. We’ll also see what our model thinks the price would be in a year from now.

def evaluate(forecasted, compare):
forecasted = forecast.set_index('ds').loc['2017-3-21'][['yhat', 'yhat_lower', 'yhat_upper']]
print("Yesterday's Forecast:" )
print("{0:.2f} lower: {1:.2f}, upper: {2:.2f}, ".format(forecasted['yhat'], forecasted['yhat_lower'], forecasted['yhat_upper']))
print("\nReal:" )
print(real)

print('\nNext year''s Forecast:')

return forecast.tail(1)[['ds','yhat', 'yhat_lower', 'yhat_upper']]

evaluate(forecasted, compare)

Yesterday's Forecast:
225.36 lower: 217.63, upper: 233.19,

Real:
233.729996

Next years Forecast:

ds yhat yhat_lower yhat_upper
4690 2018-03-14 242.348516 232.2096 252.518225

Hmm, seems to be underestimating the actual price by quite a bit, however we’re still within Prophet’s 80% confidence interval (well almost).

Did I mention this isn’t acually a valid technique for making investment decisions?

That being said, let’s plot our performance, which again is made pretty easy by Prophet.

p = m.plot(forecast)


Ok, so it seems like it’s doing pretty well modeling the trend in historic data, but it does seem to miss the recent rise in prices. One thing we could try is to force the model to fit the historic trend more or less closely, or as the Prophet documentation puts it, making the model more or less flexible.

To do this, we can use the tweak the changepoint_prior_scale parameter (I’ll chat more on changepoints a little later on). The best value I’ve found for this parameter was 0.01

m = Prophet(weekly_seasonality=False, changepoint_prior_scale=0.01)
m.fit(df)
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
evaluate(forecasted, compare)

Yesterday's Forecast:
225.36 lower: 217.34, upper: 233.33,

Real:
233.729996

Next years Forecast:

ds yhat yhat_lower yhat_upper
4690 2018-03-14 242.348516 233.238948 252.18966

This makes our model overfit the data a bit more, which gets us a little closer to our real price, but we’re still not quite there yet, and our confidence intervals is also undereporting.

Seriously, don’t use this to invest real money, ok?

Ok, so we didn’t do too great, what can we learn?

Another really nice thing about Prophet is that it’s really easy to plot the components of your forecasts (trend, yearly and/or weekly seasonality). Often this is a pretty useful tool for analysis by itself.

p = m.plot_components(forecast)


From the top chart we can clearly see that we’ve had one of the longest bull runs in history, which is pretty crazy, and our model thinks things can only go up from here. This is because of one of the fundamental underpinnings of time series forecasting.

The market, like other time series, will have points where the general trend changes drastically. These are called change points and a good forecasting algorithm needs to detect change points to be able to model the trend in the data. Change points are not caused by general cycles in the data either (seasonality), which can be predicted with more accuracy.

In fact, the one thing that any forecasting algorithm cannot predict is when the next change point will be, and how what effect that will have on the trend. This is also the flaw of my money machine. The market is a random walk, and even though we know everything about historic price trends, we can’t know when trends change.

What’s really interesting though is looking at yearly seasonality plot. From this plot it does seem like there is a strong seasonality in growth rates of stock prices throughout the year. We see a peak around May 1st, a decline up until late October, after which prices seem to trend up again.

This is a well known trading adage to Sell in May and go away, but looking at the data, it seems to hold up quite well and might just be a good trading strategy. So maybe I have just a built a Money Machine after all!

Then again, I’m sure if it was that simple we would have many more self made millionaires in the stock market. From Investopedia:

There are limitations to implementing this strategy in practice, such as added transaction costs and tax implications of the rotation in and out of equities. Another drawback is that market timing and seasonality strategies do not always work out, and the actual results may be very different from the theoretical ones.

So please don’t bet the farm quite yet!

Conclusion

This was a fun little test of Prophet and pandas_datareader.