Financial Data Modeling

The idea for this project comes from the book Doing Data Science by Cathy O'Neil and Rachel Schutt.

Goals

Gather financial data, then create and investigate potential features for modeling. Compute daily log returns and daily log changes in volume. Generate a volatility estimate from an exponentially weighted standard deviation of the log returns. Fit simple linear forecasting models on these features, and compare a trading strategy based on these models with a random model and a buy and hold model.

Generate linear models using recent log returns, volume and volatility estimates.
Look for correlations between tomorrow's close and today's volatility and intraday high/low based features.
Apply the models to later data and calculate the return on investment.
Compare the model returns with simply buying and holding the stock, and with randomly (stochastically) buying and selling it.

Conclusion

The goal of this project was to code simple predictive models and compare their returns over a given period with those of "naive" models, e.g. buy and hold. In this notebook I use daily data for AAPL stock from January 1990 to March 2018. I train models on data from 1996 through 2014 and apply them to each of the years 2015, 2016 and 2017. It is not surprising to me that I did not find any significant correlation between the model predictions and next-day stock returns. The models show no predictive advantage over a stochastic model that randomly chooses to buy and sell on any given day, or over simply buying and holding.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pandas_datareader

## Get some financial data
## Quandl is a private financial data provider.

start_date = '1990-01-01'
end_date = '2018-11-12'

# Quandl Key, add here:
key="YOURKEYHERE"
# S&P 500 real price (monthly values); not used below
sp_code="MULTPL/SP500_REAL_PRICE_MONTH"


## To start, use a single stock (AAPL); the other tickers and the index are commented out
#q1 = pandas_datareader.quandl.QuandlReader("AMZN",  start_date, end_date, api_key=key)
#q2 = pandas_datareader.quandl.QuandlReader("GOOGL",  start_date, end_date, api_key=key)
q3 = pandas_datareader.quandl.QuandlReader("AAPL",  start_date, end_date, api_key=key)
#q4 = pandas_datareader.quandl.QuandlReader("FB",  start_date, end_date, api_key=key)
#q = pandas_datareader.quandl.QuandlReader(sp_code,  start_date, end_date, api_key=key)
#amzn=q1.read()
#goog=q2.read()
aapl=q3.read()

# Note - data stops around 3/27/2018
# In previous notebooks I've looked at this data in more depth; here we take a peek
# and then do a series of transformations before looking at modeling results

#print(amzn.shape, amzn.index.min(),amzn.index.max())
#print(goog.shape, goog.index.min(),goog.index.max())
print(aapl.shape, aapl.index.min(),aapl.index.max())

((7113, 12), Timestamp('1990-01-02 00:00:00'), Timestamp('2018-03-27 00:00:00'))

# Aapl
aapl.head()

## Ok - a while ago the adjusted close values are pretty low!  
aapl[(aapl.index>'1997') & (aapl.index<'1999') ].AdjClose.plot()

# Right: there was a stock split a few years back; AdjClose is adjusted for it, Close is not
aapl[['AdjClose','Close']].plot()

Transformations on the data

First, transform the data so that it is feasible to fit linear models of the form y = Bx + C, finding the B and C that best fit the data.

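Reorder so that the oldest row is at the top.
Add log returns, log(AdjClose_t) - log(AdjClose_(t-1)), and normalized log returns (LR, NLR).
Add an exponentially weighted volatility estimate of the log returns (vol).
Add log volume changes, log(AdjVolume_t) - log(AdjVolume_(t-1)) (LV).
Add intraday volatility, (AdjHigh - AdjLow)/AdjClose (IV).
Add the position of the close within the day's range, (AdjClose - AdjLow)/(AdjHigh - AdjLow) (IC).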
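A minimal sketch of the transformations listed above (plus IC, which the next code cell also computes), run on a small made-up price table rather than AAPL data and assuming the same column names as the Quandl frame. It also checks the identity used later for dollar P&L: exp(LR) - 1 times yesterday's close equals today's price change.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, ordered oldest -> newest (not real AAPL prices)
toy = pd.DataFrame(
    {
        "AdjClose": [10.0, 10.5, 10.2, 10.8],
        "AdjHigh": [10.2, 10.6, 10.6, 11.0],
        "AdjLow": [9.8, 10.1, 10.0, 10.4],
        "AdjVolume": [1000.0, 1200.0, 900.0, 1500.0],
    },
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# Log return: log(AdjClose_t) - log(AdjClose_(t-1)); the first row is NaN
toy["LR"] = np.log(toy["AdjClose"]).diff()
# Log volume change, built the same way from volume
toy["LV"] = np.log(toy["AdjVolume"]).diff()
# Intraday range relative to the close
toy["IV"] = (toy["AdjHigh"] - toy["AdjLow"]) / toy["AdjClose"]
# Position of the close inside the day's range: 0 = at the low, 1 = at the high
toy["IC"] = (toy["AdjClose"] - toy["AdjLow"]) / (toy["AdjHigh"] - toy["AdjLow"])

# exp(LR) - 1 is the simple return, so multiplying by yesterday's close
# recovers today's price change
price_change = (np.exp(toy["LR"]) - 1) * toy["AdjClose"].shift(1)
print(toy.round(4))
```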

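# Transformations:
# Reverse the rows so the oldest day comes first (Quandl returns newest first)
aapl=aapl.reindex(index=aapl.index[::-1])
# Daily log return: log(AdjClose_t) - log(AdjClose_(t-1))
aapl['LR']=np.log(aapl['AdjClose']).diff()
# Normalized log return (z-score). Note: the mean and std span the full sample,
# including the 2015-2017 years that are later used for testing
aapl['NLR']=(aapl.LR - aapl.LR.mean())/aapl.LR.std()
# Volatility: exponentially weighted standard deviation of LR (half-life just under one day)
aapl['vol']=aapl.LR.ewm(halflife=0.97).std()
# Daily log change in volume
aapl['LV']=np.log(aapl['AdjVolume']).diff()
# Intraday range relative to the close
aapl['IV']=(aapl.AdjHigh - aapl.AdjLow)/aapl.AdjClose
# Where the close sits within the day's range (0 = at the low, 1 = at the high)
aapl['IC']=(aapl.AdjClose - aapl.AdjLow)/(aapl.AdjHigh - aapl.AdjLow)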

aapl[['LR','NLR','vol','LV','IV','IC']].head()
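The vol column uses halflife=0.97. As a hedged illustration, based on the pandas ewm documentation's definition alpha = 1 - exp(-ln(2)/halflife), the sketch below converts that half-life into the equivalent smoothing factor and checks that both parameterizations give the same volatility on a synthetic return series. With a half-life under one day, alpha is about 0.51, so each older return gets roughly half the weight of the next one and the estimate is dominated by the last two or three days.

```python
import numpy as np
import pandas as pd

halflife = 0.97  # same half-life (in rows, i.e. trading days) as the vol column
# pandas documents alpha = 1 - exp(-ln(2) / halflife) for the halflife parameter
alpha = 1 - np.exp(-np.log(2) / halflife)

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.02, 250))  # synthetic daily log returns

vol_halflife = returns.ewm(halflife=halflife).std()
vol_alpha = returns.ewm(alpha=alpha).std()

# Relative (unnormalized) weight of a return k = 0, 1, 2, ... days back
weights = (1 - alpha) ** np.arange(5)
print(round(alpha, 4), np.round(weights, 3))
```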

# Look at the normalized log return and the intraday close position (IC)
aapl[(aapl.index > '2002') & (aapl.index < '2003')][['NLR','IC']].plot(figsize=(12,6))

# Are there any interesting correlations?
aapl[(aapl.index > '2002') & (aapl.index < '2003')][['NLR','IC']].corr()

## Look at the volatility and the intraday volatility
aapl[(aapl.index > '2004') & (aapl.index < '2005')][['vol','IV']].plot(figsize=(12,6))

## are the vols correlated?
aapl[(aapl.index > '2004') & (aapl.index < '2005')][['vol','IV']].corr()

# Now add the previous days' data as features.
# NLR-k is the normalized log return from k days earlier, making it available for a y = Ax + B type linear problem.
# Since the data is ordered old => new, a positive shift moves values down the index, so each row gets older data.
for lag in range(1, 5):
    aapl['NLR-{0}'.format(lag)]=aapl['NLR'].shift(lag)

# And the future value the model is to predict, y = NLR+1 (a negative shift pulls tomorrow's value into today's row)
aapl['NLR+1']=aapl['NLR'].shift(-1)
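A tiny illustration of the shift convention on made-up values: shift(k) with k > 0 brings in older values, shift(-1) brings in tomorrow's value. It also shows that the first rows have no lags and the last row has no target, so those rows carry NaN and must be left out of any fit.

```python
import pandas as pd

# Toy normalized returns, oldest first (illustrative values only)
nlr = pd.Series([0.1, -0.2, 0.3, 0.0, 0.5], name="NLR")

lagged = pd.DataFrame({
    "NLR+1": nlr.shift(-1),  # tomorrow's value: the prediction target
    "NLR": nlr,
    "NLR-1": nlr.shift(1),   # yesterday's value
    "NLR-2": nlr.shift(2),   # two days ago
})
print(lagged)
```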

# Peek at this:
aapl[['LR', 'AdjClose','NLR+1','NLR','NLR-1','NLR-2','NLR-3','NLR-4','vol','LV','AdjVolume','IV','IC']].head(6)

aapl[['LR', 'AdjClose','NLR+1','NLR','NLR-1','NLR-2','NLR-3','NLR-4','vol','LV','IV','IC']].describe().transpose()

## The simplest modeling step: is NLR+1 correlated with any other variable?
aapl[(aapl.index > '2002') & (aapl.index < '2003')][['NLR+1', 'AdjClose','NLR','NLR-1','NLR-2','NLR-3','NLR-4','vol','LV','IV','IC']].corr()

Testing and Applying Models - Part 1

Now that we've prepared the historical data, we'd like to test and apply a predictive model and strategy for trading securities. Below is my initial work exploring the code used to compute the returns of the different models.

Does a model trained in the past predict closes in the future?
(Also, does a model trained in the future, predict closes in the past?)
Does a model beat just holding the stock?
Does a model beat a simple stochastic one?

## How much would simply holding AAPL return?
# Say I buy 1000 shares. What would happen?

hold = aapl[(aapl.index > '2002') & (aapl.index < '2003')][['AdjClose','LR']].copy()

# Daily change in value of 1000 shares: the simple return exp(LR) - 1 times the previous day's price
hold['daily_change']=1000*(np.exp(hold['LR'])-1)*hold['AdjClose'].shift(1)
hold['investment_value']=1000*hold.AdjClose

# Value of the 1000 shares on the first trading day of the period
initial_investment=1000*hold.AdjClose.iloc[0]

hold['overall_return']=1000*hold.AdjClose - initial_investment

# Cross-check: daily change as overall_return_t - overall_return_(t-1), then accumulate it
hold['daily_change2']=hold['overall_return'].diff()
hold['overall_return2']=hold.daily_change2.cumsum()

# Checksum: both ways of computing the cumulative return should agree
hold['chksm1']=hold['overall_return2']-hold['overall_return']
hold.chksm1.sum()

0.0
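The checksum comes out to zero because the daily changes telescope: summing C_t - C_(t-1) over the period leaves only C_last - C_first. A small sketch with hypothetical prices makes the identity explicit.

```python
import numpy as np

shares = 1000
prices = np.array([1.50, 1.52, 1.47, 1.45, 1.55])  # hypothetical daily closes

daily_change = shares * np.diff(prices)        # shares * (C_t - C_(t-1))
cumulative = np.cumsum(daily_change)           # running profit/loss
overall = shares * (prices[1:] - prices[0])    # value now minus initial investment

print(np.round(cumulative, 2))
```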

del hold['chksm1']
hold.head()

hold.tail()

# Plot the investment value, with the daily change on a secondary axis
hold['investment_value'].plot(grid=True, legend=True)
hold['daily_change'].plot(grid=True, legend=True, secondary_y=True)

## Now let's make a stochastic model (perhaps a few) and see how it does.
# Say we bet on 1000 shares, and always sell on the day after a bet. What would happen?
# Also, what would the range of outcomes be?

# First, roughly what fraction of days have positive returns? About 47% before 2002, so close to half.
before_2002 = aapl[aapl.index < '2002']
print(len(before_2002))
print((before_2002.LR > 0).sum())
print(float((before_2002.LR > 0).sum()) / len(before_2002))

3028
1425
0.470607661823

# A first cut at a simple stochastic model

rand = aapl[(aapl.index > '2002') & (aapl.index < '2003')][['AdjClose','LR']].copy()
# 1 = buy 1000 shares at today's close (and sell tomorrow), 0 = stay out
rand['make_bet']=np.random.randint(0,2, rand.shape[0])
print("number of days: ",rand.shape[0]," bets: ",rand.make_bet.sum())

# Daily change in value of 1000 shares: 1000 * (AdjClose_t - AdjClose_(t-1))
rand['daily_change']=1000*rand['AdjClose'].diff()
# A bet placed yesterday earns today's change (same convention as in Part 2)
rand['investment_change']=rand['daily_change']*rand['make_bet'].shift(1)
rand['investment_return']=rand['investment_change'].cumsum()

('number of days: ', 252, ' bets: ', 116)

rand.head(10)

hold.head()

# Compare the holding vs random buying and selling
ax = hold['overall_return'].plot(grid=True, legend=True)
rand['investment_return'].plot(grid=True, legend=True)
hold['daily_change'].plot(grid=True, legend=True, secondary_y=True)
ax.set_ylabel('Cumulative Return [$]')
ax.right_ax.set_ylabel('Daily Price Change [$]')
ax.set_xlabel('Year')

Testing and Applying Models - Part 2

Here I train a simple linear model and apply it to future data. I also compare it with the "buy and hold" and "stochastic" models and generate explanatory statistics and plots.

Goals

Generate the return comparison graphs for 3 different years: 2015, 2016 and 2017.
Generate error bars on the stochastic model.
Does the stochastic model tend to the buy and hold model in the mean?

Note that since the stochastic model chooses to bet about half the time, its expected return is roughly half that of the buy and hold strategy.
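A short derivation of the "roughly half" claim, assuming each day's bet b_t is an independent fair coin flip (as np.random.randint(0, 2, ...) draws it) and that N shares are bought at day t's close and sold at day t+1's close. The daily changes telescope to the buy and hold return, and the variance line gives the spread that the error bars should reflect.

```latex
\begin{align*}
R_{\mathrm{rand}} &= N \sum_{t=1}^{T-1} b_t \,(C_{t+1} - C_t),
  \qquad b_t \sim \mathrm{Bernoulli}(1/2) \text{ independent} \\
\mathbb{E}[R_{\mathrm{rand}}] &= N \sum_{t=1}^{T-1} \tfrac{1}{2}\,(C_{t+1} - C_t)
  = \tfrac{N}{2}\,(C_T - C_1) = \tfrac{1}{2}\, R_{\mathrm{hold}} \\
\mathrm{Var}[R_{\mathrm{rand}}] &= N^2 \sum_{t=1}^{T-1} \mathrm{Var}[b_t]\,(C_{t+1} - C_t)^2
  = \tfrac{N^2}{4} \sum_{t=1}^{T-1} (C_{t+1} - C_t)^2
\end{align*}
```

For example, half of the 2016 buy and hold gain of 12605.69 is about 6303, close to the simulated stochastic mean of 6139.92 reported for 2016.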

# Simple model:
# Train a simple model on AAPL data from 1996 through 2014, then test it on each year from 2015 to 2017

features = ['NLR','NLR-1','NLR-2','NLR-3','vol','LV','IV','IC']
target = ['NLR+1']

def between(df, start, end):
    """Rows with start < index < end (dates given as strings)."""
    return df[(df.index > start) & (df.index < end)]

X_train, Y_train = between(aapl, '1996', '2015')[features], between(aapl, '1996', '2015')[target]

# Test the model on various years (2015, 2016 and 2017 are also selected individually for plots later)
X, Y = between(aapl, '2015', '2018')[features], between(aapl, '2015', '2018')[target]

X15, Y15 = between(aapl, '2015', '2016')[features], between(aapl, '2015', '2016')[target]
X16, Y16 = between(aapl, '2016', '2017')[features], between(aapl, '2016', '2017')[target]
X17, Y17 = between(aapl, '2017', '2018')[features], between(aapl, '2017', '2018')[target]

X_train.describe().transpose()

# Inspect the training data
X_train.head()

Y_train.describe().transpose()

Y_train.head()

from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

from sklearn.metrics import mean_squared_error

# First model: ridge regression (L2 penalty)
regressor = Ridge(alpha=0.5)
regressor.fit(X_train, Y_train)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
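Ridge regression minimizes ||y - Xw - b||^2 + alpha * ||w||^2, with the intercept b left unpenalized. As a sanity check of what Ridge(alpha=0.5) computes, the sketch below solves the same problem in closed form on centered synthetic data (not the AAPL features) and compares the result with scikit-learn's coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p, alpha = 500, 3, 0.5
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, -0.2, 0.0]) + 0.1 + rng.normal(scale=0.5, size=n)

# Closed form with an unpenalized intercept: center X and y, solve
# (Xc'Xc + alpha*I) w = Xc'yc, then recover the intercept from the means
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
b = y_mean - X_mean @ w

model = Ridge(alpha=alpha).fit(X, y)
print(np.round(w, 4), round(b, 4))
```

With several thousand training rows, an alpha of 0.5 is small relative to X'X, so the ridge fit here should be close to ordinary least squares.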

# Second model: lasso regression (L1 penalty); note that r2 is the lasso model, not an R^2 score
r2 = Lasso(alpha=0.0001)
r2.fit(X_train, Y_train)

Lasso(alpha=0.0001, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
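scikit-learn's Lasso minimizes (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, so its alpha is not on the same scale as Ridge's. The L1 penalty sets a coefficient exactly to zero when that feature's scaled covariance with the residual, |x_j' r| / n, does not exceed alpha, which is how the lasso fit here ends up with an LV coefficient of exactly 0. A small synthetic sketch of that effect, with one informative and one pure-noise feature:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
# Only the first feature matters; the second is pure noise
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # first coefficient shrunk below 2, second exactly zero
```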

# Predictions for each of 2015, 2016, 2017
Y_predict = regressor.predict(X)

Y15_predict = regressor.predict(X15)
Y16_predict = regressor.predict(X16)
Y17_predict = regressor.predict(X17)

# Predictions for each of 2015, 2016, 2017 using the Lasso model
Y_predict2 = r2.predict(X)

Y15_predict2 = r2.predict(X15)
Y16_predict2 = r2.predict(X16)
Y17_predict2 = r2.predict(X17)

# Test-set (2015-2017) MSE of the ridge model
mean_squared_error(Y, Y_predict)

0.2550722671112245

# Test-set MSE of the lasso model
mean_squared_error(Y, Y_predict2)

0.25551893583106344
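An MSE of about 0.255 is easier to judge against a trivial baseline. Since NLR+1 is normalized, predicting 0 every day (the full-sample mean) is a natural "no skill" forecast. A hedged sketch of a skill score, 1 - MSE_model / MSE_zero, where values at or below 0 mean no improvement over that baseline; the helper name skill_vs_zero is hypothetical, and the usage below uses made-up data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def skill_vs_zero(y_true, y_pred):
    """Hypothetical helper: 1 - MSE(model) / MSE(always predicting 0)."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    mse_model = mean_squared_error(y_true, y_pred)
    mse_zero = mean_squared_error(y_true, np.zeros_like(y_true))
    return 1.0 - mse_model / mse_zero

# Made-up example: a forecast that is pure noise scores below 0
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
noise_forecast = rng.normal(scale=0.5, size=1000)
print(round(skill_vs_zero(y, noise_forecast), 4))
```

In this notebook it would be called as, e.g., skill_vs_zero(Y, Y_predict).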

regressor.coef_

array([[ 0.00487861, -0.00553775, -0.01053907,  0.0484403 , -2.53586982,
         0.00515271,  0.77611617, -0.18425681]])

r2.coef_

array([ 3.06115900e-03, -6.00148672e-03, -1.09980476e-02,  4.76754539e-02,
       -3.29158265e+00,  0.00000000e+00,  1.16650863e+00, -1.79054810e-01])

What's in the modeled data

Column Name	Model	Details
daily_change	None	Change in AdjClose (per share) between yesterday and today
hold_current_value	Hold	Today's AdjClose * number of shares purchased (1000): value of the hold investment
hold_current_return	Hold	Current return: today's investment value - initial investment
rand_make_bet	Random	1 if the random model buys at today's close (selling the next day), else 0
rand_daily_change	Random	If a bet is made today, tomorrow's row reflects it as (number of shares * daily_change)
predict	Predictive	Output of the ridge model (real number, predicted NLR+1)
predict_make_bet	Predictive	1 if predict > 0, else 0
predict_daily_change	Predictive	If a bet is made today, tomorrow's row reflects it as (number of shares * daily_change)
predict2, predict_make_bet2, predict_daily_change2	Predictive	The same three columns for the lasso model
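The 2015, 2016 and 2017 cells below repeat the same bookkeeping. A hedged sketch of a single helper that follows the table's definitions for any 0/1 signal (buy and hold, random or model): a 1 on day t buys 1000 shares at that close and earns the next day's change. The function name strategy_returns and the toy prices are assumptions for illustration; adj_close.diff() equals the notebook's (np.exp(LR) - 1) * AdjClose.shift(1) up to rounding.

```python
import numpy as np
import pandas as pd

def strategy_returns(adj_close, signal, shares=1000):
    """Hypothetical helper: cumulative dollar P&L of a 0/1 daily signal.

    A 1 on day t means: buy `shares` at day t's close and sell at day t+1's close.
    """
    daily_change = adj_close.diff()                 # C_t - C_(t-1), per share
    pnl = shares * daily_change * signal.shift(1)   # yesterday's bet earns today's change
    return pnl.cumsum()

# Toy example (made-up prices)
dates = pd.date_range("2021-01-04", periods=5, freq="B")
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0], index=dates)

always = pd.Series(1, index=dates)             # buy and hold expressed as a signal
model = pd.Series([1, 0, 1, 1, 0], index=dates)

hold_pnl = strategy_returns(close, always)
model_pnl = strategy_returns(close, model)
initial_investment = 1000 * close.iloc[0]
print(hold_pnl.iloc[-1], model_pnl.iloc[-1], 100 * model_pnl.iloc[-1] / initial_investment)
```

With the notebook's frames this would be called as, e.g., strategy_returns(data_15['AdjClose'], data_15['predict_make_bet']).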

## Gather the data for 2015


data_15 = aapl[(aapl.index > '2015') & (aapl.index < '2016')][['AdjClose','LR']].copy()

## Random model: randomly buy and sell stock; a 1 means buy today (and sell the next day)
data_15['rand_make_bet']=np.random.randint(0,2, data_15.shape[0])
print("Number of days: {0}".format(data_15.shape[0]))
print("Number of random bets: {0}".format(data_15.rand_make_bet.sum()))

## Recall: LR = log(C_t / C_(t-1)), so (exp(LR) - 1) * C_(t-1) = C_t - C_(t-1)
data_15['daily_change']=(np.exp(data_15['LR'])-1)*data_15['AdjClose'].shift(1)

## Static model: buy and hold
data_15['hold_current_value']=1000*data_15.AdjClose

initial_investment15=1000*data_15.AdjClose.iloc[0]
print("initial investment: {0:8.2f}".format(initial_investment15))

# Current return of the hold investment: today's value - initial investment
data_15['hold_current_return']=1000*data_15.AdjClose - initial_investment15

roi_hold_net15=data_15.hold_current_return.iloc[-1]

print("Final hold net return: {0:8.2f}".format(roi_hold_net15))

# A bet made yesterday earns today's change
data_15['rand_daily_change']=1000*data_15['daily_change']*data_15['rand_make_bet'].shift(1)

# Ridge model: bet when tomorrow's NLR is predicted to be > 0,
# i.e. tomorrow's log return is predicted to be above its full-sample mean
data_15['predict']=Y15_predict.ravel()
data_15['predict_make_bet']=np.where(data_15['predict']>0, 1, 0)
print("Number of ridge model bets: {0}".format(data_15.predict_make_bet.sum()))
data_15['predict_daily_change']=1000*data_15['daily_change']*data_15['predict_make_bet'].shift(1)

# Lasso model, same betting rule
data_15['predict2']=Y15_predict2.ravel()
data_15['predict_make_bet2']=np.where(data_15['predict2']>0, 1, 0)
print("Number of lasso model bets: {0}".format(data_15.predict_make_bet2.sum()))
data_15['predict_daily_change2']=1000*data_15['daily_change']*data_15['predict_make_bet2'].shift(1)


# Calculate my return:
print("\nBuy and Hold Results")
print("Net gain: {0:8.2f}".format(roi_hold_net15))
prct_roi_hold15= 100*roi_hold_net15/initial_investment15
print("Percent gain: {0:2.2f} %".format(prct_roi_hold15))

print("Random Results:")
roi_rand15=data_15['rand_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_rand15))
prct_roi_rand15=100*roi_rand15/initial_investment15
print("Percent gain: {0:2.2f} %".format(prct_roi_rand15))

print("Model Results (Ridge, alpha=0.5):")
roi_model15=data_15['predict_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model15))
prct_roi_model15=100*roi_model15/initial_investment15
print("Percent gain: {0:2.2f} %".format(prct_roi_model15))

print("Model Results (Lasso, alpha=0.0001):")
roi_model15_2=data_15['predict_daily_change2'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model15_2))
prct_roi_model15_2=100*roi_model15_2/initial_investment15
print("Percent gain: {0:2.2f} %".format(prct_roi_model15_2))

Number of days: 252
Number of random bets: 127
initial investment: 103863.96
Final hold net return: -2167.15
Number of ridge model bets: 158
Number of lasso model bets: 162

Buy and Hold Results
Net gain: -2167.15
Percent gain: -2.09 %
Random Results:
Net gain: -1403.68
Percent gain: -1.35 %
Model Results (Ridge, alpha=0.5):
Net gain: -10773.05
Percent gain: -10.37 %
Model Results (Lasso, alpha=0.0001):
Net gain: -16071.66
Percent gain: -15.47 %

# Gather the data for 2016

data_16 = aapl[(aapl.index > '2016') & (aapl.index < '2017')][['AdjClose','LR']].copy()

## Random model: randomly buy and sell stock
data_16['rand_make_bet']=np.random.randint(0,2, data_16.shape[0])
print("Number of days: {0}".format(data_16.shape[0]))
print("Number of random bets: {0}".format(data_16.rand_make_bet.sum()))

## Recall: LR = log(C_t / C_(t-1)), so (exp(LR) - 1) * C_(t-1) = C_t - C_(t-1)
data_16['daily_change']=(np.exp(data_16['LR'])-1)*data_16['AdjClose'].shift(1)

## Static model: buy and hold
data_16['hold_current_value']=1000*data_16.AdjClose

initial_investment16=1000*data_16.AdjClose.iloc[0]
print("initial investment: {0:8.2f}".format(initial_investment16))

data_16['hold_current_return']=1000*data_16.AdjClose - initial_investment16

roi_hold_net16=data_16.hold_current_return.iloc[-1]
print("Final hold net return: {0:8.2f}".format(roi_hold_net16))


# A bet made yesterday earns today's change
data_16['rand_daily_change']=1000*data_16['daily_change']*data_16['rand_make_bet'].shift(1)
# Ridge model: bet when NLR+1 is predicted to be > 0
data_16['predict']=Y16_predict.ravel()
data_16['predict_make_bet']=np.where(data_16['predict']>0, 1, 0)
print("Number of ridge model bets: {0}".format(data_16.predict_make_bet.sum()))
data_16['predict_daily_change']=1000*data_16['daily_change']*data_16['predict_make_bet'].shift(1)

# Lasso model, same betting rule
data_16['predict2']=Y16_predict2.ravel()
data_16['predict_make_bet2']=np.where(data_16['predict2']>0, 1, 0)
print("Number of lasso model bets: {0}".format(data_16.predict_make_bet2.sum()))
data_16['predict_daily_change2']=1000*data_16['daily_change']*data_16['predict_make_bet2'].shift(1)


# Calculate my return:
print("Buy and Hold Results")
print("Net gain: {0:8.2f}".format(roi_hold_net16))
prct_roi_hold16= 100*roi_hold_net16/initial_investment16
print("Percent gain: {0:2.2f} %".format(prct_roi_hold16))

print("Random Results:")
roi_rand16=data_16['rand_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_rand16))
prct_roi_rand16=100*roi_rand16/initial_investment16
print("Percent gain: {0:2.2f} %".format(prct_roi_rand16))

print("Model Results (Ridge, alpha=0.5):")
roi_model16=data_16['predict_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model16))
prct_roi_model16=100*roi_model16/initial_investment16
print("Percent gain: {0:2.2f} %".format(prct_roi_model16))

print("Model Results (Lasso, alpha=0.0001):")
roi_model16_2=data_16['predict_daily_change2'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model16_2))
prct_roi_model16_2=100*roi_model16_2/initial_investment16
print("Percent gain: {0:2.2f} %".format(prct_roi_model16_2))

Number of days: 252
Number of random bets: 128
initial investment: 101783.76
Final hold net return: 12605.69
Number of ridge model bets: 161
Number of lasso model bets: 167
Buy and Hold Results
Net gain: 12605.69
Percent gain: 12.38 %
Random Results:
Net gain: 27292.94
Percent gain: 26.81 %
Model Results (Ridge, alpha=0.5):
Net gain: 19246.17
Percent gain: 18.91 %
Model Results (Lasso, alpha=0.0001):
Net gain: 15901.36
Percent gain: 15.62 %

# Gather the data for 2017

data_17 = aapl[(aapl.index > '2017') & (aapl.index < '2018')][['AdjClose','LR']].copy()

## Random model: randomly buy and sell stock
data_17['rand_make_bet']=np.random.randint(0,2, data_17.shape[0])
print("Number of days: {0}".format(data_17.shape[0]))
print("Number of random bets: {0}".format(data_17.rand_make_bet.sum()))

## Recall: LR = log(C_t / C_(t-1)), so (exp(LR) - 1) * C_(t-1) = C_t - C_(t-1)
data_17['daily_change']=(np.exp(data_17['LR'])-1)*data_17['AdjClose'].shift(1)

## Static model: buy and hold
data_17['hold_current_value']=1000*data_17.AdjClose

initial_investment17=1000*data_17.AdjClose.iloc[0]
print("initial investment: {0:8.2f}".format(initial_investment17))

data_17['hold_current_return']=1000*data_17.AdjClose - initial_investment17

roi_hold_net17=data_17.hold_current_return.iloc[-1]
print("Final hold net return: {0:8.2f}".format(roi_hold_net17))


# A bet made yesterday earns today's change
data_17['rand_daily_change']=1000*data_17['daily_change']*data_17['rand_make_bet'].shift(1)
# Ridge model: bet when NLR+1 is predicted to be > 0
data_17['predict']=Y17_predict.ravel()
data_17['predict_make_bet']=np.where(data_17['predict']>0, 1, 0)
print("Number of ridge model bets: {0}".format(data_17.predict_make_bet.sum()))
data_17['predict_daily_change']=1000*data_17['daily_change']*data_17['predict_make_bet'].shift(1)

# Lasso model, same betting rule
data_17['predict2']=Y17_predict2.ravel()
data_17['predict_make_bet2']=np.where(data_17['predict2']>0, 1, 0)
print("Number of lasso model bets: {0}".format(data_17.predict_make_bet2.sum()))
data_17['predict_daily_change2']=1000*data_17['daily_change']*data_17['predict_make_bet2'].shift(1)

# Calculate my return:
print("Buy and Hold Results 2017")
print("Net gain: {0:8.2f}".format(roi_hold_net17))
prct_roi_hold17= 100*roi_hold_net17/initial_investment17
print("Percent gain: {0:2.2f} %".format(prct_roi_hold17))

print("Random Results:")
roi_rand17=data_17['rand_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_rand17))
prct_roi_rand17=100*roi_rand17/initial_investment17
print("Percent gain: {0:2.2f} %".format(prct_roi_rand17))

print("Model Results (Ridge, alpha=0.5):")
roi_model17=data_17['predict_daily_change'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model17))
prct_roi_model17=100*roi_model17/initial_investment17
print("Percent gain: {0:2.2f} %".format(prct_roi_model17))

print("Model Results (Lasso, alpha=0.0001):")
roi_model17_2=data_17['predict_daily_change2'].cumsum().iloc[-1]
print("Net gain: {0:8.2f}".format(roi_model17_2))
prct_roi_model17_2=100*roi_model17_2/initial_investment17
print("Percent gain: {0:2.2f} %".format(prct_roi_model17_2))

Number of days: 249
Number of random bets: 126
initial investment: 114715.38
Final hold net return: 54514.62
Number of ridge model bets: 158
Number of lasso model bets: 161
Buy and Hold Results 2017
Net gain: 54514.62
Percent gain: 47.52 %
Random Results:
Net gain: 15926.57
Percent gain: 13.88 %
Model Results (Ridge, alpha=0.5):
Net gain: 21241.47
Percent gain: 18.52 %
Model Results (Lasso, alpha=0.0001):
Net gain: 22677.20
Percent gain: 19.77 %

year=2015
"Investment Return on ${0:.2f} in {1}".format(initial_investment15, year)

'Investment Return on $103863.96 in 2015'

# 2015 ends with a small loss: about -2% from the first to the last trading day
data_15.AdjClose.plot(legend=True,grid=True,title="AAPL Adjusted Close, 2015",figsize=(10,5))

# 2016 is a big win, 12%
data_16.AdjClose.plot(legend=True,grid=True,title="AAPL Adjusted Close, 2016",figsize=(10,5))

# 2017 is a huge win, about 48%
data_17.AdjClose.plot(legend=True,grid=True,title="AAPL Adjusted Close, 2017",figsize=(10,5))

## Plot the various returns for 2015:
# buy and hold, random, and the two models (ridge, lasso)
year=2015
ax = data_15['hold_current_return'].plot(grid=True, legend=True, figsize=(8,6), 
    title="Investment Return on ${0:.2f} in {1}".format(initial_investment15, year))
data_15['rand_daily_change'].cumsum().plot(grid=True, legend=True)
data_15['predict_daily_change'].cumsum().plot(grid=True, legend=True)
data_15['predict_daily_change2'].cumsum().plot(grid=True, legend=True)
ax.set_ylabel('Cumulative Return [$]')
ax.set_xlabel('Date')

## Plot the various returns for 2016:
# buy and hold, random, and the two models (ridge, lasso)
year = 2016
ax = data_16['hold_current_return'].plot(grid=True, legend=True, figsize=(8,6), 
    title="Investment Return on ${0:.2f} in {1}".format(initial_investment16, year))
data_16['rand_daily_change'].cumsum().plot(grid=True, legend=True)
data_16['predict_daily_change'].cumsum().plot(grid=True, legend=True)
data_16['predict_daily_change2'].cumsum().plot(grid=True, legend=True)
ax.set_ylabel('Cumulative Return [$]')
ax.set_xlabel('Date')

## Plot the various returns for 2017:
# buy and hold, random, and the two models (ridge, lasso)
year=2017
ax = data_17['hold_current_return'].plot(grid=True, legend=True, figsize=(8,6), 
    title="Investment Return on ${0:.2f} in {1}".format(initial_investment17, year))
data_17['rand_daily_change'].cumsum().plot(grid=True, legend=True)
data_17['predict_daily_change'].cumsum().plot(grid=True, legend=True)
data_17['predict_daily_change2'].cumsum().plot(grid=True, legend=True)
ax.set_ylabel('Cumulative Return [$]')
ax.set_xlabel('Date')

## Ok - so these models are pretty crappy.  
# what is the range of outcomes for the random models?  And what does the distribution look like?
# 1. Simulate returns on (say 1000) random models
# 2. What's the mean and std of these outcomes?


# The static data
data_test15 = aapl[(aapl.index > '2015') & (aapl.index < '2016')][['AdjClose','LR']].copy()
data_test15['daily_change']=(np.exp(data_test15['LR'])-1)*data_test15['AdjClose'].shift(1)
results15 = np.zeros(1000)

# The stochastic model: 1000 independent random strategies, keeping each final net gain
for i in range(1000):
    data_test15['rand_make_bet']=np.random.randint(0,2, data_test15.shape[0])
    data_test15['rand_daily_change']=1000*data_test15['daily_change']*data_test15['rand_make_bet'].shift(1)
    results15[i]=data_test15['rand_daily_change'].cumsum().iloc[-1]

print("2015 Random Model Mean [$]: {0:8.2f}".format(results15.mean()))
print("2015 Random Model Mean [%]: {0:8.2f}%".format(100*(results15.mean()/initial_investment15)))
print("2015 Random Model Std: {0:8.2f}".format(results15.std()))

2015 Random Model Mean [$]:  -826.85
2015 Random Model Mean [%]:    -0.80%
2015 Random Model Std: 15080.42

# Look at 2016
# The static data
data_test16 = aapl[(aapl.index > '2016') & (aapl.index < '2017')][['AdjClose','LR']].copy()
data_test16['daily_change']=(np.exp(data_test16['LR'])-1)*data_test16['AdjClose'].shift(1)
results16 = np.zeros(1000)

# The stochastic model: 1000 independent random strategies, keeping each final net gain
for i in range(1000):
    data_test16['rand_make_bet']=np.random.randint(0,2, data_test16.shape[0])
    data_test16['rand_daily_change']=1000*data_test16['daily_change']*data_test16['rand_make_bet'].shift(1)
    results16[i]=data_test16['rand_daily_change'].cumsum().iloc[-1]

print("2016 Random Model Mean [$]: {0:8.2f}".format(results16.mean()))
print("2016 Random Model Mean [%]: {0:8.2f}%".format(100*(results16.mean()/initial_investment16)))
print("2016 Random Model Std: {0:8.2f}".format(results16.std()))

2016 Random Model Mean [$]:  6139.92
2016 Random Model Mean [%]:     6.03%
2016 Random Model Std: 12074.59

# Look at 2017
# The static data
data_test17 = aapl[(aapl.index > '2017') & (aapl.index < '2018')][['AdjClose','LR']].copy()
data_test17['daily_change']=(np.exp(data_test17['LR'])-1)*data_test17['AdjClose'].shift(1)
results17 = np.zeros(1000)

# The stochastic model: 1000 independent random strategies, keeping each final net gain
for i in range(1000):
    data_test17['rand_make_bet']=np.random.randint(0,2, data_test17.shape[0])
    data_test17['rand_daily_change']=1000*data_test17['daily_change']*data_test17['rand_make_bet'].shift(1)
    results17[i]=data_test17['rand_daily_change'].cumsum().iloc[-1]

print("2017 Random Model Mean [$]: {0:8.2f}".format(results17.mean()))
print("2017 Random Model Mean [%]: {0:8.2f}%".format(100*(results17.mean()/initial_investment17)))
print("2017 Random Model Std: {0:8.2f}".format(results17.std()))

2017 Random Model Mean [$]: 27199.69
2017 Random Model Mean [%]:    23.71%
2017 Random Model Std: 13271.84
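The three loops above rebuild a DataFrame column 1000 times. A vectorized alternative, sketched under the same assumptions (fair coin, 1000 shares, a bet on day t earns the change to day t+1), draws all bet sequences at once as a 0/1 matrix and multiplies it by the vector of daily changes. The function name simulate_random_returns is hypothetical, synthetic prices stand in for AAPL, and because it uses NumPy's Generator API the individual draws differ from np.random.randint.

```python
import numpy as np

def simulate_random_returns(prices, n_sims=1000, shares=1000, seed=None):
    """Hypothetical helper: final P&L of n_sims coin-flip strategies.

    prices: 1-D array of daily closes, oldest first.
    """
    rng = np.random.default_rng(seed)
    changes = np.diff(prices)                               # day t -> t+1 price change
    bets = rng.integers(0, 2, size=(n_sims, changes.size))  # one row per strategy
    return (shares * bets) @ changes                        # one total per simulation

# Usage on a synthetic random-walk price path (not AAPL data)
rng = np.random.default_rng(7)
prices = 100 + np.cumsum(rng.normal(0.0, 1.0, 252))
results = simulate_random_returns(prices, n_sims=1000, seed=0)
print(results.mean(), results.std())
```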

Model Results

Looking more closely at the 2015 model results: the ridge model had a net gain of -10773.05 (-10.37 %) and the lasso model -16071.66 (-15.47 %). Both fall on the left side of the stochastic model's distribution of outcomes (mean -826.85, standard deviation 15080.42), roughly 0.7 and 1.0 standard deviations below the mean.
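To place a model result on the stochastic distribution more precisely, one can report its z-score and the share of random outcomes that did worse. A minimal sketch; the helper name rank_against_random is hypothetical and the usage below runs on synthetic draws.

```python
import numpy as np

def rank_against_random(random_results, model_result):
    """Hypothetical helper: z-score and empirical percentile of a model's P&L."""
    random_results = np.asarray(random_results, dtype=float)
    z = (model_result - random_results.mean()) / random_results.std()
    percentile = 100.0 * np.mean(random_results < model_result)
    return z, percentile

# Usage with synthetic draws (mean 0, std 1) standing in for the simulated returns
rng = np.random.default_rng(3)
draws = rng.normal(0.0, 1.0, 10000)
z, pct = rank_against_random(draws, -1.0)
print(round(z, 2), round(pct, 1))
```

With the notebook's arrays it would be called as, e.g., rank_against_random(results15, roi_model15).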

# Quick and dirty histogram of the 2015 stochastic model outcomes
plt.hist(results15, color = 'blue', edgecolor = 'black', bins=25)

(array([  1.,   3.,   3.,   2.,  12.,  18.,  29.,  50.,  48.,  91.,  70.,
         92.,  89., 124.,  86.,  82.,  60.,  50.,  36.,  25.,  15.,   7.,
          3.,   2.,   2.]),
 array([-50264.06680961, -46379.16524088, -42494.26367216, -38609.36210343,
        -34724.4605347 , -30839.55896598, -26954.65739725, -23069.75582852,
        -19184.8542598 , -15299.95269107, -11415.05112234,  -7530.14955361,
         -3645.24798489,    239.65358384,   4124.55515257,   8009.45672129,
         11894.35829002,  15779.25985875,  19664.16142747,  23549.0629962 ,
         27433.96456493,  31318.86613365,  35203.76770238,  39088.66927111,
         42973.57083983,  46858.47240856]),
 <a list of 25 Patch objects>)

# Quick and dirty density plot of the 2015 stochastic model outcomes
pd.DataFrame(results15).plot.density()

ax = pd.DataFrame({'2015' : results15, '2016' : results16, '2017' : results17}).plot.hist(
    title='Histogram of Stochastic Model Returns', bins=30, alpha=0.5)
ax.set_xlabel('Return on Investment [$]')

	Open	High	Low	Close	Volume	ExDividend	SplitRatio	AdjOpen	AdjHigh	AdjLow	AdjClose	AdjVolume
Date
2018-03-27	173.68	175.15	166.92	168.340	38962839.0	0.0	1.0	173.68	175.15	166.92	168.340	38962839.0
2018-03-26	168.07	173.10	166.44	172.770	36272617.0	0.0	1.0	168.07	173.10	166.44	172.770	36272617.0
2018-03-23	168.39	169.92	164.94	164.940	40248954.0	0.0	1.0	168.39	169.92	164.94	164.940	40248954.0
2018-03-22	170.00	172.68	168.60	168.845	41051076.0	0.0	1.0	170.00	172.68	168.60	168.845	41051076.0
2018-03-21	175.04	175.09	171.26	171.270	35247358.0	0.0	1.0	175.04	175.09	171.26	171.270	35247358.0

	LR	NLR	vol	LV	IV	IC
Date
1990-01-02	NaN	NaN	NaN	NaN	0.067114	0.900000
1990-01-03	0.006689	0.207857	NaN	0.126945	0.013333	0.000000
1990-01-04	0.003461	0.095719	0.002283	0.062969	0.039862	0.253333
1990-01-05	0.003184	0.086104	0.001576	-0.585766	0.033113	0.600000
1990-01-08	0.006601	0.204789	0.002081	-0.193942	0.026316	1.000000

	LR	AdjClose	NLR+1	NLR	NLR-1	NLR-2	NLR-3	NLR-4	vol	LV	AdjVolume	IV	IC
Date
1990-01-02	NaN	1.118093	0.207857	NaN	NaN	NaN	NaN	NaN	NaN	NaN	45799600.0	0.067114	0.900000
1990-01-03	0.006689	1.125597	0.095719	0.207857	NaN	NaN	NaN	NaN	NaN	0.126945	51998800.0	0.013333	0.000000
1990-01-04	0.003461	1.129499	0.086104	0.095719	0.207857	NaN	NaN	NaN	0.002283	0.062969	55378400.0	0.039862	0.253333
1990-01-05	0.003184	1.133101	0.204789	0.086104	0.095719	0.207857	NaN	NaN	0.001576	-0.585766	30828000.0	0.033113	0.600000
1990-01-08	0.006601	1.140605	-0.364365	0.204789	0.086104	0.095719	0.207857	NaN	0.002081	-0.193942	25393200.0	0.026316	1.000000
1990-01-09	-0.009785	1.129499	-1.562685	-0.364365	0.204789	0.086104	0.095719	0.207857	0.009535	-0.164811	21534800.0	0.026575	0.630000

	count	mean	std	min	25%	50%	75%	max
LR	7112.0	7.050564e-04	0.028789	-0.731247	-0.012852	0.000054	0.014317	0.286796
AdjClose	7113.0	2.828904e+01	43.039528	0.415743	1.229236	3.080479	43.196105	181.720000
NLR+1	7112.0	4.247633e-17	1.000000	-25.424949	-0.470909	-0.022607	0.472828	9.937587
NLR	7112.0	4.247633e-17	1.000000	-25.424949	-0.470909	-0.022607	0.472828	9.937587
NLR-1	7111.0	1.303291e-04	1.000010	-25.424949	-0.470355	-0.022505	0.472882	9.937587
NLR-2	7110.0	-9.279425e-05	0.999903	-25.424949	-0.470618	-0.022607	0.472579	9.937587
NLR-3	7109.0	2.497108e-05	0.999924	-25.424949	-0.469828	-0.022505	0.472775	9.937587
NLR-4	7108.0	9.810736e-05	0.999976	-25.424949	-0.469301	-0.022386	0.472828	9.937587
vol	7111.0	2.297688e-02	0.017499	0.001576	0.012274	0.019062	0.028985	0.470795
LV	7112.0	-2.273158e-05	0.436074	-4.030927	-0.260247	-0.022251	0.236022	4.025519
IV	7113.0	3.302855e-02	0.021041	0.003953	0.018308	0.028265	0.041760	0.277191
IC	7113.0	5.094351e-01	0.307046	0.000000	0.239617	0.500000	0.786885	1.000000

	NLR+1	AdjClose	NLR	NLR-1	NLR-2	NLR-3	NLR-4	vol	LV	IV	IC
NLR+1	1.000000	-0.063285	-0.024893	-0.124988	-0.092695	0.089660	-0.018519	0.011842	0.019582	0.080712	-0.101481
AdjClose	-0.063285	1.000000	0.085514	0.081302	0.063937	0.054149	0.069897	0.048722	0.010819	-0.004778	0.126551
NLR	-0.024893	0.085514	1.000000	-0.033522	-0.119694	-0.081344	0.090525	-0.049604	-0.067566	-0.014254	0.696897
NLR-1	-0.124988	0.081302	-0.033522	1.000000	-0.035378	-0.120355	-0.082399	-0.098442	0.017899	-0.061903	0.044624
NLR-2	-0.092695	0.063937	-0.119694	-0.035378	1.000000	-0.032962	-0.119734	-0.051526	0.001735	-0.063011	-0.016333
NLR-3	0.089660	0.054149	-0.081344	-0.120355	-0.032962	1.000000	-0.031626	-0.083384	0.062351	-0.018205	-0.054379
NLR-4	-0.018519	0.069897	0.090525	-0.082399	-0.119734	-0.031626	1.000000	-0.066132	0.031008	-0.097918	0.136069
vol	0.011842	0.048722	-0.049604	-0.098442	-0.051526	-0.083384	-0.066132	1.000000	0.071383	0.475645	0.037181
LV	0.019582	0.010819	-0.067566	0.017899	0.001735	0.062351	0.031008	0.071383	1.000000	0.308876	0.043641
IV	0.080712	-0.004778	-0.014254	-0.061903	-0.063011	-0.018205	-0.097918	0.475645	0.308876	1.000000	0.000201
IC	-0.101481	0.126551	0.696897	0.044624	-0.016333	-0.054379	0.136069	0.037181	0.043641	0.000201	1.000000
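To see which features relate most strongly to tomorrow's return, you can take one column of the correlation matrix and rank it. Below is a minimal sketch that does this on synthetic data standing in for the AAPL features. Only `IV` is built to carry a weak signal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-ins for a few feature columns (not real market data).
data = pd.DataFrame({
    "NLR": rng.standard_normal(n),
    "vol": rng.standard_normal(n),
    "IV": rng.standard_normal(n),
})
# Synthetic target with a weak linear dependence on IV only.
data["NLR+1"] = 0.3 * data["IV"] + rng.standard_normal(n)

# Pearson correlation of every feature with the target, largest magnitude first.
target_corr = (
    data.corr()["NLR+1"]
    .drop("NLR+1")
    .sort_values(key=np.abs, ascending=False)
)
print(target_corr)
```

In the AAPL matrix above, no feature has an absolute correlation with NLR+1 above about 0.125. The largest is NLR-1 at -0.124988.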

	AdjClose	LR	daily_change	investment_value	overall_return	daily_change2	overall_return2
Date
2002-01-02	1.497187	0.061967	NaN	1497.187374	0.000000	NaN	NaN
2002-01-03	1.515179	0.011946	17.991951	1515.179325	17.991951	17.991951	17.991951
2002-01-04	1.522248	0.004654	7.068267	1522.247591	25.060218	7.068267	25.060218
2002-01-07	1.471485	-0.033916	-50.763005	1471.484586	-25.702788	-50.763005	-25.702788
2002-01-08	1.452850	-0.012745	-18.634521	1452.850065	-44.337308	-18.634521	-44.337308

	AdjClose	LR	daily_change	investment_value	overall_return	daily_change2	overall_return2
Date
2002-12-24	0.922730	-0.009012	-8.353406	922.730072	-574.457301	-8.353406	-574.457301
2002-12-26	0.925300	0.002782	2.570279	925.300351	-571.887023	2.570279	-571.887023
2002-12-27	0.903453	-0.023894	-21.847369	903.452982	-593.734392	-21.847369	-593.734392
2002-12-30	0.904096	0.000711	0.642570	904.095551	-593.091822	0.642570	-593.091822
2002-12-31	0.920802	0.018310	16.706812	920.802363	-576.385010	16.706812	-576.385010
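The buy-and-hold tables fit a position of 1000 shares:

* `investment_value` is 1000 × `AdjClose`.
* `daily_change` is its day-over-day difference.
* `overall_return` is the value minus the value on the first day, which equals the running sum of `daily_change`.

Below is a minimal sketch using the first five rows of the 2002 table. The share count is inferred from the table, not stated in the source.

```python
import pandas as pd

SHARES = 1000  # inferred from investment_value = 1000 * AdjClose in the table

adj_close = pd.Series(
    [1.497187, 1.515179, 1.522248, 1.471485, 1.452850],
    index=pd.to_datetime(["2002-01-02", "2002-01-03", "2002-01-04",
                          "2002-01-07", "2002-01-08"]),
)

bh = pd.DataFrame({"AdjClose": adj_close})
bh["investment_value"] = SHARES * bh["AdjClose"]
bh["daily_change"] = bh["investment_value"].diff()
# Cumulative gain since the first day; the first row is 0 by construction.
bh["overall_return"] = bh["investment_value"] - bh["investment_value"].iloc[0]
print(bh)
```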

	count	mean	std	min	25%	50%	75%	max
NLR	4784.0	0.009120	1.072609	-25.424949	-0.494113	-0.003268	0.509806	9.937587
NLR-1	4784.0	0.009229	1.072565	-25.424949	-0.493764	-0.003268	0.509806	9.937587
NLR-2	4784.0	0.009233	1.072563	-25.424949	-0.493764	-0.003268	0.509806	9.937587
NLR-3	4784.0	0.009310	1.072572	-25.424949	-0.493764	-0.002572	0.509806	9.937587
vol	4784.0	0.024499	0.019071	0.001967	0.013005	0.020015	0.030851	0.470795
LV	4784.0	-0.000127	0.418232	-1.820601	-0.250602	-0.026271	0.224533	2.830664
IV	4784.0	0.034869	0.022401	0.004150	0.019452	0.029125	0.044096	0.277191
IC	4784.0	0.509037	0.310176	0.000000	0.231474	0.508292	0.788937	1.000000

	NLR	NLR-1	NLR-2	NLR-3	vol	LV	IV	IC
Date
1996-01-02	0.257739	-0.165892	-0.434549	0.320498	0.010242	-0.780885	0.015562	0.760000
1996-01-03	-0.024491	0.257739	-0.165892	-0.434549	0.007258	1.126808	0.031435	0.257426
1996-01-04	-0.646251	-0.024491	0.257739	-0.165892	0.012666	-0.359008	0.032003	0.188119
1996-01-05	2.816762	-0.646251	-0.024491	0.257739	0.056497	0.395767	0.084088	1.000000
1996-01-08	0.358777	2.816762	-0.646251	-0.024491	0.042763	-1.301554	0.043315	0.420000
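A linear forecast of NLR+1 would use a feature matrix like the one above: NLR, NLR-1 to NLR-3, vol, LV, IV and IC. Below is a minimal ordinary-least-squares sketch using `numpy.linalg.lstsq`. It runs on synthetic data with known coefficients, fits on the earlier rows and predicts the later ones, mirroring the train-on-past, apply-to-later-years setup. The notebook may fit its models differently, and the helpers `fit_linear` and `predict_linear` are hypothetical.

```python
import numpy as np
import pandas as pd

FEATURES = ["NLR", "NLR-1", "NLR-2", "NLR-3", "vol", "LV", "IV", "IC"]


def fit_linear(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(X)), X.to_numpy()])
    coef, *_ = np.linalg.lstsq(design, y.to_numpy(), rcond=None)
    return pd.Series(coef, index=["intercept", *X.columns])


def predict_linear(coef: pd.Series, X: pd.DataFrame) -> pd.Series:
    """Apply fitted coefficients to new feature rows."""
    return coef["intercept"] + X[coef.index[1:]] @ coef.iloc[1:]


# Synthetic data with known coefficients (not estimates from AAPL).
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.standard_normal((n, len(FEATURES))), columns=FEATURES)
true_coef = np.array([0.0, -0.1, 0.0, 0.0, 0.0, 0.0, 0.2, -0.1])
y = pd.Series(0.05 + X.to_numpy() @ true_coef + 0.01 * rng.standard_normal(n))

# Fit on the first 400 rows, forecast the last 100.
coef = fit_linear(X.iloc[:400], y.iloc[:400])
pred = predict_linear(coef, X.iloc[400:])
print(coef.round(3))
```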

	AdjClose	LR	make_bet	daily_change	investment_change	investment_return
Date
2002-01-02	1.497187	0.061967	0	NaN	NaN	NaN
2002-01-03	1.515179	0.011946	1	17.991951	17.991951	17.991951
2002-01-04	1.522248	0.004654	1	7.068267	7.068267	25.060218
2002-01-07	1.471485	-0.033916	0	-50.763005	-0.000000	25.060218
2002-01-08	1.452850	-0.012745	1	-18.634521	-18.634521	6.425697
2002-01-09	1.391163	-0.043387	0	-61.686690	-0.000000	6.425697
2002-01-10	1.364175	-0.019590	0	-26.987927	-0.000000	6.425697
2002-01-11	1.352609	-0.008515	0	-11.566254	-0.000000	6.425697
2002-01-14	1.359035	0.004739	1	6.425697	6.425697	12.851394
2002-01-15	1.394376	0.025672	1	35.341333	35.341333	48.192727
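The strategy table counts a day's `daily_change` only when `make_bet` is 1 and then accumulates the result:

* `investment_change` is `make_bet` × `daily_change`.
* `investment_return` is the running sum of `investment_change`.

Below is a minimal sketch that reproduces the rows above. It then scores a stochastic baseline that bets on each day with probability 0.5, like the random buy/sell comparison in the goals. The seed and the helper name `strategy_returns` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime([
    "2002-01-02", "2002-01-03", "2002-01-04", "2002-01-07", "2002-01-08",
    "2002-01-09", "2002-01-10", "2002-01-11", "2002-01-14", "2002-01-15",
])
# daily_change and make_bet columns copied from the table above.
daily_change = pd.Series(
    [np.nan, 17.991951, 7.068267, -50.763005, -18.634521,
     -61.686690, -26.987927, -11.566254, 6.425697, 35.341333],
    index=dates,
)
make_bet = pd.Series([0, 1, 1, 0, 1, 0, 0, 0, 1, 1], index=dates)


def strategy_returns(daily_change: pd.Series, bets: pd.Series) -> pd.DataFrame:
    """Count a day's change only when a bet is on, then accumulate."""
    investment_change = daily_change * bets
    return pd.DataFrame({
        "make_bet": bets,
        "investment_change": investment_change,
        "investment_return": investment_change.cumsum(),
    })


model = strategy_returns(daily_change, make_bet)

# Stochastic baseline: a fair coin flip per day (hypothetical seed).
rng = np.random.default_rng(0)
random_bets = pd.Series(rng.integers(0, 2, size=len(dates)), index=dates)
baseline = strategy_returns(daily_change, random_bets)
print(model["investment_return"].iloc[-1], baseline["investment_return"].iloc[-1])
```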