As an intern here at Deephaven, I’ve been documenting my journey as I learn to use Deephaven Community Core. So far, I’ve explored the basics with a simple analysis project. Next, I’ll put my knowledge to the test using Deephaven to analyze a real-world event.
Right now I’m working on a project that analyzes a big financial story: MicroStrategy’s purchase of Bitcoin. If you haven’t heard, MicroStrategy took out a $205 million Bitcoin-backed loan, intending to buy a large amount of additional Bitcoin to protect its earnings from inflation. In my project, I analyzed the relationship between several companies’ stock prices and the price of Bitcoin. As part of this analysis, I explored using Bitcoin’s price as a predictor of MicroStrategy’s stock price.
Accurate predictions can guide your decisions and help you understand the relationships between different variables. During my analysis, I found a linear relationship between the logarithms of the two securities’ prices, so I used a very common prediction method, linear regression, to see whether I could create an accurate price prediction. Using Deephaven and NumPy, I fit this regression to historical data and then used it to create a historical prediction plot for MicroStrategy.
This part of my project opened my eyes to how many tools Deephaven and NumPy provide, which was very helpful on my journey. Together, they made my data analysis quick and easy, and there’s plenty of documentation for more advanced use cases. While this example uses static data, the same code can be applied to real-time data, so I can move straight to real-time analysis in future projects.
So how did I end up making that graph?
To begin predicting stock prices, I first pulled historical price data from Yahoo Finance. After some quick cleanup, I created a table containing both Bitcoin’s and MicroStrategy’s daily prices, using each day’s high as the price and adding its logarithm.
```python
from deephaven import read_csv

# Daily Bitcoin prices: parse the date, use the daily high as the price, add its log
bitcoin = (
    read_csv("https://query1.finance.yahoo.com/v7/finance/download/BTC-USD?period1=1592179200&period2=1653523200&interval=1d&events=history&includeAdjustedClose=true")
    .update(formulas=["Date = convertDateTime(Date + `T00:00:00 NY`)", "BTCLogPrice = log(High)"])
    .rename_columns(["BTCPrice = High", "Timestamp = Date"])
    .select(formulas=["Timestamp", "BTCPrice", "BTCLogPrice"])
)

# Same steps for MicroStrategy (MSTR)
microstrategy = (
    read_csv("https://query1.finance.yahoo.com/v7/finance/download/MSTR?period1=1592179200&period2=1653523200&interval=1d&events=history&includeAdjustedClose=true")
    .update(formulas=["Date = convertDateTime(Date + `T00:00:00 NY`)", "MSTRLogPrice = log(High)"])
    .rename_columns(["MSTRPrice = High", "Timestamp = Date"])
    .select(formulas=["Timestamp", "MSTRPrice", "MSTRLogPrice"])
)

# Match rows on Timestamp; only days present in both tables are kept
mstr_btc = microstrategy.join(table=bitcoin, on=["Timestamp"])
```
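The period1 and period2 values in the Yahoo Finance URLs are Unix timestamps in seconds. The sketch below decodes them with the standard library and rebuilds the MSTR URL from calendar dates. yahoo_csv_url is a hypothetical helper written for illustration, not part of Deephaven or any Yahoo API, and this v7 download endpoint may need changes or may no longer return data as-is.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE = "https://query1.finance.yahoo.com/v7/finance/download/"

def yahoo_csv_url(ticker, start, end):
    """Hypothetical helper: build a daily-history CSV URL in the article's format."""
    params = {
        "period1": int(start.timestamp()),  # range start, Unix seconds
        "period2": int(end.timestamp()),    # range end, Unix seconds
        "interval": "1d",                   # one row per day
        "events": "history",
        "includeAdjustedClose": "true",
    }
    # urlencode keeps the dict's insertion order, matching the original query string
    return BASE + ticker + "?" + urlencode(params)

# Decode the timestamps that appear in the article's URLs (interpreted as UTC)
for seconds in (1592179200, 1653523200):
    print(seconds, "->", datetime.fromtimestamp(seconds, tz=timezone.utc).date())

start = datetime(2020, 6, 15, tzinfo=timezone.utc)
end = datetime(2022, 5, 26, tzinfo=timezone.utc)
print(yahoo_csv_url("MSTR", start, end))
```

Running it shows that the data spans 2020-06-15 to 2022-05-26 (UTC), and the rebuilt URL matches the MSTR URL used in read_csv.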
Now I could start building the linear regression. NumPy’s polynomial package makes the calculation easy to define: a quick search on NumPy’s website leads to the polyfit documentation, which covers everything needed to fit a least-squares polynomial, including the straight line used here. The only complication was finding the coefficient of determination (R²), which measures how well the regression fits the data. NumPy doesn’t return it directly, but polyfit with full=True reports the sum of squared residuals, so a little simple math was all I needed:
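```python
import numpy as np
import numpy.polynomial.polynomial as poly

def calc_reg(x, y):
    """Least-squares fit of y = m*x + c; returns (slope, intercept, R^2)."""
    x = np.array(x)
    y = np.array(y)
    # polyfit returns coefficients lowest degree first: reg = [c, m]
    reg, stats = poly.polyfit(x, y, 1, full=True)
    m = reg[1]
    c = reg[0]
    # With full=True, stats[0] holds the sum of squared residuals
    SSR = stats[0][0]
    # Total sum of squares around the mean of y
    diff = y - y.mean()
    square_diff = diff ** 2
    SST = square_diff.sum()
    R2 = 1 - SSR / SST
    return (m, c, R2)

# Helper to pull one element out of the returned tuple inside a query string
get_val = lambda rst, i: rst[i]

mstr_btc_with_reg = (
    mstr_btc
    # Collapse the whole table into one row of arrays so the fit sees every day
    .group_by()
    .update(formulas=[
        "Reg = calc_reg(array(BTCLogPrice), array(MSTRLogPrice))",
        "Beta = (double) get_val(Reg, 0)",
        "Intercept = (double) get_val(Reg, 1)",
        "R2 = (double) get_val(Reg, 2)",
    ])
    .drop_columns(cols=["Reg"])
    # Expand back to one row per day; Beta, Intercept, and R2 repeat on every row
    .ungroup()
    .update(formulas=["MSTRLogPred = Beta * BTCLogPrice + Intercept"])
    .move_columns(idx=7, cols="MSTRLogPred")
)
```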
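Because calc_reg is plain NumPy, its logic can be checked outside Deephaven. The sketch below is a condensed version of the same function, run on synthetic data (made-up values, not market data). It confirms two details the query relies on: poly.polyfit returns coefficients lowest degree first, so the intercept comes before the slope, and R² = 1 - SSR/SST equals the squared Pearson correlation for a straight-line fit with an intercept. The last line mimics what the ungroup and update steps do in the table: one Beta and Intercept pair applied to every row.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

def calc_reg(x, y):
    """Least-squares fit of y = m*x + c; returns (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # With full=True, polyfit also returns [residuals, rank, singular values, rcond]
    coef, (resid, _rank, _sv, _rcond) = poly.polyfit(x, y, 1, full=True)
    c, m = coef                        # lowest degree first: intercept, then slope
    ssr = resid[0]                     # sum of squared residuals
    sst = ((y - y.mean()) ** 2).sum()  # total sum of squares around the mean
    return m, c, 1.0 - ssr / sst

# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

m, c, r2 = calc_reg(x, y)
r = np.corrcoef(x, y)[0, 1]
print(f"slope={m:.3f} intercept={c:.3f} R2={r2:.5f} r^2={r**2:.5f}")

# Like the ungroup + update step: the same slope and intercept applied to every row
pred = m * x + c
```

The two printed R² values agree, which is a quick way to confirm that the SSR and SST arithmetic in calc_reg is wired up correctly.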
With the regression in place, all that was left was to see it in action. I created a graph showing both the log price predicted by the regression and the actual log price.
```python
from deephaven.plot.figure import Figure
from deephaven.pandas import to_pandas

# Beta, Intercept, and R2 are identical on every row, so the first row is enough
reg_MSTR = to_pandas(mstr_btc_with_reg.first_by())

mstr_prediction_plot = (
    Figure()
    .plot_xy(series_name="Actual", t=mstr_btc_with_reg, x="Timestamp", y="MSTRLogPrice")
    .plot_xy(series_name="Predicted", t=mstr_btc_with_reg, x="Timestamp", y="MSTRLogPred")
    .chart_title(title=f"R2 = {reg_MSTR['R2'][0]}, Beta = {reg_MSTR['Beta'][0]}, Intercept = {reg_MSTR['Intercept'][0]}")
    .show()
)
```
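The plot compares log prices, so Beta and Intercept describe a line in log space. Assuming the log used in the query strings is the natural logarithm, exponentiating both sides maps the fitted line back to dollar prices as a power law, where β is the Beta column and c is the Intercept column. Differentiating the log form also shows how to read Beta: for small moves, a 1% change in Bitcoin’s price corresponds to a predicted change of roughly β% in MicroStrategy’s price.

```latex
\ln P_{\mathrm{MSTR}} \approx \beta \,\ln P_{\mathrm{BTC}} + c
\quad\Longrightarrow\quad
P_{\mathrm{MSTR}} \approx e^{c}\, P_{\mathrm{BTC}}^{\beta}

\frac{\Delta P_{\mathrm{MSTR}}}{P_{\mathrm{MSTR}}} \approx \beta \,\frac{\Delta P_{\mathrm{BTC}}}{P_{\mathrm{BTC}}}
```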
MicroStrategy’s decision to buy so much Bitcoin means the company has inherited a lot of Bitcoin’s volatility, which raises concerns about the long-term stability of this plan. The uncertain price of Bitcoin has a major influence on the company’s assets, and only time will tell whether it was a smart decision. Understanding relationships like this one is essential for accurate predictions, and plots are very helpful for discovering them. Deephaven’s ability to integrate with libraries like NumPy made this project easy, and because Deephaven handles both static and real-time data, the same tools carry over to live analysis. If accurate predictions are what you’re looking for, Deephaven has the right tools for the job.
Don’t believe me? Be sure to check out our examples or reach out to us on Slack to learn more!