Forecasting Stock Price With ARIMA-ARCH Model
This article provides a step-by-step procedure for fitting financial data to a combination of ARIMA and ARCH models in R. The combination aims to produce better predictions for highly volatile data than an ARIMA model alone.
The data used in this analysis is NIKE's daily stock price from 1/2/2020 to 12/30/2021, obtained from Yahoo Finance.
The original data must be stationary before it is fitted to any time series model. The most common way to achieve this is first-order differencing. The ADF (Augmented Dickey-Fuller) test is then used to check whether the differenced data is stationary.
The p-value shows that the test rejects the null hypothesis of non-stationarity, so the differenced data is stationary.
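As a rough sketch of this step, the base-R snippet below differences a series and runs a simple Dickey-Fuller regression on it. The simulated price series, the helper df_stat, and the approximate 5% critical value of -2.86 (intercept, no trend, large sample) are assumptions for illustration only. The analysis itself uses the NIKE prices and the ADF test, which is available, for example, as adf.test in the tseries package.

```r
set.seed(42)

# ASSUMPTION: simulated random-walk prices stand in for the NIKE closing prices
price <- 100 + cumsum(rnorm(500))

# First-order differencing: d_t = price_t - price_{t-1}
d_price <- diff(price)

# Simple Dickey-Fuller regression (no lagged differences, intercept only):
#   delta y_t = a + g * y_{t-1} + e_t,  H0: g = 0 (unit root, non-stationary)
# Returns the t statistic of g. Hypothetical helper, not a packaged function.
df_stat <- function(y) {
  dy    <- diff(y)
  y_lag <- y[-length(y)]
  fit   <- lm(dy ~ y_lag)
  unname(coef(summary(fit))["y_lag", "t value"])
}

stat_level <- df_stat(price)    # statistic for the raw series
stat_diff  <- df_stat(d_price)  # statistic for the differenced series

# Approximate 5% Dickey-Fuller critical value (intercept, no trend)
crit_5pct <- -2.86

# A statistic below the critical value rejects the unit-root null
cat("levels:", stat_level, " differenced:", stat_diff, "\n")
```

A statistic far below the critical value for the differenced series is what supports treating it as stationary.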
The ARIMA (Autoregressive Integrated Moving Average) model has three parts. Autoregressive (AR) models the variable as a function of its own lagged values. Integrated (I) indicates the number of times the series must be differenced to become stationary. Moving Average (MA) models the impact of past residuals.
The ACF and PACF plots are helpful when determining the best order for the ARIMA model.
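The snippet below is a minimal sketch of how the two correlograms can be computed numerically in base R instead of being read off a plot. The simulated differenced series is an assumption; the significance bound is the usual approximate 95% band of 1.96 / sqrt(n).

```r
set.seed(1)

# ASSUMPTION: white noise stands in for the differenced price series
d_price <- rnorm(500)

# Sample autocorrelations (lags 0..20) and partial autocorrelations (lags 1..20)
acf_vals  <- acf(d_price,  lag.max = 20, plot = FALSE)
pacf_vals <- pacf(d_price, lag.max = 20, plot = FALSE)

# Approximate 95% significance band for a white-noise series
bound <- qnorm(0.975) / sqrt(length(d_price))

# Lags whose correlation falls outside the band (lag 0 of the ACF is dropped)
sig_acf  <- which(abs(acf_vals$acf[-1]) > bound)
sig_pacf <- which(abs(pacf_vals$acf) > bound)

cat("significant ACF lags: ",  sig_acf,  "\n")
cat("significant PACF lags:", sig_pacf, "\n")
```

Few or no lags outside the band is the pattern that points toward a model with no AR or MA terms.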
Both the ACF and PACF plots show no significant lag after lag zero, which indicates a random walk and suggests an ARIMA(0,1,0) model. In addition to the ACF and PACF plots, the AIC provides another way to identify the model. Below are the AIC scores for five ARIMA models. Based on AIC, we should choose the model with the smallest score, which is ARIMA(2,1,2).
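A comparison of this kind can be sketched in base R as follows. The five candidate orders and the simulated series are assumptions chosen for illustration; the text above does not list which five models were compared. Fits are wrapped in tryCatch because over-parameterised ARIMA fits can occasionally fail.

```r
set.seed(123)

# ASSUMPTION: a simulated ARIMA(1,1,0) series stands in for the stock prices
price <- as.numeric(arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300))

# ASSUMPTION: illustrative candidate orders (p, d, q)
orders <- list(
  "ARIMA(0,1,0)" = c(0, 1, 0),
  "ARIMA(1,1,0)" = c(1, 1, 0),
  "ARIMA(0,1,1)" = c(0, 1, 1),
  "ARIMA(1,1,1)" = c(1, 1, 1),
  "ARIMA(2,1,2)" = c(2, 1, 2)
)

# Fit one candidate and return its AIC, or NA if the fit fails
safe_aic <- function(ord) {
  tryCatch(
    suppressWarnings(AIC(arima(price, order = ord, method = "ML"))),
    error = function(e) NA_real_
  )
}

aic_scores <- sapply(orders, safe_aic)

# The preferred model is the one with the smallest AIC
best_model <- names(which.min(aic_scores))
print(round(aic_scores, 2))
cat("lowest AIC:", best_model, "\n")
```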
Based on the R result, the full ARIMA model is as follows:
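For reference, an ARIMA(2,1,2) model has the general form below, written with symbolic coefficients rather than fitted values. The symbols are notation assumed here: P_t is the price, y_t the differenced series, \phi_i the AR coefficients, \theta_j the MA coefficients, and \varepsilon_t the residual. The sign convention follows R's arima function, which writes the MA terms with plus signs.

```latex
\begin{aligned}
y_t &= P_t - P_{t-1}, \\
y_t &= \phi_1 y_{t-1} + \phi_2 y_{t-2}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\end{aligned}
```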
It is important to ensure that the residuals of the model do not suffer from autocorrelation. One way to test this is the Ljung-Box test:
The insignificant p-value shows that the test fails to reject the null hypothesis of no autocorrelation, so ARIMA(2,1,2) is an appropriate model for the data.
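The residual check can be sketched in base R as below. The simulated series, its coefficients, and the choice of 10 lags are assumptions. The fitdf argument subtracts the four estimated ARMA coefficients from the degrees of freedom of the test.

```r
set.seed(2020)

# ASSUMPTION: a simulated ARIMA(2,1,2) series stands in for the stock prices
price <- as.numeric(arima.sim(
  model = list(order = c(2, 1, 2), ar = c(0.5, -0.3), ma = c(0.4, 0.2)),
  n = 500
))

# Fit ARIMA(2,1,2) by maximum likelihood and extract the residuals
fit <- arima(price, order = c(2, 1, 2), method = "ML")
res <- residuals(fit)

# Ljung-Box test on the residuals, H0: no autocorrelation up to lag 10.
# fitdf = 4 accounts for the 2 AR + 2 MA coefficients that were estimated.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 4)
print(lb)
```

A p-value above the chosen significance level means the test finds no evidence of remaining autocorrelation.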
Although the residuals of ARIMA(2,1,2) don’t show autocorrelation, the residual plots below show clusters of volatility, and the residuals are not normally distributed. The volatility clustering indicates that the residuals suffer from heteroskedasticity. Thus, the residuals are passed to an ARCH model to capture this volatility.
The ARCH (Autoregressive Conditional Heteroskedasticity) model describes the variance of a series conditional on its past squared errors. It is widely used in financial analysis because the data is often highly volatile. Although the previous residual plots suggest that the residuals suffer from heteroskedasticity, it is still helpful to run the LM test to check whether the series has ARCH effects.
The LM test returns a significant p-value and rejects the null hypothesis of no ARCH effects, which confirms that the residuals exhibit conditional heteroskedasticity.
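For illustration, Engle's LM test can be written by hand in base R: regress the squared series on its own lags and compare n * R^2 with a chi-squared distribution. The simulated ARCH(1) series, the helper arch_lm, and the choice of 5 lags are assumptions made for this sketch. Packaged versions of the test also exist, for example ArchTest in the FinTS package.

```r
set.seed(10)

# ASSUMPTION: a simulated ARCH(1) series stands in for the ARIMA residuals
n <- 1000
e <- numeric(n)
for (t in 2:n) {
  e[t] <- sqrt(0.2 + 0.5 * e[t - 1]^2) * rnorm(1)
}

# Engle's ARCH LM test (hypothetical helper):
#   regress e_t^2 on e_{t-1}^2, ..., e_{t-q}^2; under H0 (no ARCH effects)
#   the statistic nobs * R^2 is approximately chi-squared with q df
arch_lm <- function(x, q = 5) {
  x2     <- (x - mean(x))^2
  lagged <- embed(x2, q + 1)  # column 1 = x2_t, columns 2..q+1 = lags 1..q
  fit    <- lm(lagged[, 1] ~ lagged[, -1])
  stat   <- nrow(lagged) * summary(fit)$r.squared
  c(statistic = stat, p.value = pchisq(stat, df = q, lower.tail = FALSE))
}

lm_result <- arch_lm(e)
print(lm_result)
```

A small p-value rejects the null hypothesis of no ARCH effects.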
For this analysis, an ARCH(1) model is applied to the residuals.
The result shows that the p-values are significant for the omega and alpha1 terms but insignificant for the mu term (the constant). The ARCH(1) model is as follows:
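To make the roles of mu, omega and alpha1 concrete, the sketch below estimates an ARCH(1) model by Gaussian maximum likelihood using only base R. It is not the fitting routine used above; the simulated series, the true parameter values, the starting values, and the helper neg_loglik are all assumptions. The model is x_t = mu + e_t with conditional variance sigma2_t = omega + alpha1 * e_{t-1}^2.

```r
set.seed(2021)

# ASSUMPTION: simulate an ARCH(1) series with known parameters (mu = 0)
n <- 3000
omega_true <- 0.2
alpha_true <- 0.5
x <- numeric(n)
sigma2 <- omega_true / (1 - alpha_true)  # start at the unconditional variance
for (t in 1:n) {
  if (t > 1) sigma2 <- omega_true + alpha_true * x[t - 1]^2
  x[t] <- sqrt(sigma2) * rnorm(1)
}

# Negative Gaussian log-likelihood of ARCH(1):
#   x_t = mu + e_t,  sigma2_t = omega + alpha1 * e_{t-1}^2
neg_loglik <- function(par, x) {
  mu <- par[1]; omega <- par[2]; alpha1 <- par[3]
  e  <- x - mu
  # The first conditional variance is initialised with the sample variance
  s2 <- c(var(x), omega + alpha1 * e[-length(e)]^2)
  0.5 * sum(log(2 * pi) + log(s2) + e^2 / s2)
}

# Box-constrained optimisation keeps omega > 0 and 0 <= alpha1 < 1
fit <- optim(
  par = c(mean(x), var(x) / 2, 0.3), fn = neg_loglik, x = x,
  method = "L-BFGS-B",
  lower = c(-Inf, 1e-6, 0), upper = c(Inf, Inf, 0.999)
)
est <- setNames(fit$par, c("mu", "omega", "alpha1"))
print(round(est, 3))
```

With a few thousand observations the estimates should land close to the values used in the simulation, which is a quick sanity check on the likelihood.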
As with the ARIMA model, the Ljung-Box test is used to ensure that the residuals of the ARCH model don’t show autocorrelation.
The insignificant p-value indicates that the test fails to reject the null hypothesis of no autocorrelation, so ARCH(1) is a proper model for the residuals of the ARIMA model.
This section will compare the prediction output from the ARIMA(2,1,2) and ARIMA(2,1,2)-ARCH(1) models.
The following table summarizes the predictions of the two models and the actual value:
Note that the 95% confidence interval of ARIMA(2,1,2) is wider than that of ARIMA(2,1,2)-ARCH(1). That is because the latter incorporates the recent volatility of the stock price by fitting the ARIMA residuals to the ARCH model.
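One plausible way to build the two intervals for a one-step-ahead forecast is sketched below. It is not necessarily how the figures above were produced. The simulated series and the ARCH(1) parameters omega and alpha1 are hypothetical placeholders. The ARIMA interval uses the constant standard error returned by predict, while the ARIMA-ARCH interval replaces it with the conditional standard deviation implied by the most recent residual.

```r
set.seed(99)

# ASSUMPTION: a simulated ARIMA(2,1,2) series stands in for the stock prices
price <- as.numeric(arima.sim(
  model = list(order = c(2, 1, 2), ar = c(0.5, -0.3), ma = c(0.4, 0.2)),
  n = 500
))

fit <- arima(price, order = c(2, 1, 2), method = "ML")

# One-step-ahead point forecast and its (constant-variance) standard error
fc    <- predict(fit, n.ahead = 1)
point <- as.numeric(fc$pred)
se    <- as.numeric(fc$se)
z     <- qnorm(0.975)

# 95% interval from the ARIMA model alone
arima_ci <- c(lower = point - z * se, upper = point + z * se)

# ASSUMPTION: hypothetical ARCH(1) parameters, not fitted values
omega  <- 0.5
alpha1 <- 0.3

# ARCH(1) one-step variance forecast: sigma2_{T+1} = omega + alpha1 * e_T^2
e_last     <- as.numeric(tail(residuals(fit), 1))
sigma_next <- sqrt(omega + alpha1 * e_last^2)

# 95% interval using the conditional standard deviation from ARCH(1)
arch_ci <- c(lower = point - z * sigma_next, upper = point + z * sigma_next)

print(rbind(ARIMA = arima_ci, ARIMA_ARCH = arch_ci))
```

The ARCH-based interval is narrower than the ARIMA interval whenever the conditional standard deviation is below the ARIMA standard error, which happens after a calm period, and wider after a large shock.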
The ARIMA model analyzes a time series linearly and does not capture recent changes, especially when the data becomes volatile. The ARIMA-ARCH model is therefore a good combination, because the ARCH model can fit the noise term of the ARIMA model. The combined model provides a narrower forecast interval and can make better predictions than the ARIMA-only model.