- Introduction
- Definition of Volatility and Stylized Facts
- ARCH Model Basics
- GARCH(1,1) In-Depth
- GJR-GARCH and the Leverage Effect
- EGARCH Model
- Python Practical Implementation
- Model Diagnostics and Selection
- VaR and CVaR Calculation
- Backtesting and Validation
- Troubleshooting
- Operations Checklist
- Failure Cases and Lessons
- Advanced Topics: Next Steps in Volatility Forecasting
- References

Introduction
Predicting returns in financial markets is extremely difficult. However, predicting volatility is relatively feasible. This difference is the starting point of risk management. We may not know whether tomorrow's stock price will go up or down, but we can reasonably infer the magnitude of tomorrow's price movement from today's market conditions.
The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model family is the core tool for volatility forecasting. Since Robert Engle proposed the ARCH model in 1982, it has evolved through Tim Bollerslev's GARCH (1986), Nelson's EGARCH (1991), and Glosten-Jagannathan-Runkle's GJR-GARCH (1993), becoming the standard framework for financial volatility modeling. Engle received the 2003 Nobel Prize in Economics for the ARCH model.
This article explains the mathematical principles of volatility models rather than merely listing formulas. It covers the entire workflow needed in practice: implementation using the Python arch package, model selection and diagnostics, VaR (Value at Risk) calculation, and backtesting. It is structured so that developers with limited mathematical background can follow along and apply the models to real data.
Definition of Volatility and Stylized Facts
What Is Volatility?
Volatility refers to the variance or standard deviation of asset returns. In practice, it is broadly divided into three categories.
- Historical Volatility: Standard deviation calculated from past return data. The simplest but backward-looking.
- Implied Volatility: Expected market volatility reverse-engineered from option prices. The VIX index is representative.
- Conditional Volatility: Expected value of future volatility conditioned on past information. This is exactly what GARCH models estimate.
Annualized volatility is obtained by multiplying daily volatility by the square root of the number of trading days per year. Using approximately 250 trading days for the Korean market, a daily standard deviation of 1.2% corresponds to roughly 19% annualized volatility (1.2 × √250 ≈ 18.97%).
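As a quick sanity check, the conversion is a one-liner (using the 250-trading-day convention from the text; the function name is illustrative):

```python
import numpy as np

def annualize_vol(daily_vol_pct: float, trading_days: int = 250) -> float:
    """Scale a daily volatility (in percent) to an annualized figure."""
    return daily_vol_pct * np.sqrt(trading_days)

# A 1.2% daily standard deviation under the 250-day convention
print(f"{annualize_vol(1.2):.1f}%")  # 19.0%
```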
Stylized Facts of Financial Returns
Real financial data has characteristics that cannot be explained by simple statistical models assuming normal distribution and constant variance. These are called Stylized Facts, and GARCH models are designed to capture these characteristics.
| Stylized Fact | Description | GARCH Relevance |
|---|---|---|
| Volatility Clustering | Large changes tend to be followed by large changes, small by small | Core assumption and modeling target of GARCH |
| Fat Tails | Return distribution tails are thicker than normal (kurtosis over 3) | Captured by Student-t, GED, etc. non-normal distributions |
| Leverage Effect | Negative shocks increase volatility more than positive shocks | Modeled by EGARCH, GJR-GARCH for asymmetry |
| Mean Reversion | Volatility tends to revert to a long-run average level | Reflected by stationarity conditions of GARCH parameters |
| Long Memory | Autocorrelation of volatility decays very slowly | Captured by FIGARCH, IGARCH fractional integration models |
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from scipy import stats
# Collect S&P 500 ETF (SPY) data; a KOSPI 200 ETF such as KODEX 200 can be substituted
spy = yf.download('SPY', start='2020-01-01', end='2026-03-01', auto_adjust=True)
returns = spy['Close'].squeeze().pct_change().dropna() * 100  # Percentage returns; .squeeze() flattens yfinance's column MultiIndex
# Verify Stylized Facts
print(f"Mean: {returns.mean():.4f}%")
print(f"Std Dev: {returns.std():.4f}%")
print(f"Skewness: {returns.skew():.4f}")
print(f"Kurtosis: {returns.kurtosis():.4f}") # Excess kurtosis (normal = 0)
# Jarque-Bera normality test
jb_stat, jb_pval = stats.jarque_bera(returns)
print(f"Jarque-Bera statistic: {jb_stat:.2f}, p-value: {jb_pval:.6f}")
# Ljung-Box test: returns vs squared returns
from statsmodels.stats.diagnostic import acorr_ljungbox
lb_returns = acorr_ljungbox(returns, lags=10, return_df=True)
lb_squared = acorr_ljungbox(returns**2, lags=10, return_df=True)
print(f"\nReturns Ljung-Box p-value (lag=10): {lb_returns['lb_pvalue'].iloc[-1]:.4f}")
print(f"Returns^2 Ljung-Box p-value (lag=10): {lb_squared['lb_pvalue'].iloc[-1]:.6f}")
# Very small p-value for squared returns indicates volatility clustering
Running the code above typically shows strongly positive excess kurtosis (fat tails; the pandas kurtosis is already excess kurtosis, so the normal benchmark is 0) and a highly significant Ljung-Box test on squared returns (volatility clustering). This provides the statistical justification for applying GARCH models.
ARCH Model Basics
Engle's ARCH(q) Model
The ARCH (AutoRegressive Conditional Heteroskedasticity) model assumes that conditional variance depends on past squared error terms.
The return process is defined as follows:
r_t = mu + epsilon_t
epsilon_t = sigma_t * z_t, z_t ~ N(0,1)
Where conditional variance sigma_t^2 is:
sigma_t^2 = omega + alpha_1 * epsilon_{t-1}^2 + ... + alpha_q * epsilon_{t-q}^2
- omega > 0: Base level of long-run variance
- alpha_i >= 0: Impact coefficients of past shocks
- q: ARCH order (how many past time points of shocks to reflect)
For ARCH(1), if a large shock (positive or negative) occurred yesterday, today's conditional variance increases. This is the mechanism that explains volatility clustering.
Limitations of the ARCH Model
In practice, ARCH models are rarely used directly. To properly capture volatility clustering, q must be very large, meaning many parameters need to be estimated. It is common to need ARCH(20) or higher for a good fit, which introduces overfitting risk and estimation instability. GARCH overcomes this limitation.
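The clustering mechanism is easy to verify by simulation. The following numpy sketch generates an ARCH(1) series (parameter values are illustrative; alpha is kept below 0.577 so the fourth moment exists and sample autocorrelations behave well): the shocks themselves are serially uncorrelated, but their squares are not.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate ARCH(1): sigma_t^2 = omega + alpha * eps_{t-1}^2
omega, alpha = 0.5, 0.5
n = 2000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha)  # start at the unconditional variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

def lag1_autocorr(x):
    x = x - x.mean()
    return float((x[:-1] * x[1:]).mean() / x.var())

# Shocks look uncorrelated, but their squares do not: volatility clustering
print(f"lag-1 autocorr of eps  : {lag1_autocorr(eps):+.3f}")
print(f"lag-1 autocorr of eps^2: {lag1_autocorr(eps**2):+.3f}")
```

The squared-shock autocorrelation is theoretically equal to alpha for a stationary ARCH(1), which is exactly the structure the Ljung-Box test on squared returns detects in real data.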
GARCH(1,1) In-Depth
Model Structure
GARCH(p,q), proposed by Bollerslev (1986), adds past variance itself to the conditional variance equation:
sigma_t^2 = omega + alpha * epsilon_{t-1}^2 + beta * sigma_{t-1}^2
The three parameters of GARCH(1,1) each mean:
- omega: Long-run variance contribution. Larger values mean a higher floor for variance.
- alpha: News shock coefficient. The magnitude of impact yesterday's market shock has on today's volatility.
- beta: Variance persistence coefficient. How much of yesterday's volatility carries over to today.
Stationarity Conditions and Parameter Interpretation
For GARCH(1,1) to be stationary, alpha + beta < 1 must hold. The closer this value is to 1, the longer volatility shocks persist.
The unconditional variance (long-run average variance) is calculated as:
sigma_long^2 = omega / (1 - alpha - beta)
When fitting GARCH(1,1) to equity returns in practice, you typically get:
| Parameter | Typical Range | Interpretation |
|---|---|---|
| omega | 0.01 - 0.05 | Base level of long-run variance |
| alpha | 0.05 - 0.15 | Speed of news shock reflection in volatility |
| beta | 0.80 - 0.95 | Volatility persistence |
| alpha + beta | 0.95 - 0.99 | Closer to 1 means shock effects last longer |
If alpha is large and beta is small, the market reacts quickly to new information and forgets quickly. If alpha is small and beta is large, once the volatility level changes, it persists for a long time.
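The persistence interpretation can be made concrete with a half-life: the number of days for a shock's effect on the variance forecast to decay by half, ln(0.5) / ln(alpha + beta). A minimal sketch using the typical ranges from the table above (the function name is illustrative):

```python
import numpy as np

def shock_half_life(alpha: float, beta: float) -> float:
    """Days until a variance shock's effect decays by half in GARCH(1,1)."""
    persistence = alpha + beta
    assert 0 < persistence < 1, "requires a stationary model"
    return np.log(0.5) / np.log(persistence)

# Typical equity estimates from the table above
print(f"alpha+beta = 0.95 -> half-life {shock_half_life(0.10, 0.85):.1f} days")
print(f"alpha+beta = 0.99 -> half-life {shock_half_life(0.09, 0.90):.1f} days")
```

Moving persistence from 0.95 to 0.99 stretches the half-life from about two weeks to over three months, which is why the table describes values near 1 as long-lasting shocks.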
Python Implementation: Fitting GARCH(1,1)
from arch import arch_model
import yfinance as yf
import pandas as pd
import numpy as np
# Data preparation
spy = yf.download('SPY', start='2018-01-01', end='2026-03-01', auto_adjust=True)
returns = spy['Close'].squeeze().pct_change().dropna() * 100  # Percentage returns; .squeeze() flattens yfinance's column MultiIndex
# GARCH(1,1) with Student-t distribution
garch11 = arch_model(
    returns,
    vol='Garch',
    p=1,               # lag order of past squared shocks (ARCH term, alpha)
    q=1,               # lag order of past variance (GARCH term, beta)
    mean='Constant',
    dist='t'           # Student-t distribution (captures fat tails)
)
result = garch11.fit(disp='off')
print(result.summary())
# Extract parameters
omega = result.params['omega']
alpha = result.params['alpha[1]']
beta = result.params['beta[1]']
persistence = alpha + beta
print(f"\nomega: {omega:.6f}")
print(f"alpha: {alpha:.4f}")
print(f"beta: {beta:.4f}")
print(f"persistence (alpha+beta): {persistence:.4f}")
# Long-run unconditional variance and annualized volatility
long_run_var = omega / (1 - alpha - beta)
annual_vol = np.sqrt(long_run_var * 252) / 100 # Convert to decimal ratio
print(f"Long-run annualized volatility: {annual_vol:.2%}")
# Conditional volatility time series
cond_vol = result.conditional_volatility
print(f"\nConditional volatility - last 5 days:")
print(cond_vol.tail())
GJR-GARCH and the Leverage Effect
What Is the Leverage Effect?
In equity markets, even shocks of the same magnitude cause negative shocks to increase volatility more than positive shocks. This phenomenon, first observed by Black (1976), is called the leverage effect. The economic explanation is that when stock prices fall, the debt-to-equity ratio (leverage ratio) rises, increasing the firm's risk.
Standard GARCH(1,1) only reflects epsilon^2 regardless of the sign of the shock, so it cannot capture this asymmetry.
GJR-GARCH Model
GJR-GARCH, proposed by Glosten, Jagannathan, and Runkle (1993), introduces an indicator function to model asymmetry:
sigma_t^2 = omega + (alpha + gamma * I_{t-1}) * epsilon_{t-1}^2 + beta * sigma_{t-1}^2
Where I_{t-1} is 1 if epsilon_{t-1} < 0, otherwise 0.
- Positive shock (upturn): contributes alpha to variance
- Negative shock (downturn): contributes alpha + gamma to variance
- gamma > 0 indicates the leverage effect exists
# GJR-GARCH(1,1,1) with Student-t
gjr = arch_model(
    returns,
    vol='Garch',
    p=1, o=1, q=1,     # o=1 adds GJR's asymmetric term
    mean='Constant',
    dist='t'
)
gjr_result = gjr.fit(disp='off')
print(gjr_result.summary())
gamma = gjr_result.params['gamma[1]']
alpha_gjr = gjr_result.params['alpha[1]']
print(f"\nalpha: {alpha_gjr:.4f}")
print(f"gamma: {gamma:.4f}")
print(f"Positive shock impact: {alpha_gjr:.4f}")
print(f"Negative shock impact: {alpha_gjr + gamma:.4f}")
print(f"Asymmetry ratio: {(alpha_gjr + gamma) / alpha_gjr:.2f}x")
If gamma is positive and statistically significant, the leverage effect is confirmed. In empirical studies, gamma for equity markets generally falls in the 0.05-0.15 range.
EGARCH Model
Nelson's (1991) EGARCH
EGARCH (Exponential GARCH) models the logarithm of the conditional variance:
ln(sigma_t^2) = omega + alpha * (|z_{t-1}| - E|z_{t-1}|) + gamma * z_{t-1} + beta * ln(sigma_{t-1}^2)
Where z_{t-1} = epsilon_{t-1} / sigma_{t-1} is the standardized residual.
EGARCH advantages include:
- No parameter constraints needed: Taking the log ensures sigma_t^2 is automatically positive. No non-negativity constraints on alpha and beta are needed.
- Direct modeling of asymmetric effects: The gamma term directly reflects the sign of z. If gamma is negative, negative shocks increase volatility more.
- Multiplicative structure: Since it is a linear model in log space, shock effects are proportional to the existing volatility level.
# EGARCH(1,1) with Skewed Student-t
egarch = arch_model(
    returns,
    vol='EGARCH',
    p=1, q=1,
    mean='Constant',
    dist='skewt'       # Skewed Student-t distribution
)
egarch_result = egarch.fit(disp='off')
print(egarch_result.summary())
# Check asymmetry parameter in EGARCH
# gamma < 0 means negative shocks increase volatility more
gamma_egarch = egarch_result.params['gamma[1]']
print(f"\ngamma (asymmetry): {gamma_egarch:.4f}")
if gamma_egarch < 0:
    print("Leverage effect confirmed: negative shocks increase volatility more")
GARCH Variant Model Comparison
| Model | Asymmetric Effect | Parameter Constraints | Long Memory | Practical Usage | Suitable Situation |
|---|---|---|---|---|---|
| GARCH(1,1) | None | omega, alpha, beta non-negative | None | Very high | Base benchmark, most assets |
| GJR-GARCH | Indicator function based | omega, alpha, beta, gamma non-negative | None | High | Equity markets (leverage effect) |
| EGARCH | Continuous asymmetry | No constraints | None | High | Option pricing, FX markets |
| TGARCH | Absolute value based | Non-negative constraints | None | Medium | GARCH(1,1) alternative |
| IGARCH | None | alpha+beta=1 | Unit root | Medium | High-frequency data |
| FIGARCH | None | Additional d parameter | Fractional integration | Low | Long-run volatility dependency research |
Python Practical Implementation
Environment Setup
# Core package installation
pip install arch yfinance pandas numpy scipy matplotlib statsmodels
# Optional packages
pip install seaborn quantstats
Volatility Modeling Python Library Comparison
| Library | GARCH Support | Asymmetric Models | Distribution Options | Forecasting | Strengths | Weaknesses |
|---|---|---|---|---|---|---|
| arch | GARCH, EGARCH, HARCH | GJR, EGARCH, APARCH | Normal, t, Skewt, GED | Multi-step supported | Most comprehensive GARCH implementation | API learning curve |
| statsmodels | Basic GARCH | Limited | Normal | Basic level | Integrated statistics package | GARCH features limited |
| rugarch (R) | Very extensive | Full support | 10+ distributions | Advanced forecasting | Academic standard | R environment required |
| rmgarch (R) | DCC-GARCH | Full multivariate | Various | Correlation forecasting | Multivariate GARCH | R environment required |
In the Python environment, the arch package is the de facto standard. Developed and maintained by Kevin Sheppard, it has been validated in both academic research and practice.
Full Workflow: From Data Collection to Forecasting
import numpy as np
import pandas as pd
import yfinance as yf
from arch import arch_model
from arch.unitroot import ADF, KPSS
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# --------------------------------------------------
# Step 1: Data Collection and Preprocessing
# --------------------------------------------------
def prepare_returns(ticker: str, start: str, end: str) -> pd.Series:
    """Collect and preprocess log-return data."""
    data = yf.download(ticker, start=start, end=end, auto_adjust=True)
    prices = data['Close'].squeeze()  # .squeeze() flattens yfinance's column MultiIndex
    log_returns = np.log(prices / prices.shift(1)).dropna() * 100
    return log_returns
returns = prepare_returns('SPY', '2018-01-01', '2026-03-01')
# --------------------------------------------------
# Step 2: Pre-tests
# --------------------------------------------------
def pre_tests(returns: pd.Series) -> dict:
    """Perform essential tests before GARCH model application."""
    results = {}
    # ADF test: check if returns are a stationary process
    adf = ADF(returns, lags=5)
    results['ADF_stat'] = adf.stat
    results['ADF_pvalue'] = adf.pvalue
    results['is_stationary'] = adf.pvalue < 0.05
    # ARCH effect test (Engle's LM test)
    from statsmodels.stats.diagnostic import het_arch
    lm_stat, lm_pvalue, _, _ = het_arch(returns, nlags=10)
    results['ARCH_LM_stat'] = lm_stat
    results['ARCH_LM_pvalue'] = lm_pvalue
    results['has_arch_effect'] = lm_pvalue < 0.05
    return results

tests = pre_tests(returns)
for key, val in tests.items():
    print(f"{key}: {val}")
# ADF p-value under 0.05 (stationary) + ARCH LM p-value under 0.05 (ARCH effect exists)
# Both conditions must be met for GARCH fitting to be meaningful
# --------------------------------------------------
# Step 3: Fit multiple models
# --------------------------------------------------
def fit_models(returns: pd.Series) -> dict:
    """Fit multiple GARCH variants and return results."""
    models = {}
    # 1) GARCH(1,1) - Student-t
    am = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='t')
    models['GARCH(1,1)-t'] = am.fit(disp='off')
    # 2) GJR-GARCH(1,1,1) - Student-t
    am = arch_model(returns, vol='Garch', p=1, o=1, q=1,
                    mean='Constant', dist='t')
    models['GJR-GARCH-t'] = am.fit(disp='off')
    # 3) EGARCH(1,1) - Skewed Student-t
    am = arch_model(returns, vol='EGARCH', p=1, q=1,
                    mean='Constant', dist='skewt')
    models['EGARCH-skewt'] = am.fit(disp='off')
    # 4) GARCH(1,1) - Normal (for comparison)
    am = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='normal')
    models['GARCH(1,1)-N'] = am.fit(disp='off')
    return models
models = fit_models(returns)
# --------------------------------------------------
# Step 4: Model comparison (information criteria)
# --------------------------------------------------
comparison = pd.DataFrame({
    name: {
        'AIC': res.aic,
        'BIC': res.bic,
        'Log-Likelihood': res.loglikelihood,
        'Params': res.num_params
    }
    for name, res in models.items()
}).T
comparison = comparison.sort_values('BIC')
print("\nModel comparison (sorted by BIC):")
print(comparison.to_string())
Model Diagnostics and Selection
Residual Diagnostics
To verify whether a fitted GARCH model adequately explains the data, you need to test whether standardized residuals are iid and follow the chosen distribution.
def diagnose_model(result, model_name: str = ""):
    """Diagnose GARCH model fit quality."""
    std_resid = result.std_resid
    print(f"=== {model_name} Diagnostics ===")
    # 1. Descriptive statistics of standardized residuals
    print(f"Mean: {std_resid.mean():.4f} (expected: 0)")
    print(f"Std Dev: {std_resid.std():.4f} (expected: 1)")
    print(f"Skewness: {std_resid.skew():.4f}")
    print(f"Kurtosis: {std_resid.kurtosis():.4f}")
    # 2. Ljung-Box test: residual autocorrelation
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_resid = acorr_ljungbox(std_resid, lags=10, return_df=True)
    lb_sq = acorr_ljungbox(std_resid**2, lags=10, return_df=True)
    print(f"\nLjung-Box (residuals, lag=10) p-value: "
          f"{lb_resid['lb_pvalue'].iloc[-1]:.4f}")
    print(f"Ljung-Box (residuals^2, lag=10) p-value: "
          f"{lb_sq['lb_pvalue'].iloc[-1]:.4f}")
    # p-value > 0.05 means no autocorrelation = model captures structure well
    # 3. ARCH-LM test: residual ARCH effects
    from statsmodels.stats.diagnostic import het_arch
    lm_stat, lm_pval, _, _ = het_arch(std_resid, nlags=10)
    print(f"ARCH-LM (residual effect) p-value: {lm_pval:.4f}")
    # p-value > 0.05 means no residual ARCH effect = variance structure sufficiently captured
    # 4. Normality vs t-distribution fit
    jb_stat, jb_pval = stats.jarque_bera(std_resid)
    print(f"Jarque-Bera p-value: {jb_pval:.6f}")
    print()
    return {
        'lb_resid_pval': lb_resid['lb_pvalue'].iloc[-1],
        'lb_sq_pval': lb_sq['lb_pvalue'].iloc[-1],
        'arch_lm_pval': lm_pval,
        'jb_pval': jb_pval
    }
# Diagnose best model
best_model_name = comparison.index[0] # BIC minimum model
best_result = models[best_model_name]
diag = diagnose_model(best_result, best_model_name)
Model Selection Criteria Summary
| Criterion | When to Use | Preferred Direction | Notes |
|---|---|---|---|
| AIC | Prediction performance focus | Lower is better | Prone to overfitting, weak parameter penalty |
| BIC | Model parsimony focus | Lower is better | Conservative, prefers simpler models |
| Log-Likelihood | Absolute fit level | Higher is better | Cannot use alone (doesn't reflect parameter count) |
| Ljung-Box (residuals) | Mean equation fit | p > 0.05 | No autocorrelation should remain in residuals |
| Ljung-Box (residuals^2) | Variance equation fit | p > 0.05 | No residual ARCH effect should remain |
| ARCH-LM | Variance equation completeness | p > 0.05 | Cross-check with Ljung-Box recommended |
| Out-of-Sample RMSE | Practical predictive power | Lower is better | Most reliable comparison criterion |
Practical recommendation: Narrow candidates to 2-3 using BIC, then make the final decision based on out-of-sample performance (rolling window forecasting). If AIC and BIC point to different models, follow AIC if prediction is the goal, BIC if interpretation is the goal.
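The out-of-sample comparison above reduces to scoring each model's variance forecasts against a realized proxy such as squared returns. A minimal numpy sketch of the two standard loss functions (function names are illustrative; QLIKE is often preferred because the squared-return proxy is noisy):

```python
import numpy as np

def rmse_loss(forecast_var, realized_sq):
    """RMSE between forecast variance and a squared-return proxy."""
    f, r = np.asarray(forecast_var), np.asarray(realized_sq)
    return np.sqrt(np.mean((f - r) ** 2))

def qlike_loss(forecast_var, realized_sq):
    """QLIKE loss: log(f) + r/f, robust to noise in the realized proxy."""
    f, r = np.asarray(forecast_var), np.asarray(realized_sq)
    return np.mean(np.log(f) + r / f)

# Toy check: the true variance beats a biased forecast under both losses
rng = np.random.default_rng(0)
true_var = np.full(500, 2.0)
ret = rng.standard_normal(500) * np.sqrt(true_var)
good, bad = true_var, np.full(500, 4.0)
print(qlike_loss(good, ret**2) < qlike_loss(bad, ret**2))
print(rmse_loss(good, ret**2) < rmse_loss(bad, ret**2))
```

In a real comparison, feed each candidate model's rolling one-step variance forecasts and the realized squared returns into these functions and pick the model with the lowest loss.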
VaR and CVaR Calculation
GARCH-Based VaR
VaR (Value at Risk) is the maximum loss amount that can occur at a given confidence level over a certain period. Using GARCH models enables time-varying VaR calculation.
def calculate_garch_var(
    returns: pd.Series,
    model_result,
    confidence_levels: list = [0.01, 0.05],
    investment: float = 1_000_000
) -> pd.DataFrame:
    """
    Calculate VaR and CVaR (ES) based on a GARCH model.
    """
    cond_vol = model_result.conditional_volatility
    mu = model_result.params.get('mu', 0)
    params = model_result.params
    # Note: only normal and Student-t are handled here; other distributions
    # (e.g. skew-t) fall back to the normal quantile
    records = []
    for alpha in confidence_levels:
        if 'nu' in params.index:
            nu = params['nu']
            t_q = stats.t.ppf(alpha, df=nu)
            # arch uses a standardized (unit-variance) Student-t, so raw
            # t quantiles and tail expectations are rescaled by sqrt((nu-2)/nu)
            scale = np.sqrt((nu - 2) / nu)
            q = t_q * scale
            cvar_factor = -((nu + t_q**2) / (nu - 1)) \
                * stats.t.pdf(t_q, df=nu) / alpha * scale
        else:
            q = stats.norm.ppf(alpha)
            cvar_factor = -stats.norm.pdf(q) / alpha
        var_pct = -(mu + cond_vol * q) / 100
        var_amount = var_pct * investment
        cvar_pct = -(mu + cond_vol * cvar_factor) / 100
        cvar_amount = cvar_pct * investment
        for date in cond_vol.index:
            records.append({
                'date': date,
                'alpha': alpha,
                'confidence': f"{(1-alpha)*100:.0f}%",
                'VaR_pct': var_pct.loc[date],
                'VaR_amount': var_amount.loc[date],
                'CVaR_pct': cvar_pct.loc[date],
                'CVaR_amount': cvar_amount.loc[date],
            })
    return pd.DataFrame(records)
var_df = calculate_garch_var(returns, best_result)
recent = var_df[var_df['alpha'] == 0.01].tail(5)
print("Last 5 days 99% VaR (based on 1M won investment):")
print(recent[['date', 'confidence', 'VaR_pct', 'VaR_amount',
'CVaR_pct', 'CVaR_amount']].to_string(index=False))
VaR Interpretation Notes
A 99% 1-day VaR of 2.5% means "there is a 1% probability that tomorrow's return will fall below -2.5%." On a 1 million won investment, there is a 1% probability of a loss exceeding approximately 25,000 won. However, VaR of 2.5% does not mean the maximum loss is 2.5%. CVaR (Expected Shortfall), the average loss in cases exceeding VaR, more accurately reflects actual tail risk.
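The relationship between the two measures is easy to see with a worked example under a normal distribution (the mean and volatility figures below are illustrative):

```python
import numpy as np
from scipy import stats

alpha = 0.01            # 99% confidence level
mu, sigma = 0.05, 1.2   # daily mean and volatility in percent (illustrative)

q = stats.norm.ppf(alpha)              # left-tail quantile, ~ -2.33
var_pct = -(mu + sigma * q)            # VaR as a positive loss, in percent
es_factor = stats.norm.pdf(q) / alpha  # mean tail magnitude beyond q, ~ 2.67
cvar_pct = -(mu - sigma * es_factor)   # CVaR (expected shortfall)

print(f"99% 1-day VaR : {var_pct:.2f}%")   # 2.74%
print(f"99% 1-day CVaR: {cvar_pct:.2f}%")  # 3.15%
```

CVaR is always at least as large as VaR at the same confidence level, since it averages the losses beyond the VaR threshold.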
Backtesting and Validation
VaR Backtesting: Kupiec Test
The most basic method for validating VaR model accuracy is testing whether the actual VaR breach frequency matches the theoretical frequency. The Kupiec (1995) POF (Proportion of Failures) test is used.
def backtest_var(
    returns: pd.Series,
    model_type: str = 'Garch',
    p: int = 1,
    o: int = 0,
    q: int = 1,
    dist: str = 't',
    window: int = 1000,
    alpha: float = 0.01
) -> dict:
    """
    Perform rolling window VaR backtesting.
    """
    violations = []
    var_series = []
    actual_series = []
    n_forecasts = len(returns) - window
    print(f"Starting backtesting: {n_forecasts} day forecasts")
    for i in range(window, len(returns)):
        train = returns.iloc[i-window:i]
        actual = returns.iloc[i]
        try:
            am = arch_model(train, vol=model_type,
                            p=p, o=o, q=q,
                            mean='Constant', dist=dist)
            res = am.fit(disp='off', show_warning=False)
            forecast = res.forecast(horizon=1)
            mean_f = forecast.mean.iloc[-1, 0]
            var_f = forecast.variance.iloc[-1, 0]
            vol_f = np.sqrt(var_f)
            if 'nu' in res.params.index:
                nu = res.params['nu']
                # rescale for arch's standardized (unit-variance) Student-t
                quantile = stats.t.ppf(alpha, df=nu) * np.sqrt((nu - 2) / nu)
            else:
                quantile = stats.norm.ppf(alpha)
            var_value = -(mean_f + vol_f * quantile)
            is_violation = actual < -var_value
            violations.append(is_violation)
            var_series.append(var_value)
            actual_series.append(actual)
        except Exception:
            violations.append(False)
            var_series.append(np.nan)
            actual_series.append(actual)
    violations = np.array(violations)
    n_violations = violations.sum()
    violation_rate = n_violations / len(violations)
    expected_rate = alpha
    n = len(violations)
    x = n_violations
    if x == 0 or x == n:
        kupiec_stat = np.nan
        kupiec_pval = np.nan
    else:
        # Kupiec POF likelihood ratio, computed in log space to avoid underflow
        log_l0 = (n - x) * np.log(1 - alpha) + x * np.log(alpha)
        log_l1 = (n - x) * np.log(1 - x/n) + x * np.log(x/n)
        kupiec_stat = -2 * (log_l0 - log_l1)
        kupiec_pval = 1 - stats.chi2.cdf(kupiec_stat, df=1)
    result = {
        'n_forecasts': n,
        'n_violations': int(n_violations),
        'violation_rate': violation_rate,
        'expected_rate': expected_rate,
        'kupiec_stat': kupiec_stat,
        'kupiec_pval': kupiec_pval,
        'model_accepted': kupiec_pval > 0.05 if not np.isnan(kupiec_pval) else None,
    }
    print(f"\nBacktesting results:")
    print(f"  Expected breach rate: {expected_rate:.2%}")
    print(f"  Actual breach rate: {violation_rate:.2%}")
    print(f"  Breach count: {n_violations}/{n}")
    print(f"  Kupiec LR statistic: {kupiec_stat:.4f}")
    print(f"  Kupiec p-value: {kupiec_pval:.4f}")
    print(f"  Model accepted: {'Yes' if result['model_accepted'] else 'No'}")
    return result
bt_result = backtest_var(
returns,
model_type='Garch',
p=1, o=1, q=1, # GJR-GARCH
dist='t',
window=1000,
alpha=0.01
)
If the Kupiec test p-value is 0.05 or above, the model's VaR prediction is judged to be statistically accurate. If the p-value is below 0.05, the VaR breach frequency significantly differs from the theoretical level, meaning the model needs to be reviewed.
Backtesting Interpretation Guide
| Situation | Meaning | Response |
|---|---|---|
| Actual breach rate higher than expected | VaR underestimation (risk underestimation) | More conservative distribution (lower t df) or asymmetric model |
| Actual breach rate lower than expected | VaR overestimation (capital waste) | Try normal distribution or higher df t-distribution |
| Breaches occur in clusters | Conditional model fails to capture volatility regime changes | Markov regime-switching model or window reduction |
| Kupiec passes, Christoffersen fails | Overall frequency correct but temporal independence violated | Reconfigure mean or variance equation |
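The table above refers to the Christoffersen test. A minimal sketch of the independence part of that test, operating on a boolean violation sequence like the one backtest_var builds (the function name is illustrative; for production use, prefer a tested implementation):

```python
import numpy as np
from scipy import stats

def christoffersen_independence(violations) -> float:
    """p-value of a Christoffersen-style independence LR test.

    Tests whether VaR breaches cluster in time by comparing the breach
    probability after a breach vs. after a non-breach day.
    """
    v = np.asarray(violations, dtype=int)
    pairs = list(zip(v[:-1], v[1:]))
    n00 = pairs.count((0, 0)); n01 = pairs.count((0, 1))
    n10 = pairs.count((1, 0)); n11 = pairs.count((1, 1))
    pi01 = n01 / (n00 + n01)                              # P(breach | no breach)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0  # P(breach | breach)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)            # unconditional

    def loglik(p, n_hit, n_miss):
        if p in (0.0, 1.0):
            return 0.0 if (n_hit == 0 or n_miss == 0) else -np.inf
        return n_miss * np.log(1 - p) + n_hit * np.log(p)

    ll_null = loglik(pi, n01 + n11, n00 + n10)
    ll_alt = loglik(pi01, n01, n00) + loglik(pi11, n11, n10)
    lr = -2 * (ll_null - ll_alt)
    return 1 - stats.chi2.cdf(lr, df=1)

# Toy check: clustered breaches are flagged, scattered random ones are not
rng = np.random.default_rng(1)
indep = (rng.random(1000) < 0.05).astype(int)
clustered = np.zeros(1000, dtype=int)
clustered[100:110] = 1
clustered[500:510] = 1
print(f"random breaches p-value:    {christoffersen_independence(indep):.3f}")
print(f"clustered breaches p-value: {christoffersen_independence(clustered):.3g}")
```

A very small p-value for the clustered sequence is exactly the "Kupiec passes, Christoffersen fails" situation in the table: the total breach count may be right while the timing is not.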
Troubleshooting
Common Issues and Solutions
Issue 1: Optimization Convergence Failure
ConvergenceWarning: Maximum number of iterations has been exceeded.
The cause is usually poor initial values or data scale issues.
Solution:
# 1. Check return scale (should be percentage, not decimal)
returns_pct = returns * 100 # decimal -> percentage
# 2. Try more iterations and explicit starting values
# (starting_values must match the model's full parameter vector;
#  for a Constant mean + GARCH(1,1) + normal model: mu, omega, alpha, beta)
result = model.fit(
    disp='off',
    options={'maxiter': 500},
    starting_values=np.array([0.05, 0.01, 0.05, 0.90])
)
Issue 2: Negative Variance Estimation
Rare in GARCH(1,1) due to parameter constraints, but can occur in non-standard models. Switching to EGARCH automatically resolves this since it models log variance.
Issue 3: alpha + beta equals or exceeds 1
Non-stationary GARCH means the long-run unconditional variance is undefined. The cause is usually structural breaks included in the data.
Solution:
# Reduce estimation period (use only data after structural break)
# e.g.: Use only post-COVID data
returns_post_covid = returns['2020-07-01':]
am = arch_model(returns_post_covid, vol='Garch', p=1, q=1,
mean='Constant', dist='t')
result = am.fit(disp='off')
persistence = result.params['alpha[1]'] + result.params['beta[1]']
print(f"Persistence: {persistence:.4f}")
Issue 4: Forecast Volatility Converges to a Constant
GARCH multi-step forecasts converging quickly to the long-run unconditional variance is normal behavior. Convergence speed is determined by alpha + beta. For long-horizon forecasts, consider EWMA or implied volatility-based approaches over GARCH.
Issue 5: Unstable Student-t Degrees of Freedom Estimation
# The arch package has no option to fix only the t degrees of freedom during
# estimation. If the nu estimate is unstable, either evaluate the model at a
# fully specified parameter vector with ARCHModel.fix(), or switch to the
# GED distribution, which often estimates more stably:
am_ged = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='ged')
result_ged = am_ged.fit(disp='off')
Operations Checklist
Model Construction Phase
- Confirm return data is a stationary process (ADF test p-value under 0.05)
- Confirm ARCH effects exist (Engle LM test p-value under 0.05)
- Check kurtosis and skewness of return distribution (normality assessment)
- Distribution selection: Student-t if high kurtosis, Skewed-t if skewed
- Fit 2-3 model candidates (GARCH, GJR-GARCH, EGARCH)
- Sort candidates by information criteria (AIC/BIC)
- Perform residual diagnostics on optimal model (Ljung-Box, ARCH-LM)
Forecasting and Application Phase
- Out-of-sample predictive power validation (rolling window)
- VaR backtesting performed (Kupiec test pass/fail)
- Verify annualization unit conversion of forecast volatility
- Reflect position size and holding period in VaR amount calculation
- Report CVaR (Expected Shortfall) alongside
Operations and Monitoring Phase
- Determine model re-estimation frequency (weekly/monthly)
- Verify missing value and outlier handling logic in data pipeline
- Simulate model behavior under extreme market conditions
- Implement structural break detection logic
- Build alert system for VaR breach events
- Generate quarterly model performance review reports
Failure Cases and Lessons
Case 1: VaR Underestimation Due to Normal Distribution Assumption
A risk management team calculated 99% VaR using GARCH(1,1)-Normal. Backtesting showed the VaR breach rate at 3.2% instead of the expected 1%. The cause was that the normal distribution's tails are thinner than the actual return distribution.
Lesson: Do not use normal distribution as the default when fitting GARCH to financial returns. Use Student-t distribution as the default, and compare with normal distribution using BIC. Empirically, Student-t is almost always superior.
# Bad example: normal distribution assumption
bad_model = arch_model(returns, vol='Garch', p=1, q=1,
mean='Constant', dist='normal')
# Good example: Student-t distribution default
good_model = arch_model(returns, vol='Garch', p=1, q=1,
mean='Constant', dist='t')
Case 2: Long-Term Data Ignoring Structural Breaks
Fitting GARCH to 16 years of data from the 2008 financial crisis through 2024 resulted in alpha + beta of 0.999. This is close to IGARCH (variance with unit root), making the volatility forecast effectively "current level persists forever," rendering it meaningless.
Lesson: Excessively long data containing structural breaks (financial crises, pandemics, etc.) distorts parameter estimation. A 3-5 year window is the practical default, using only data after structural breaks. Bai-Perron or CUSUM tests can detect structural break points.
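As a concrete starting point for break detection, a CUSUM-of-squares check in the spirit of Inclan and Tiao (1994) can be sketched in a few lines of numpy (the function name and the toy data are illustrative; for production use, prefer a tested implementation):

```python
import numpy as np

def inclan_tiao_break(returns):
    """CUSUM-of-squares check for a single variance break.

    Returns the test statistic and the index of the most likely break.
    The 'no break' null is rejected at roughly the 5% level when the
    statistic exceeds ~1.358.
    """
    r2 = np.asarray(returns, dtype=float) ** 2
    n = len(r2)
    c = np.cumsum(r2)
    k = np.arange(1, n + 1)
    d = c / c[-1] - k / n          # centered CUSUM of squares
    stat = np.sqrt(n / 2) * np.max(np.abs(d))
    return stat, int(np.argmax(np.abs(d)))

# Toy example: volatility doubles halfway through the sample
rng = np.random.default_rng(7)
r = np.concatenate([rng.standard_normal(500), 2 * rng.standard_normal(500)])
stat, k_hat = inclan_tiao_break(r)
print(f"statistic: {stat:.2f}, estimated break near index {k_hat}")
```

A large statistic with a break index inside the sample suggests splitting the data (or shortening the window) before fitting GARCH, as the lesson above recommends.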
Case 3: Confusing Volatility Forecasting with Return Forecasting
A quant trader ran a strategy of selling when GARCH conditional variance was high and buying when low. The result was persistent underperformance vs the market.
Lesson: GARCH is a model for forecasting volatility (risk), not returns (direction). High volatility also means the possibility of large upside. The correct application is to use GARCH output for position sizing (reduce position when volatility is high) or risk management.
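The position-sizing application mentioned here can be sketched as simple volatility targeting (the function name and the 10% target are illustrative):

```python
def vol_target_weight(forecast_annual_vol: float,
                      target_annual_vol: float = 0.10,
                      max_leverage: float = 1.0) -> float:
    """Scale the position so expected portfolio volatility hits the target."""
    return min(max_leverage, target_annual_vol / forecast_annual_vol)

# Calm market: full position; stressed market: sharply reduced position
print(vol_target_weight(0.08))  # capped at 1.0
print(vol_target_weight(0.40))  # 0.25 -> quarter-size position
```

Here the GARCH forecast only controls how much is held, never which direction to trade, which is consistent with the lesson above.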
Case 4: Applying Daily GARCH to Intraday Data
Standard GARCH(1,1) was applied to 5-minute bar data, but intraday patterns (volatility spike after market open, dip at lunch, rise before close) remained entirely in the residuals. All Ljung-Box tests were rejected, making the model meaningless.
Lesson: Intraday periodicity must be removed first from intraday data. Use Andersen and Bollerslev's (1997) FFF (Flexible Fourier Form) filter to remove intraday patterns before applying GARCH, or use the HAR-RV model that directly models Realized Volatility.
Case 5: Applying Univariate GARCH Separately to a Multivariate Portfolio
When measuring risk for a 5-asset portfolio, individual GARCH was fitted to each asset and portfolio variance was calculated using fixed correlations. During the COVID crash of March 2020, actual losses exceeded VaR by 4x.
Lesson: During crises, correlations between assets spike sharply. The sum of univariate GARCH models cannot capture this dynamic correlation. DCC-GARCH (Dynamic Conditional Correlation) or CCC-GARCH should be used to model time-varying correlations together. In Python, the arch package still has limited multivariate model support, so R's rmgarch package is often used alongside it, or the models are implemented directly.
Advanced Topics: Next Steps in Volatility Forecasting
Realized Volatility and HAR-RV Model
While GARCH assumes a parametric structure for returns, Realized Volatility measures volatility non-parametrically from intraday high-frequency data. Corsi's (2009) HAR-RV model combines autoregressive structures of daily, weekly, and monthly realized volatility, demonstrating excellent predictive power.
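Under the assumption that a realized-variance series is already available, the HAR-RV regression can be sketched with pandas and a plain least-squares fit (the toy data below is synthetic and purely illustrative):

```python
import numpy as np
import pandas as pd

def har_rv_design(rv: pd.Series) -> pd.DataFrame:
    """Build HAR-RV regressors: daily, weekly (5d), and monthly (22d) averages."""
    X = pd.DataFrame({
        'rv_d': rv,
        'rv_w': rv.rolling(5).mean(),
        'rv_m': rv.rolling(22).mean(),
    })
    X['target'] = rv.shift(-1)  # forecast next day's realized variance
    return X.dropna()

# Toy data: a slowly wandering positive realized-variance series
rng = np.random.default_rng(0)
rv = pd.Series(np.abs(np.cumsum(rng.standard_normal(500)) * 0.01) + 0.5)
d = har_rv_design(rv)
A = np.column_stack([np.ones(len(d)), d[['rv_d', 'rv_w', 'rv_m']].to_numpy()])
coef, *_ = np.linalg.lstsq(A, d['target'].to_numpy(), rcond=None)
print("const, beta_d, beta_w, beta_m:", np.round(coef, 3))
```

The three horizons let the regression mix fast and slow volatility components, which is the heuristic behind HAR-RV's strong forecasting record on high-frequency data.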
Integration with Machine Learning
Recent research has focused on hybrid approaches that use GARCH conditional variance as features for deep learning models like LSTM and Transformer. Combining the information GARCH captures from linear structures with the non-linear patterns neural networks capture can improve predictive performance over standalone use. However, due to the low signal-to-noise ratio (SNR) in financial data, strict cross-validation is essential given the high risk of overfitting.
Regime-Switching GARCH (Markov-Switching GARCH)
This model assumes volatility switches between two (or more) regimes: high-volatility and low-volatility. It is advantageous for capturing extreme market regime transitions like the 2008 financial crisis and 2020 COVID. However, it has drawbacks of complex parameter estimation and arbitrariness in choosing the number of regimes.
References
arch package official documentation - The standard library for Python GARCH modeling. API reference and example code are thorough. https://arch.readthedocs.io/en/latest/
Machine Learning Mastery - ARCH and GARCH Models - An introductory guide explaining ARCH/GARCH concepts and Python implementation step by step. https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/
QuantInsti - GARCH and GJR-GARCH Volatility Forecasting - A practical guide on volatility forecasting and trading application using GJR-GARCH. https://blog.quantinsti.com/garch-gjr-garch-volatility-forecasting-python/
Forecastegy - Volatility Forecasting in Python - Comparison and Python implementation of various volatility forecasting methodologies. https://forecastegy.com/posts/volatility-forecasting-in-python/
DataCamp - GARCH Models in Python - A systematic learning course on GARCH models with well-structured curriculum from concepts to practical application. https://www.datacamp.com/courses/garch-models-in-python
Bollerslev, T. (1986) - "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics, 31(3), 307-327. The original GARCH paper.
Engle, R.F. (1982) - "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation." Econometrica, 50(4), 987-1007. The original ARCH paper and Nobel Prize-winning research.
Hansen, P.R. and Lunde, A. (2005) - "A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?" Journal of Applied Econometrics. A study comparing hundreds of volatility models concluding that GARCH(1,1) is surprisingly robust.