- Introduction
- Definition of Volatility and Stylized Facts
- ARCH Model Basics
- GARCH(1,1) In-Depth
- GJR-GARCH and the Leverage Effect
- EGARCH Model
- Python Practical Implementation
- Model Diagnostics and Selection
- VaR and CVaR Calculation
- Backtesting and Validation
- Troubleshooting
- Operations Checklist
- Failure Cases and Lessons
- Advanced Topics: Next Steps in Volatility Forecasting
- References

Introduction
Predicting returns in financial markets is extremely difficult. However, predicting volatility is relatively feasible. This difference is the starting point of risk management. We may not know whether tomorrow's stock price will go up or down, but we can reasonably infer the magnitude of tomorrow's price movement from today's market conditions.
The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model family is the core tool for volatility forecasting. Since Robert Engle proposed the ARCH model in 1982, it has evolved through Tim Bollerslev's GARCH (1986), Nelson's EGARCH (1991), and Glosten-Jagannathan-Runkle's GJR-GARCH (1993), becoming the standard framework for financial volatility modeling. Engle received the 2003 Nobel Prize in Economics for the ARCH model.
This article explains the mathematical principles of volatility models rather than merely listing formulas. It covers the entire workflow needed in practice: implementation using the Python arch package, model selection and diagnostics, VaR (Value at Risk) calculation, and backtesting. It is structured so that developers with limited mathematical background can follow along and apply the models to real data.
Definition of Volatility and Stylized Facts
What Is Volatility?
Volatility refers to the variance or standard deviation of asset returns. In practice, it is broadly divided into three categories.
- Historical Volatility: Standard deviation calculated from past return data. The simplest but backward-looking.
- Implied Volatility: Expected market volatility reverse-engineered from option prices. The VIX index is representative.
- Conditional Volatility: Expected value of future volatility conditioned on past information. This is exactly what GARCH models estimate.
Annualized volatility is obtained by multiplying daily volatility by the square root of the number of trading days per year. Using approximately 250 trading days for the Korean market, a daily standard deviation of 1.2% corresponds to roughly 19% annualized volatility (1.2 × √250 ≈ 18.97%).
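As a quick sanity check, the conversion is a one-liner (using the 250-trading-day convention from the text; the function name is illustrative):

```python
import numpy as np

def annualize_vol(daily_vol_pct: float, trading_days: int = 250) -> float:
    """Scale a daily volatility (in percent) to an annualized figure."""
    return daily_vol_pct * np.sqrt(trading_days)

# A 1.2% daily standard deviation under the 250-day convention
print(f"{annualize_vol(1.2):.1f}%")  # 19.0%
```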
Stylized Facts of Financial Returns
Real financial data has characteristics that cannot be explained by simple statistical models assuming normal distribution and constant variance. These are called Stylized Facts, and GARCH models are designed to capture these characteristics.
| Stylized Fact | Description | GARCH Relevance |
|---|---|---|
| Volatility Clustering | Large changes tend to be followed by large changes, small by small | Core assumption and modeling target of GARCH |
| Fat Tails | Return distribution tails are thicker than normal (kurtosis over 3) | Captured by Student-t, GED, etc. non-normal distributions |
| Leverage Effect | Negative shocks increase volatility more than positive shocks | Modeled by EGARCH, GJR-GARCH for asymmetry |
| Mean Reversion | Volatility tends to revert to a long-run average level | Reflected by stationarity conditions of GARCH parameters |
| Long Memory | Autocorrelation of volatility decays very slowly | Captured by FIGARCH, IGARCH fractional integration models |
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from scipy import stats
# Collect S&P 500 ETF (SPY) data; a KOSPI 200 ETF such as KODEX 200 can be substituted
spy = yf.download('SPY', start='2020-01-01', end='2026-03-01', auto_adjust=True)
returns = spy['Close'].squeeze().pct_change().dropna() * 100  # Percentage returns; .squeeze() flattens yfinance's column MultiIndex
# Verify Stylized Facts
print(f"Mean: {returns.mean():.4f}%")
print(f"Std Dev: {returns.std():.4f}%")
print(f"Skewness: {returns.skew():.4f}")
print(f"Kurtosis: {returns.kurtosis():.4f}") # Excess kurtosis (normal = 0)
# Jarque-Bera normality test
jb_stat, jb_pval = stats.jarque_bera(returns)
print(f"Jarque-Bera statistic: {jb_stat:.2f}, p-value: {jb_pval:.6f}")
# Ljung-Box test: returns vs squared returns
from statsmodels.stats.diagnostic import acorr_ljungbox
lb_returns = acorr_ljungbox(returns, lags=10, return_df=True)
lb_squared = acorr_ljungbox(returns**2, lags=10, return_df=True)
print(f"\nReturns Ljung-Box p-value (lag=10): {lb_returns['lb_pvalue'].iloc[-1]:.4f}")
print(f"Returns^2 Ljung-Box p-value (lag=10): {lb_squared['lb_pvalue'].iloc[-1]:.6f}")
# Very small p-value for squared returns indicates volatility clustering
Running the code above typically shows strongly positive excess kurtosis (fat tails; the pandas kurtosis is already excess kurtosis, so the normal benchmark is 0) and a highly significant Ljung-Box test on squared returns (volatility clustering). This provides the statistical justification for applying GARCH models.
ARCH Model Basics
Engle's ARCH(q) Model
The ARCH (AutoRegressive Conditional Heteroskedasticity) model assumes that conditional variance depends on past squared error terms.
The return process is defined as follows:
r_t = mu + epsilon_t
epsilon_t = sigma_t * z_t, z_t ~ N(0,1)
Where conditional variance sigma_t^2 is:
sigma_t^2 = omega + alpha_1 * epsilon_{t-1}^2 + ... + alpha_q * epsilon_{t-q}^2
- omega > 0: Base level of long-run variance
- alpha_i >= 0: Impact coefficients of past shocks
- q: ARCH order (how many past time points of shocks to reflect)
For ARCH(1), if a large shock (positive or negative) occurred yesterday, today's conditional variance increases. This is the mechanism that explains volatility clustering.
Limitations of the ARCH Model
In practice, ARCH models are rarely used directly. To properly capture volatility clustering, q must be very large, meaning many parameters need to be estimated. It is common to need ARCH(20) or higher for a good fit, which introduces overfitting risk and estimation instability. GARCH overcomes this limitation.
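The clustering mechanism is easy to verify by simulation. The following numpy sketch generates an ARCH(1) series (parameter values are illustrative; alpha is kept below 0.577 so the fourth moment exists and sample autocorrelations behave well): the shocks themselves are serially uncorrelated, but their squares are not.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate ARCH(1): sigma_t^2 = omega + alpha * eps_{t-1}^2
omega, alpha = 0.5, 0.5
n = 2000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha)  # start at the unconditional variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

def lag1_autocorr(x):
    x = x - x.mean()
    return float((x[:-1] * x[1:]).mean() / x.var())

# Shocks look uncorrelated, but their squares do not: volatility clustering
print(f"lag-1 autocorr of eps  : {lag1_autocorr(eps):+.3f}")
print(f"lag-1 autocorr of eps^2: {lag1_autocorr(eps**2):+.3f}")
```

The squared-shock autocorrelation is theoretically equal to alpha for a stationary ARCH(1), which is exactly the structure the Ljung-Box test on squared returns detects in real data.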
GARCH(1,1) In-Depth
Model Structure
GARCH(p,q), proposed by Bollerslev (1986), adds past variance itself to the conditional variance equation:
sigma_t^2 = omega + alpha * epsilon_{t-1}^2 + beta * sigma_{t-1}^2
The three parameters of GARCH(1,1) each mean:
- omega: Long-run variance contribution. Larger values mean a higher floor for variance.
- alpha: News shock coefficient. The magnitude of impact yesterday's market shock has on today's volatility.
- beta: Variance persistence coefficient. How much of yesterday's volatility carries over to today.
Stationarity Conditions and Parameter Interpretation
For GARCH(1,1) to be stationary, alpha + beta < 1 must hold. The closer this value is to 1, the longer volatility shocks persist.
The unconditional variance (long-run average variance) is calculated as:
sigma_long^2 = omega / (1 - alpha - beta)
When fitting GARCH(1,1) to equity returns in practice, you typically get:
| Parameter | Typical Range | Interpretation |
|---|---|---|
| omega | 0.01 - 0.05 | Base level of long-run variance |
| alpha | 0.05 - 0.15 | Speed of news shock reflection in volatility |
| beta | 0.80 - 0.95 | Volatility persistence |
| alpha + beta | 0.95 - 0.99 | Closer to 1 means shock effects last longer |
If alpha is large and beta is small, the market reacts quickly to new information and forgets quickly. If alpha is small and beta is large, once the volatility level changes, it persists for a long time.
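The persistence interpretation can be made concrete with a half-life: the number of days for a shock's effect on the variance forecast to decay by half, ln(0.5) / ln(alpha + beta). A minimal sketch using the typical ranges from the table above (the function name is illustrative):

```python
import numpy as np

def shock_half_life(alpha: float, beta: float) -> float:
    """Days until a variance shock's effect decays by half in GARCH(1,1)."""
    persistence = alpha + beta
    assert 0 < persistence < 1, "requires a stationary model"
    return np.log(0.5) / np.log(persistence)

# Typical equity estimates from the table above
print(f"alpha+beta = 0.95 -> half-life {shock_half_life(0.10, 0.85):.1f} days")
print(f"alpha+beta = 0.99 -> half-life {shock_half_life(0.09, 0.90):.1f} days")
```

Moving persistence from 0.95 to 0.99 stretches the half-life from about two weeks to over three months, which is why the table describes values near 1 as long-lasting shocks.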
Python Implementation: Fitting GARCH(1,1)
from arch import arch_model
import yfinance as yf
import pandas as pd
import numpy as np
# Data preparation
spy = yf.download('SPY', start='2018-01-01', end='2026-03-01', auto_adjust=True)
returns = spy['Close'].squeeze().pct_change().dropna() * 100  # Percentage returns; .squeeze() flattens yfinance's column MultiIndex
# GARCH(1,1) with Student-t distribution
garch11 = arch_model(
    returns,
    vol='Garch',
    p=1,               # lag order of past squared shocks (ARCH term, alpha)
    q=1,               # lag order of past variance (GARCH term, beta)
    mean='Constant',
    dist='t'           # Student-t distribution (captures fat tails)
)
result = garch11.fit(disp='off')
print(result.summary())
# Extract parameters
omega = result.params['omega']
alpha = result.params['alpha[1]']
beta = result.params['beta[1]']
persistence = alpha + beta
print(f"\nomega: {omega:.6f}")
print(f"alpha: {alpha:.4f}")
print(f"beta: {beta:.4f}")
print(f"persistence (alpha+beta): {persistence:.4f}")
# Long-run unconditional variance and annualized volatility
long_run_var = omega / (1 - alpha - beta)
annual_vol = np.sqrt(long_run_var * 252) / 100 # Convert to decimal ratio
print(f"Long-run annualized volatility: {annual_vol:.2%}")
# Conditional volatility time series
cond_vol = result.conditional_volatility
print(f"\nConditional volatility - last 5 days:")
print(cond_vol.tail())
GJR-GARCH and the Leverage Effect
What Is the Leverage Effect?
In equity markets, even shocks of the same magnitude cause negative shocks to increase volatility more than positive shocks. This phenomenon, first observed by Black (1976), is called the leverage effect. The economic explanation is that when stock prices fall, the debt-to-equity ratio (leverage ratio) rises, increasing the firm's risk.
Standard GARCH(1,1) only reflects epsilon^2 regardless of the sign of the shock, so it cannot capture this asymmetry.
GJR-GARCH Model
GJR-GARCH, proposed by Glosten, Jagannathan, and Runkle (1993), introduces an indicator function to model asymmetry:
sigma_t^2 = omega + (alpha + gamma * I_{t-1}) * epsilon_{t-1}^2 + beta * sigma_{t-1}^2
Where I_{t-1} is 1 if epsilon_{t-1} < 0, otherwise 0.
- Positive shock (upturn): contributes alpha to variance
- Negative shock (downturn): contributes alpha + gamma to variance
- gamma > 0 indicates the leverage effect exists
# GJR-GARCH(1,1,1) with Student-t
gjr = arch_model(
    returns,
    vol='Garch',
    p=1, o=1, q=1,     # o=1 adds GJR's asymmetric term
    mean='Constant',
    dist='t'
)
gjr_result = gjr.fit(disp='off')
print(gjr_result.summary())
gamma = gjr_result.params['gamma[1]']
alpha_gjr = gjr_result.params['alpha[1]']
print(f"\nalpha: {alpha_gjr:.4f}")
print(f"gamma: {gamma:.4f}")
print(f"Positive shock impact: {alpha_gjr:.4f}")
print(f"Negative shock impact: {alpha_gjr + gamma:.4f}")
print(f"Asymmetry ratio: {(alpha_gjr + gamma) / alpha_gjr:.2f}x")
If gamma is positive and statistically significant, the leverage effect is confirmed. In empirical studies, gamma for equity markets generally falls in the 0.05-0.15 range.
EGARCH Model
Nelson's (1991) EGARCH
EGARCH (Exponential GARCH) models the logarithm of the conditional variance:
ln(sigma_t^2) = omega + alpha * (|z_{t-1}| - E|z_{t-1}|) + gamma * z_{t-1} + beta * ln(sigma_{t-1}^2)
Where z_{t-1} = epsilon_{t-1} / sigma_{t-1} is the standardized residual.
EGARCH advantages include:
- No parameter constraints needed: Taking the log ensures sigma_t^2 is automatically positive. No non-negativity constraints on alpha and beta are needed.
- Direct modeling of asymmetric effects: The gamma term directly reflects the sign of z. If gamma is negative, negative shocks increase volatility more.
- Multiplicative structure: Since it is a linear model in log space, shock effects are proportional to the existing volatility level.
# EGARCH(1,1) with Skewed Student-t
egarch = arch_model(
    returns,
    vol='EGARCH',
    p=1, q=1,
    mean='Constant',
    dist='skewt'       # Skewed Student-t distribution
)
egarch_result = egarch.fit(disp='off')
print(egarch_result.summary())
# Check asymmetry parameter in EGARCH
# gamma < 0 means negative shocks increase volatility more
gamma_egarch = egarch_result.params['gamma[1]']
print(f"\ngamma (asymmetry): {gamma_egarch:.4f}")
if gamma_egarch < 0:
    print("Leverage effect confirmed: negative shocks increase volatility more")
GARCH Variant Model Comparison
| Model | Asymmetric Effect | Parameter Constraints | Long Memory | Practical Usage | Suitable Situation |
|---|---|---|---|---|---|
| GARCH(1,1) | None | omega, alpha, beta non-negative | None | Very high | Base benchmark, most assets |
| GJR-GARCH | Indicator function based | omega, alpha, beta, gamma non-negative | None | High | Equity markets (leverage effect) |
| EGARCH | Continuous asymmetry | No constraints | None | High | Option pricing, FX markets |
| TGARCH | Absolute value based | Non-negative constraints | None | Medium | GARCH(1,1) alternative |
| IGARCH | None | alpha+beta=1 | Unit root | Medium | High-frequency data |
| FIGARCH | None | Additional d parameter | Fractional integration | Low | Long-run volatility dependency research |
Python Practical Implementation
Environment Setup
# Core package installation
pip install arch yfinance pandas numpy scipy matplotlib statsmodels
# Optional packages
pip install seaborn quantstats
Volatility Modeling Python Library Comparison
| Library | GARCH Support | Asymmetric Models | Distribution Options | Forecasting | Strengths | Weaknesses |
|---|---|---|---|---|---|---|
| arch | GARCH, EGARCH, HARCH | GJR, EGARCH, APARCH | Normal, t, Skewt, GED | Multi-step supported | Most comprehensive GARCH implementation | API learning curve |
| statsmodels | Basic GARCH | Limited | Normal | Basic level | Integrated statistics package | GARCH features limited |
| rugarch (R) | Very extensive | Full support | 10+ distributions | Advanced forecasting | Academic standard | R environment required |
| rmgarch (R) | DCC-GARCH | Full multivariate | Various | Correlation forecasting | Multivariate GARCH | R environment required |
In the Python environment, the arch package is the de facto standard. Developed and maintained by Kevin Sheppard, it has been validated in both academic research and practice.
Full Workflow: From Data Collection to Forecasting
import numpy as np
import pandas as pd
import yfinance as yf
from arch import arch_model
from arch.unitroot import ADF, KPSS
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# --------------------------------------------------
# Step 1: Data Collection and Preprocessing
# --------------------------------------------------
def prepare_returns(ticker: str, start: str, end: str) -> pd.Series:
    """Collect and preprocess log-return data."""
    data = yf.download(ticker, start=start, end=end, auto_adjust=True)
    prices = data['Close'].squeeze()  # .squeeze() flattens yfinance's column MultiIndex
    log_returns = np.log(prices / prices.shift(1)).dropna() * 100
    return log_returns
returns = prepare_returns('SPY', '2018-01-01', '2026-03-01')
# --------------------------------------------------
# Step 2: Pre-tests
# --------------------------------------------------
def pre_tests(returns: pd.Series) -> dict:
    """Perform essential tests before GARCH model application."""
    results = {}
    # ADF test: check if returns are a stationary process
    adf = ADF(returns, lags=5)
    results['ADF_stat'] = adf.stat
    results['ADF_pvalue'] = adf.pvalue
    results['is_stationary'] = adf.pvalue < 0.05
    # ARCH effect test (Engle's LM test)
    from statsmodels.stats.diagnostic import het_arch
    lm_stat, lm_pvalue, _, _ = het_arch(returns, nlags=10)
    results['ARCH_LM_stat'] = lm_stat
    results['ARCH_LM_pvalue'] = lm_pvalue
    results['has_arch_effect'] = lm_pvalue < 0.05
    return results

tests = pre_tests(returns)
for key, val in tests.items():
    print(f"{key}: {val}")
# ADF p-value under 0.05 (stationary) + ARCH LM p-value under 0.05 (ARCH effect exists)
# Both conditions must be met for GARCH fitting to be meaningful
# --------------------------------------------------
# Step 3: Fit multiple models
# --------------------------------------------------
def fit_models(returns: pd.Series) -> dict:
    """Fit multiple GARCH variants and return results."""
    models = {}
    # 1) GARCH(1,1) - Student-t
    am = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='t')
    models['GARCH(1,1)-t'] = am.fit(disp='off')
    # 2) GJR-GARCH(1,1,1) - Student-t
    am = arch_model(returns, vol='Garch', p=1, o=1, q=1,
                    mean='Constant', dist='t')
    models['GJR-GARCH-t'] = am.fit(disp='off')
    # 3) EGARCH(1,1) - Skewed Student-t
    am = arch_model(returns, vol='EGARCH', p=1, q=1,
                    mean='Constant', dist='skewt')
    models['EGARCH-skewt'] = am.fit(disp='off')
    # 4) GARCH(1,1) - Normal (for comparison)
    am = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='normal')
    models['GARCH(1,1)-N'] = am.fit(disp='off')
    return models
models = fit_models(returns)
# --------------------------------------------------
# Step 4: Model comparison (information criteria)
# --------------------------------------------------
comparison = pd.DataFrame({
    name: {
        'AIC': res.aic,
        'BIC': res.bic,
        'Log-Likelihood': res.loglikelihood,
        'Params': res.num_params
    }
    for name, res in models.items()
}).T
comparison = comparison.sort_values('BIC')
print("\nModel comparison (sorted by BIC):")
print(comparison.to_string())
Model Diagnostics and Selection
Residual Diagnostics
To verify whether a fitted GARCH model adequately explains the data, you need to test whether standardized residuals are iid and follow the chosen distribution.
def diagnose_model(result, model_name: str = ""):
    """Diagnose GARCH model fit quality."""
    std_resid = result.std_resid
    print(f"=== {model_name} Diagnostics ===")
    # 1. Descriptive statistics of standardized residuals
    print(f"Mean: {std_resid.mean():.4f} (expected: 0)")
    print(f"Std Dev: {std_resid.std():.4f} (expected: 1)")
    print(f"Skewness: {std_resid.skew():.4f}")
    print(f"Kurtosis: {std_resid.kurtosis():.4f}")
    # 2. Ljung-Box test: residual autocorrelation
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_resid = acorr_ljungbox(std_resid, lags=10, return_df=True)
    lb_sq = acorr_ljungbox(std_resid**2, lags=10, return_df=True)
    print(f"\nLjung-Box (residuals, lag=10) p-value: "
          f"{lb_resid['lb_pvalue'].iloc[-1]:.4f}")
    print(f"Ljung-Box (residuals^2, lag=10) p-value: "
          f"{lb_sq['lb_pvalue'].iloc[-1]:.4f}")
    # p-value > 0.05 means no autocorrelation = model captures structure well
    # 3. ARCH-LM test: residual ARCH effects
    from statsmodels.stats.diagnostic import het_arch
    lm_stat, lm_pval, _, _ = het_arch(std_resid, nlags=10)
    print(f"ARCH-LM (residual effect) p-value: {lm_pval:.4f}")
    # p-value > 0.05 means no residual ARCH effect = variance structure sufficiently captured
    # 4. Normality vs t-distribution fit
    jb_stat, jb_pval = stats.jarque_bera(std_resid)
    print(f"Jarque-Bera p-value: {jb_pval:.6f}")
    print()
    return {
        'lb_resid_pval': lb_resid['lb_pvalue'].iloc[-1],
        'lb_sq_pval': lb_sq['lb_pvalue'].iloc[-1],
        'arch_lm_pval': lm_pval,
        'jb_pval': jb_pval
    }
# Diagnose best model
best_model_name = comparison.index[0] # BIC minimum model
best_result = models[best_model_name]
diag = diagnose_model(best_result, best_model_name)
Model Selection Criteria Summary
| Criterion | When to Use | Preferred Direction | Notes |
|---|---|---|---|
| AIC | Prediction performance focus | Lower is better | Prone to overfitting, weak parameter penalty |
| BIC | Model parsimony focus | Lower is better | Conservative, prefers simpler models |
| Log-Likelihood | Absolute fit level | Higher is better | Cannot use alone (doesn't reflect parameter count) |
| Ljung-Box (residuals) | Mean equation fit | p > 0.05 | No autocorrelation should remain in residuals |
| Ljung-Box (residuals^2) | Variance equation fit | p > 0.05 | No residual ARCH effect should remain |
| ARCH-LM | Variance equation completeness | p > 0.05 | Cross-check with Ljung-Box recommended |
| Out-of-Sample RMSE | Practical predictive power | Lower is better | Most reliable comparison criterion |
Practical recommendation: Narrow candidates to 2-3 using BIC, then make the final decision based on out-of-sample performance (rolling window forecasting). If AIC and BIC point to different models, follow AIC if prediction is the goal, BIC if interpretation is the goal.
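The out-of-sample comparison above reduces to scoring each model's variance forecasts against a realized proxy such as squared returns. A minimal numpy sketch of the two standard loss functions (function names are illustrative; QLIKE is often preferred because the squared-return proxy is noisy):

```python
import numpy as np

def rmse_loss(forecast_var, realized_sq):
    """RMSE between forecast variance and a squared-return proxy."""
    f, r = np.asarray(forecast_var), np.asarray(realized_sq)
    return np.sqrt(np.mean((f - r) ** 2))

def qlike_loss(forecast_var, realized_sq):
    """QLIKE loss: log(f) + r/f, robust to noise in the realized proxy."""
    f, r = np.asarray(forecast_var), np.asarray(realized_sq)
    return np.mean(np.log(f) + r / f)

# Toy check: the true variance beats a biased forecast under both losses
rng = np.random.default_rng(0)
true_var = np.full(500, 2.0)
ret = rng.standard_normal(500) * np.sqrt(true_var)
good, bad = true_var, np.full(500, 4.0)
print(qlike_loss(good, ret**2) < qlike_loss(bad, ret**2))
print(rmse_loss(good, ret**2) < rmse_loss(bad, ret**2))
```

In a real comparison, feed each candidate model's rolling one-step variance forecasts and the realized squared returns into these functions and pick the model with the lowest loss.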
VaR and CVaR Calculation
GARCH-Based VaR
VaR (Value at Risk) is the maximum loss amount that can occur at a given confidence level over a certain period. Using GARCH models enables time-varying VaR calculation.
def calculate_garch_var(
    returns: pd.Series,
    model_result,
    confidence_levels: list = [0.01, 0.05],
    investment: float = 1_000_000
) -> pd.DataFrame:
    """
    Calculate VaR and CVaR (ES) based on a GARCH model.
    """
    cond_vol = model_result.conditional_volatility
    mu = model_result.params.get('mu', 0)
    params = model_result.params
    # Note: only normal and Student-t are handled here; other distributions
    # (e.g. skew-t) fall back to the normal quantile
    records = []
    for alpha in confidence_levels:
        if 'nu' in params.index:
            nu = params['nu']
            t_q = stats.t.ppf(alpha, df=nu)
            # arch uses a standardized (unit-variance) Student-t, so raw
            # t quantiles and tail expectations are rescaled by sqrt((nu-2)/nu)
            scale = np.sqrt((nu - 2) / nu)
            q = t_q * scale
            cvar_factor = -((nu + t_q**2) / (nu - 1)) \
                * stats.t.pdf(t_q, df=nu) / alpha * scale
        else:
            q = stats.norm.ppf(alpha)
            cvar_factor = -stats.norm.pdf(q) / alpha
        var_pct = -(mu + cond_vol * q) / 100
        var_amount = var_pct * investment
        cvar_pct = -(mu + cond_vol * cvar_factor) / 100
        cvar_amount = cvar_pct * investment
        for date in cond_vol.index:
            records.append({
                'date': date,
                'alpha': alpha,
                'confidence': f"{(1-alpha)*100:.0f}%",
                'VaR_pct': var_pct.loc[date],
                'VaR_amount': var_amount.loc[date],
                'CVaR_pct': cvar_pct.loc[date],
                'CVaR_amount': cvar_amount.loc[date],
            })
    return pd.DataFrame(records)
var_df = calculate_garch_var(returns, best_result)
recent = var_df[var_df['alpha'] == 0.01].tail(5)
print("Last 5 days 99% VaR (based on 1M won investment):")
print(recent[['date', 'confidence', 'VaR_pct', 'VaR_amount',
'CVaR_pct', 'CVaR_amount']].to_string(index=False))
VaR Interpretation Notes
A 99% 1-day VaR of 2.5% means "there is a 1% probability that tomorrow's return will fall below -2.5%." On a 1 million won investment, there is a 1% probability of a loss exceeding approximately 25,000 won. However, VaR of 2.5% does not mean the maximum loss is 2.5%. CVaR (Expected Shortfall), the average loss in cases exceeding VaR, more accurately reflects actual tail risk.
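The relationship between the two measures is easy to see with a worked example under a normal distribution (the mean and volatility figures below are illustrative):

```python
import numpy as np
from scipy import stats

alpha = 0.01            # 99% confidence level
mu, sigma = 0.05, 1.2   # daily mean and volatility in percent (illustrative)

q = stats.norm.ppf(alpha)              # left-tail quantile, ~ -2.33
var_pct = -(mu + sigma * q)            # VaR as a positive loss, in percent
es_factor = stats.norm.pdf(q) / alpha  # mean tail magnitude beyond q, ~ 2.67
cvar_pct = -(mu - sigma * es_factor)   # CVaR (expected shortfall)

print(f"99% 1-day VaR : {var_pct:.2f}%")   # 2.74%
print(f"99% 1-day CVaR: {cvar_pct:.2f}%")  # 3.15%
```

CVaR is always at least as large as VaR at the same confidence level, since it averages the losses beyond the VaR threshold.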
Backtesting and Validation
VaR Backtesting: Kupiec Test
The most basic method for validating VaR model accuracy is testing whether the actual VaR breach frequency matches the theoretical frequency. The Kupiec (1995) POF (Proportion of Failures) test is used.
def backtest_var(
    returns: pd.Series,
    model_type: str = 'Garch',
    p: int = 1,
    o: int = 0,
    q: int = 1,
    dist: str = 't',
    window: int = 1000,
    alpha: float = 0.01
) -> dict:
    """
    Perform rolling window VaR backtesting.
    """
    violations = []
    var_series = []
    actual_series = []
    n_forecasts = len(returns) - window
    print(f"Starting backtesting: {n_forecasts} day forecasts")
    for i in range(window, len(returns)):
        train = returns.iloc[i-window:i]
        actual = returns.iloc[i]
        try:
            am = arch_model(train, vol=model_type,
                            p=p, o=o, q=q,
                            mean='Constant', dist=dist)
            res = am.fit(disp='off', show_warning=False)
            forecast = res.forecast(horizon=1)
            mean_f = forecast.mean.iloc[-1, 0]
            var_f = forecast.variance.iloc[-1, 0]
            vol_f = np.sqrt(var_f)
            if 'nu' in res.params.index:
                nu = res.params['nu']
                # rescale for arch's standardized (unit-variance) Student-t
                quantile = stats.t.ppf(alpha, df=nu) * np.sqrt((nu - 2) / nu)
            else:
                quantile = stats.norm.ppf(alpha)
            var_value = -(mean_f + vol_f * quantile)
            is_violation = actual < -var_value
            violations.append(is_violation)
            var_series.append(var_value)
            actual_series.append(actual)
        except Exception:
            violations.append(False)
            var_series.append(np.nan)
            actual_series.append(actual)
    violations = np.array(violations)
    n_violations = violations.sum()
    violation_rate = n_violations / len(violations)
    expected_rate = alpha
    n = len(violations)
    x = n_violations
    if x == 0 or x == n:
        kupiec_stat = np.nan
        kupiec_pval = np.nan
    else:
        # Kupiec POF likelihood ratio, computed in log space to avoid underflow
        log_l0 = (n - x) * np.log(1 - alpha) + x * np.log(alpha)
        log_l1 = (n - x) * np.log(1 - x/n) + x * np.log(x/n)
        kupiec_stat = -2 * (log_l0 - log_l1)
        kupiec_pval = 1 - stats.chi2.cdf(kupiec_stat, df=1)
    result = {
        'n_forecasts': n,
        'n_violations': int(n_violations),
        'violation_rate': violation_rate,
        'expected_rate': expected_rate,
        'kupiec_stat': kupiec_stat,
        'kupiec_pval': kupiec_pval,
        'model_accepted': kupiec_pval > 0.05 if not np.isnan(kupiec_pval) else None,
    }
    print(f"\nBacktesting results:")
    print(f"  Expected breach rate: {expected_rate:.2%}")
    print(f"  Actual breach rate: {violation_rate:.2%}")
    print(f"  Breach count: {n_violations}/{n}")
    print(f"  Kupiec LR statistic: {kupiec_stat:.4f}")
    print(f"  Kupiec p-value: {kupiec_pval:.4f}")
    print(f"  Model accepted: {'Yes' if result['model_accepted'] else 'No'}")
    return result
bt_result = backtest_var(
returns,
model_type='Garch',
p=1, o=1, q=1, # GJR-GARCH
dist='t',
window=1000,
alpha=0.01
)
If the Kupiec test p-value is 0.05 or above, the model's VaR prediction is judged to be statistically accurate. If the p-value is below 0.05, the VaR breach frequency significantly differs from the theoretical level, meaning the model needs to be reviewed.
Backtesting Interpretation Guide
| Situation | Meaning | Response |
|---|---|---|
| Actual breach rate higher than expected | VaR underestimation (risk underestimation) | More conservative distribution (lower t df) or asymmetric model |
| Actual breach rate lower than expected | VaR overestimation (capital waste) | Try normal distribution or higher df t-distribution |
| Breaches occur in clusters | Conditional model fails to capture volatility regime changes | Markov regime-switching model or window reduction |
| Kupiec passes, Christoffersen fails | Overall frequency correct but temporal independence violated | Reconfigure mean or variance equation |
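The table above refers to the Christoffersen test. A minimal sketch of the independence part of that test, operating on a boolean violation sequence like the one backtest_var builds (the function name is illustrative; for production use, prefer a tested implementation):

```python
import numpy as np
from scipy import stats

def christoffersen_independence(violations) -> float:
    """p-value of a Christoffersen-style independence LR test.

    Tests whether VaR breaches cluster in time by comparing the breach
    probability after a breach vs. after a non-breach day.
    """
    v = np.asarray(violations, dtype=int)
    pairs = list(zip(v[:-1], v[1:]))
    n00 = pairs.count((0, 0)); n01 = pairs.count((0, 1))
    n10 = pairs.count((1, 0)); n11 = pairs.count((1, 1))
    pi01 = n01 / (n00 + n01)                              # P(breach | no breach)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0  # P(breach | breach)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)            # unconditional

    def loglik(p, n_hit, n_miss):
        if p in (0.0, 1.0):
            return 0.0 if (n_hit == 0 or n_miss == 0) else -np.inf
        return n_miss * np.log(1 - p) + n_hit * np.log(p)

    ll_null = loglik(pi, n01 + n11, n00 + n10)
    ll_alt = loglik(pi01, n01, n00) + loglik(pi11, n11, n10)
    lr = -2 * (ll_null - ll_alt)
    return 1 - stats.chi2.cdf(lr, df=1)

# Toy check: clustered breaches are flagged, scattered random ones are not
rng = np.random.default_rng(1)
indep = (rng.random(1000) < 0.05).astype(int)
clustered = np.zeros(1000, dtype=int)
clustered[100:110] = 1
clustered[500:510] = 1
print(f"random breaches p-value:    {christoffersen_independence(indep):.3f}")
print(f"clustered breaches p-value: {christoffersen_independence(clustered):.3g}")
```

A very small p-value for the clustered sequence is exactly the "Kupiec passes, Christoffersen fails" situation in the table: the total breach count may be right while the timing is not.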
Troubleshooting
Common Issues and Solutions
Issue 1: Optimization Convergence Failure
ConvergenceWarning: Maximum number of iterations has been exceeded.
The cause is usually poor initial values or data scale issues.
Solution:
# 1. Check return scale (should be percentage, not decimal)
returns_pct = returns * 100 # decimal -> percentage
# 2. Try more iterations and explicit starting values
# (starting_values must match the model's full parameter vector;
#  for a Constant mean + GARCH(1,1) + normal model: mu, omega, alpha, beta)
result = model.fit(
    disp='off',
    options={'maxiter': 500},
    starting_values=np.array([0.05, 0.01, 0.05, 0.90])
)
Issue 2: Negative Variance Estimation
Rare in GARCH(1,1) due to parameter constraints, but can occur in non-standard models. Switching to EGARCH automatically resolves this since it models log variance.
Issue 3: alpha + beta equals or exceeds 1
Non-stationary GARCH means the long-run unconditional variance is undefined. The cause is usually structural breaks included in the data.
Solution:
# Reduce estimation period (use only data after structural break)
# e.g.: Use only post-COVID data
returns_post_covid = returns['2020-07-01':]
am = arch_model(returns_post_covid, vol='Garch', p=1, q=1,
mean='Constant', dist='t')
result = am.fit(disp='off')
persistence = result.params['alpha[1]'] + result.params['beta[1]']
print(f"Persistence: {persistence:.4f}")
Issue 4: Forecast Volatility Converges to a Constant
GARCH multi-step forecasts converging quickly to the long-run unconditional variance is normal behavior. Convergence speed is determined by alpha + beta. For long-horizon forecasts, consider EWMA or implied volatility-based approaches over GARCH.
Issue 5: Unstable Student-t Degrees of Freedom Estimation
# The arch package has no option to fix only the t degrees of freedom during
# estimation. If the nu estimate is unstable, either evaluate the model at a
# fully specified parameter vector with ARCHModel.fix(), or switch to the
# GED distribution, which often estimates more stably:
am_ged = arch_model(returns, vol='Garch', p=1, q=1,
                    mean='Constant', dist='ged')
result_ged = am_ged.fit(disp='off')
Operations Checklist
Model Construction Phase
- Confirm return data is a stationary process (ADF test p-value under 0.05)
- Confirm ARCH effects exist (Engle LM test p-value under 0.05)
- Check kurtosis and skewness of return distribution (normality assessment)
- Distribution selection: Student-t if high kurtosis, Skewed-t if skewed
- Fit 2-3 model candidates (GARCH, GJR-GARCH, EGARCH)
- Sort candidates by information criteria (AIC/BIC)
- Perform residual diagnostics on optimal model (Ljung-Box, ARCH-LM)
Forecasting and Application Phase
- Out-of-sample predictive power validation (rolling window)
- VaR backtesting performed (Kupiec test pass/fail)
- Verify annualization unit conversion of forecast volatility
- Reflect position size and holding period in VaR amount calculation
- Report CVaR (Expected Shortfall) alongside
Operations and Monitoring Phase
- Determine model re-estimation frequency (weekly/monthly)
- Verify missing value and outlier handling logic in data pipeline
- Simulate model behavior under extreme market conditions
- Implement structural break detection logic
- Build alert system for VaR breach events
- Generate quarterly model performance review reports
Failure Cases and Lessons
Case 1: VaR Underestimation Due to Normal Distribution Assumption
A risk management team calculated 99% VaR using GARCH(1,1)-Normal. Backtesting showed the VaR breach rate at 3.2% instead of the expected 1%. The cause was that the normal distribution's tails are thinner than the actual return distribution.
Lesson: Do not use normal distribution as the default when fitting GARCH to financial returns. Use Student-t distribution as the default, and compare with normal distribution using BIC. Empirically, Student-t is almost always superior.
# Bad example: normal distribution assumption
bad_model = arch_model(returns, vol='Garch', p=1, q=1,
mean='Constant', dist='normal')
# Good example: Student-t distribution default
good_model = arch_model(returns, vol='Garch', p=1, q=1,
mean='Constant', dist='t')
Case 2: Long-Term Data Ignoring Structural Breaks
Fitting GARCH to 16 years of data from the 2008 financial crisis through 2024 resulted in alpha + beta of 0.999. This is close to IGARCH (variance with unit root), making the volatility forecast effectively "current level persists forever," rendering it meaningless.
Lesson: Excessively long data containing structural breaks (financial crises, pandemics, etc.) distorts parameter estimation. A 3-5 year window is the practical default, using only data after structural breaks. Bai-Perron or CUSUM tests can detect structural break points.
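As a concrete starting point for break detection, a CUSUM-of-squares check in the spirit of Inclan and Tiao (1994) can be sketched in a few lines of numpy (the function name and the toy data are illustrative; for production use, prefer a tested implementation):

```python
import numpy as np

def inclan_tiao_break(returns):
    """CUSUM-of-squares check for a single variance break.

    Returns the test statistic and the index of the most likely break.
    The 'no break' null is rejected at roughly the 5% level when the
    statistic exceeds ~1.358.
    """
    r2 = np.asarray(returns, dtype=float) ** 2
    n = len(r2)
    c = np.cumsum(r2)
    k = np.arange(1, n + 1)
    d = c / c[-1] - k / n          # centered CUSUM of squares
    stat = np.sqrt(n / 2) * np.max(np.abs(d))
    return stat, int(np.argmax(np.abs(d)))

# Toy example: volatility doubles halfway through the sample
rng = np.random.default_rng(7)
r = np.concatenate([rng.standard_normal(500), 2 * rng.standard_normal(500)])
stat, k_hat = inclan_tiao_break(r)
print(f"statistic: {stat:.2f}, estimated break near index {k_hat}")
```

A large statistic with a break index inside the sample suggests splitting the data (or shortening the window) before fitting GARCH, as the lesson above recommends.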
Case 3: Confusing Volatility Forecasting with Return Forecasting
A quant trader ran a strategy of selling when GARCH conditional variance was high and buying when low. The result was persistent underperformance vs the market.
Lesson: GARCH is a model for forecasting volatility (risk), not returns (direction). High volatility also means the possibility of large upside. The correct application is to use GARCH output for position sizing (reduce position when volatility is high) or risk management.
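The position-sizing application mentioned here can be sketched as simple volatility targeting (the function name and the 10% target are illustrative):

```python
def vol_target_weight(forecast_annual_vol: float,
                      target_annual_vol: float = 0.10,
                      max_leverage: float = 1.0) -> float:
    """Scale the position so expected portfolio volatility hits the target."""
    return min(max_leverage, target_annual_vol / forecast_annual_vol)

# Calm market: full position; stressed market: sharply reduced position
print(vol_target_weight(0.08))  # capped at 1.0
print(vol_target_weight(0.40))  # 0.25 -> quarter-size position
```

Here the GARCH forecast only controls how much is held, never which direction to trade, which is consistent with the lesson above.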
Case 4: Applying Daily GARCH to Intraday Data
Standard GARCH(1,1) was applied to 5-minute bar data, but intraday patterns (volatility spike after market open, dip at lunch, rise before close) remained entirely in the residuals. All Ljung-Box tests were rejected, making the model meaningless.
Lesson: Intraday periodicity must be removed first from intraday data. Use Andersen and Bollerslev's (1997) FFF (Flexible Fourier Form) filter to remove intraday patterns before applying GARCH, or use the HAR-RV model that directly models Realized Volatility.
Case 5: Applying Univariate GARCH Separately to a Multivariate Portfolio
When measuring risk for a 5-asset portfolio, individual GARCH was fitted to each asset and portfolio variance was calculated using fixed correlations. During the COVID crash of March 2020, actual losses exceeded VaR by 4x.
Lesson: During crises, correlations between assets spike sharply. The sum of univariate GARCH models cannot capture this dynamic correlation. DCC-GARCH (Dynamic Conditional Correlation) or CCC-GARCH should be used to model time-varying correlations together. In Python, the arch package still has limited multivariate model support, so R's rmgarch package is often used alongside it, or the models are implemented directly.
Advanced Topics: Next Steps in Volatility Forecasting
Realized Volatility and HAR-RV Model
While GARCH assumes a parametric structure for returns, Realized Volatility measures volatility non-parametrically from intraday high-frequency data. Corsi's (2009) HAR-RV model combines autoregressive structures of daily, weekly, and monthly realized volatility, demonstrating excellent predictive power.
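Under the assumption that a realized-variance series is already available, the HAR-RV regression can be sketched with pandas and a plain least-squares fit (the toy data below is synthetic and purely illustrative):

```python
import numpy as np
import pandas as pd

def har_rv_design(rv: pd.Series) -> pd.DataFrame:
    """Build HAR-RV regressors: daily, weekly (5d), and monthly (22d) averages."""
    X = pd.DataFrame({
        'rv_d': rv,
        'rv_w': rv.rolling(5).mean(),
        'rv_m': rv.rolling(22).mean(),
    })
    X['target'] = rv.shift(-1)  # forecast next day's realized variance
    return X.dropna()

# Toy data: a slowly wandering positive realized-variance series
rng = np.random.default_rng(0)
rv = pd.Series(np.abs(np.cumsum(rng.standard_normal(500)) * 0.01) + 0.5)
d = har_rv_design(rv)
A = np.column_stack([np.ones(len(d)), d[['rv_d', 'rv_w', 'rv_m']].to_numpy()])
coef, *_ = np.linalg.lstsq(A, d['target'].to_numpy(), rcond=None)
print("const, beta_d, beta_w, beta_m:", np.round(coef, 3))
```

The three horizons let the regression mix fast and slow volatility components, which is the heuristic behind HAR-RV's strong forecasting record on high-frequency data.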
Integration with Machine Learning
Recent research has focused on hybrid approaches that use GARCH conditional variance as features for deep learning models like LSTM and Transformer. Combining the information GARCH captures from linear structures with the non-linear patterns neural networks capture can improve predictive performance over standalone use. However, due to the low signal-to-noise ratio (SNR) in financial data, strict cross-validation is essential given the high risk of overfitting.
Regime-Switching GARCH (Markov-Switching GARCH)
This model assumes volatility switches between two (or more) regimes: high-volatility and low-volatility. It is advantageous for capturing extreme market regime transitions like the 2008 financial crisis and 2020 COVID. However, it has drawbacks of complex parameter estimation and arbitrariness in choosing the number of regimes.
References
arch package official documentation - The standard library for Python GARCH modeling. API reference and example code are thorough. https://arch.readthedocs.io/en/latest/
Machine Learning Mastery - ARCH and GARCH Models - An introductory guide explaining ARCH/GARCH concepts and Python implementation step by step. https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/
QuantInsti - GARCH and GJR-GARCH Volatility Forecasting - A practical guide on volatility forecasting and trading application using GJR-GARCH. https://blog.quantinsti.com/garch-gjr-garch-volatility-forecasting-python/
Forecastegy - Volatility Forecasting in Python - Comparison and Python implementation of various volatility forecasting methodologies. https://forecastegy.com/posts/volatility-forecasting-in-python/
DataCamp - GARCH Models in Python - A systematic learning course on GARCH models with well-structured curriculum from concepts to practical application. https://www.datacamp.com/courses/garch-models-in-python
Bollerslev, T. (1986) - "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics, 31(3), 307-327. The original GARCH paper.
Engle, R.F. (1982) - "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation." Econometrica, 50(4), 987-1007. The original ARCH paper and Nobel Prize-winning research.
Hansen, P.R. and Lunde, A. (2005) - "A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?" Journal of Applied Econometrics. A study comparing hundreds of volatility models concluding that GARCH(1,1) is surprisingly robust.