AI Finance & Quant Trading: FinBERT, Reinforcement Learning, and Backtesting
- 1. Financial Data Collection and Preprocessing
- 2. Technical Analysis Automation
- 3. ML Trading Strategy: XGBoost Alpha Factors
- 4. Deep Learning for Finance: LSTM and Temporal Fusion Transformer
- 5. LLM for Finance: FinBERT Sentiment Analysis
- 6. Risk Management: VaR, CVaR, and Kelly Criterion
- 7. Backtesting: Vectorbt Strategy Verification
- Quiz
1. Financial Data Collection and Preprocessing
The foundation of quantitative trading is data. Beyond OHLCV (Open, High, Low, Close, Volume), the scope now extends to order book snapshots, tick data, and alternative data such as news feeds and satellite imagery.
Downloading Stock Data with yfinance
import yfinance as yf
import pandas as pd
# Download daily OHLCV for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA"]
df = yf.download(tickers, start="2020-01-01", end="2026-01-01", auto_adjust=True)
# Flatten MultiIndex → per-ticker DataFrames
close = df["Close"]
volume = df["Volume"]
# Handle missing data: forward fill then drop leading NaNs
close = close.ffill().dropna()
print(close.tail())
Fetching Crypto Order Books with ccxt
import ccxt
exchange = ccxt.binance()
symbol = "BTC/USDT"
orderbook = exchange.fetch_order_book(symbol, limit=20)
bids = orderbook["bids"][:5] # top-5 [price, qty] bid levels
asks = orderbook["asks"][:5] # top-5 [price, qty] ask levels
mid_price = (bids[0][0] + asks[0][0]) / 2
spread_bps = (asks[0][0] - bids[0][0]) / mid_price * 10000
print(f"Mid: {mid_price:.2f}, Spread: {spread_bps:.2f} bps")
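Beyond mid price and spread, a common microstructure feature derived from such snapshots is top-of-book order-flow imbalance. A minimal sketch with a hard-coded snapshot (illustrative numbers; a live `fetch_order_book` result has the same `[price, qty]` shape):

```python
# Top-of-book order-flow imbalance from an order book snapshot.
# Hard-coded illustrative levels; live data from fetch_order_book has the same shape.
bids = [[64000.0, 1.2], [63999.5, 0.8], [63999.0, 2.1]]  # [price, qty]
asks = [[64000.5, 0.4], [64001.0, 1.0], [64001.5, 0.9]]

bid_vol = sum(qty for _, qty in bids)
ask_vol = sum(qty for _, qty in asks)

# Imbalance in [-1, 1]: positive -> net buy pressure, negative -> net sell pressure
imbalance = (bid_vol - ask_vol) / (bid_vol + ask_vol)
print(f"Order-book imbalance: {imbalance:+.3f}")
```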
Alternative Data: News Headline Collection
News and social-media data provide natural language alpha that structured price data cannot capture.
import requests
from datetime import datetime, timedelta
API_KEY = "YOUR_NEWSAPI_KEY"
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
url = (
f"https://newsapi.org/v2/everything"
f"?q=NVIDIA+earnings&from={yesterday}&sortBy=publishedAt"
f"&language=en&apiKey={API_KEY}"
)
resp = requests.get(url).json()
headlines = [art["title"] for art in resp.get("articles", [])]
print(headlines[:5])
2. Technical Analysis Automation
TA-Lib and pandas-ta let you compute hundreds of technical indicators in a single Python call.
RSI / MACD Calculation
import talib
import numpy as np
import yfinance as yf
df = yf.download("AAPL", start="2023-01-01", end="2026-01-01", auto_adjust=True)
close = df["Close"].squeeze().values.astype(float)
# RSI (14-period)
rsi = talib.RSI(close, timeperiod=14)
# MACD
macd, signal, hist = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)
# pandas-ta alternative (no TA-Lib dependency)
import pandas_ta as ta
df_ta = df["Close"].squeeze().to_frame("close")
df_ta.ta.rsi(length=14, append=True)
df_ta.ta.macd(fast=12, slow=26, signal=9, append=True)
print(df_ta.tail())
Candlestick Pattern Recognition
open_p = df["Open"].squeeze().values.astype(float)
high_p = df["High"].squeeze().values.astype(float)
low_p = df["Low"].squeeze().values.astype(float)
close_p = df["Close"].squeeze().values.astype(float)
hammer = talib.CDLHAMMER(open_p, high_p, low_p, close_p)
engulfing = talib.CDLENGULFING(open_p, high_p, low_p, close_p)
morning_star = talib.CDLMORNINGSTAR(open_p, high_p, low_p, close_p)
# Returns 100 (bullish) / -100 (bearish) / 0 (no pattern)
print("Hammer signals detected:", (hammer != 0).sum())
3. ML Trading Strategy: XGBoost Alpha Factors
Feature Engineering
import pandas as pd
import numpy as np
import yfinance as yf
import pandas_ta as ta
df = yf.download("SPY", start="2018-01-01", end="2026-01-01", auto_adjust=True)
df.columns = df.columns.droplevel(1) if df.columns.nlevels > 1 else df.columns
df.columns = [c.lower() for c in df.columns]
# Return features
df["ret_1d"] = df["close"].pct_change(1)
df["ret_5d"] = df["close"].pct_change(5)
df["ret_20d"] = df["close"].pct_change(20)
# Volatility feature
df["vol_20d"] = df["ret_1d"].rolling(20).std()
# Technical indicator features
df.ta.rsi(length=14, append=True)
df.ta.macd(fast=12, slow=26, signal=9, append=True)
df.ta.bbands(length=20, append=True)
# Volume feature
df["vol_ratio"] = df["volume"] / df["volume"].rolling(20).mean()
# Target: sign of 5-day forward return (1 = up, 0 = down)
df["target"] = (df["close"].pct_change(5).shift(-5) > 0).astype(int)
df.dropna(inplace=True)
print(df.shape)
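Before handing features to XGBoost, it is worth ranking them by information coefficient (IC): the Spearman rank correlation between each feature and the forward return it should predict. A self-contained sketch on synthetic data (with the real frame above you would correlate `feature_cols` against the 5-day forward return instead):

```python
import numpy as np
import pandas as pd

# Rank-IC check: Spearman correlation of each candidate feature with the
# forward return. Synthetic data; "ret_5d" is built to be weakly predictive,
# "vol_20d" is pure noise.
rng = np.random.default_rng(42)
n = 500
fwd_5d = rng.normal(0, 0.02, n)                       # 5-day forward return
data = pd.DataFrame({
    "ret_5d": 0.3 * fwd_5d + rng.normal(0, 0.02, n),  # signal + noise
    "vol_20d": rng.normal(0, 0.01, n),                # no signal
    "fwd_5d": fwd_5d,
})
ic = data.corr(method="spearman")["fwd_5d"].drop("fwd_5d").sort_values(ascending=False)
print(ic)
```

Features with IC near zero rarely survive out-of-sample and mostly add overfitting surface.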
XGBoost with Walk-Forward Validation
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import warnings
warnings.filterwarnings("ignore")
feature_cols = [
"ret_1d", "ret_5d", "ret_20d", "vol_20d",
"RSI_14", "MACD_12_26_9", "MACDs_12_26_9",
"BBL_20_2.0", "BBM_20_2.0", "BBU_20_2.0",
"vol_ratio"
]
target_col = "target"
results = []
train_years = 2
test_months = 3
dates = df.index
start_year = dates[0].year + train_years
# Walk-forward: train on the trailing ~2 years, test on the next quarter
for year in range(start_year, dates[-1].year + 1):
    for q in range(1, 5):
        test_start = pd.Timestamp(f"{year}-{(q - 1) * 3 + 1:02d}-01")
        test_end = test_start + pd.DateOffset(months=test_months)
        train_df = df[df.index < test_start].tail(504)  # ~2 trading years
        test_df = df[(df.index >= test_start) & (df.index < test_end)]
        if len(train_df) < 100 or len(test_df) < 10:
            continue
        X_train, y_train = train_df[feature_cols], train_df[target_col]
        X_test, y_test = test_df[feature_cols], test_df[target_col]
        model = XGBClassifier(
            n_estimators=200, max_depth=4,
            learning_rate=0.05, subsample=0.8,
            eval_metric="logloss", random_state=42
        )
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]
        acc = accuracy_score(y_test, preds)
        auc = roc_auc_score(y_test, proba)
        results.append({"period": str(test_start.date()), "acc": acc, "auc": auc})
result_df = pd.DataFrame(results)
print(result_df.tail(8))
print(f"\nMean AUC: {result_df['auc'].mean():.4f}")
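One simple way to put the out-of-sample probabilities to work is a long/flat overlay: hold the asset only when the predicted up-probability clears a threshold, entering on the next bar. A sketch with synthetic stand-ins for the walk-forward outputs (the 0.55 threshold is an illustrative choice):

```python
import numpy as np
import pandas as pd

# Long/flat overlay on model probabilities. Synthetic proba/returns stand in
# for the walk-forward outputs above; threshold 0.55 is illustrative.
rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0.0004, 0.01, 252))            # daily asset returns
proba = pd.Series(rng.uniform(0, 1, 252))                 # model P(up) per day
position = (proba > 0.55).astype(int).shift(1).fillna(0)  # enter next day -> no look-ahead
strategy_ret = position * ret
equity = (1 + strategy_ret).cumprod()
print(f"Strategy total return: {equity.iloc[-1] - 1:.2%}")
```

The `shift(1)` is the critical line: trading on the same bar the signal was computed from would reintroduce look-ahead bias.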
4. Deep Learning for Finance: LSTM and Temporal Fusion Transformer
LSTM Price Prediction
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["close"]].values)
SEQ_LEN = 60
def make_sequences(data, seq_len):
    X, y = [], []
    for i in range(len(data) - seq_len):
        X.append(data[i:i+seq_len])
        y.append(data[i+seq_len])
    return np.array(X), np.array(y)
X, y = make_sequences(scaled, SEQ_LEN)
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])  # last hidden state -> next-step prediction
model = LSTMModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
for epoch in range(30):
    model.train()
    pred = model(X_train_t)
    loss = criterion(pred, y_train_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")
FinRL Reinforcement Learning Trading Agent
FinRL builds on an OpenAI Gym-style environment to train RL agents for stock trading.
# pip install finrl
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
import pandas as pd
# Preprocessed financial data in FinRL format
# Required columns: date, tic, open, high, low, close, volume + tech indicators
processed_df = pd.read_csv("processed_stock_data.csv")
env_kwargs = {
"hmax": 100, # max shares held per stock
"initial_amount": 100000, # starting capital ($)
"buy_cost_pct": 0.001, # 0.1% commission
"sell_cost_pct": 0.001,
"reward_scaling": 1e-4,
"state_space": 181,
"action_space": 30,
"tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"],
}
train_env = StockTradingEnv(df=processed_df, **env_kwargs)
agent = DRLAgent(env=train_env)
model_ppo = agent.get_model("ppo")
trained_ppo = agent.train_model(
model=model_ppo,
tb_log_name="ppo_stock",
total_timesteps=50000
)
5. LLM for Finance: FinBERT Sentiment Analysis
FinBERT is a BERT model pre-trained on financial news and earnings call transcripts. It classifies text into Positive, Negative, or Neutral.
from transformers import BertTokenizer, BertForSequenceClassification
import torch
import torch.nn.functional as F
model_name = "ProsusAI/finbert"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()
def finbert_sentiment(texts):
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1).numpy()
    labels = ["positive", "negative", "neutral"]  # ProsusAI/finbert label order
    return [
        {"text": t, "label": labels[p.argmax()], "score": float(p.max())}
        for t, p in zip(texts, probs)
    ]
headlines = [
"NVIDIA beats Q4 earnings estimates by 15%, raises guidance",
"Fed signals higher-for-longer rates amid sticky inflation",
"Apple reports record services revenue despite iPhone slowdown",
]
results = finbert_sentiment(headlines)
for r in results:
    print(f"[{r['label'].upper():8s}] {r['score']:.3f} | {r['text']}")
Separating Numeric Guidance from Text Tone
Earnings calls contain three distinct signals: (1) quantitative results such as EPS and revenue, (2) forward guidance figures, and (3) management tone in the spoken text. LLMs excel at capturing tone but can misread sentences that embed raw numbers. Parsing numbers via regex or structured NLP and combining them with FinBERT tone scores in an ensemble produces stronger natural-language alpha than either approach alone.
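A minimal sketch of that split, assuming headline phrasing like "beats … by X%". The regex, the ±15% normalization, and the 50/50 blend are all illustrative choices, and the tone score is passed in as a stub for the FinBERT output above:

```python
import re

def numeric_surprise(text):
    """Parse % surprise from a headline: +X for 'beats ... by X%', -X for 'misses'."""
    m = re.search(r"(beats|misses).*?by\s+(\d+(?:\.\d+)?)%", text, re.IGNORECASE)
    if not m:
        return 0.0
    pct = float(m.group(2))
    return pct if m.group(1).lower() == "beats" else -pct

def blended_signal(text, tone_score):
    """tone_score in [-1, 1] (e.g. FinBERT P(pos) - P(neg)); illustrative 50/50 blend."""
    num = max(min(numeric_surprise(text) / 15.0, 1.0), -1.0)  # clip surprise to [-1, 1]
    return 0.5 * num + 0.5 * tone_score

headline = "NVIDIA beats Q4 earnings estimates by 15%, raises guidance"
print(numeric_surprise(headline))                 # 15.0
print(blended_signal(headline, tone_score=0.8))   # 0.9
```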
6. Risk Management: VaR, CVaR, and Kelly Criterion
VaR / CVaR Computation
import numpy as np
def calculate_var_cvar(returns, confidence=0.95):
    """
    Historical simulation VaR and CVaR.
    returns: array of daily returns
    """
    sorted_returns = np.sort(returns)
    index = max(int((1 - confidence) * len(sorted_returns)), 1)  # avoid an empty tail slice
    var = -sorted_returns[index]
    cvar = -sorted_returns[:index].mean()
    return var, cvar
daily_returns = df["close"].pct_change().dropna().values
var_95, cvar_95 = calculate_var_cvar(daily_returns, 0.95)
var_99, cvar_99 = calculate_var_cvar(daily_returns, 0.99)
print(f"VaR 95%: {var_95:.4f} ({var_95*100:.2f}%)")
print(f"CVaR 95%: {cvar_95:.4f} ({cvar_95*100:.2f}%)")
print(f"VaR 99%: {var_99:.4f} ({var_99*100:.2f}%)")
print(f"CVaR 99%: {cvar_99:.4f} ({cvar_99*100:.2f}%)")
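Historical simulation makes no distributional assumption. The other standard approach, parametric (variance-covariance) VaR, assumes Gaussian returns, which tends to understate risk when the distribution has fat tails. A sketch on synthetic returns:

```python
import numpy as np
from scipy.stats import norm

# Parametric (variance-covariance) VaR: VaR_a = -(mu + sigma * z_a),
# assuming Gaussian daily returns. Synthetic return series for illustration.
rng = np.random.default_rng(7)
returns = rng.normal(0.0003, 0.012, 2000)

mu, sigma = returns.mean(), returns.std(ddof=1)
var_95_param = -(mu + sigma * norm.ppf(0.05))   # one-day 95% VaR
var_99_param = -(mu + sigma * norm.ppf(0.01))   # one-day 99% VaR
print(f"Parametric VaR 95%: {var_95_param:.4f}")
print(f"Parametric VaR 99%: {var_99_param:.4f}")
```

Comparing these against the historical figures above on real returns is a quick fat-tail diagnostic.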
Key Risk Metric Comparison
| Metric | Formula | Strength | Limitation |
|---|---|---|---|
| Sharpe Ratio | (Rp - Rf) / sigma_p | Standardized risk-adjusted return | Penalizes upside volatility equally |
| Sortino Ratio | (Rp - Rf) / sigma_d | Penalizes only downside vol | Denominator less intuitive |
| Max Drawdown | Peak-to-trough loss | Captures extreme losses | Ignores recovery duration |
| VaR 95% | 5th percentile loss | Regulatory standard | Underestimates tail risk |
| CVaR 95% | Expected loss beyond VaR | Captures tail risk | Sensitive to distributional assumptions |
| Calmar Ratio | CAGR / MDD | Growth vs. drawdown | Less meaningful for short periods |
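The table's ratio metrics can all be computed from one daily return series. A sketch assuming 252 trading days per year, a zero risk-free rate, and the common simplification of downside deviation as the standard deviation of negative returns:

```python
import numpy as np
import pandas as pd

# Sharpe, Sortino, MDD, and Calmar from daily returns (rf = 0 for brevity).
# Synthetic ~5-year return series for illustration.
rng = np.random.default_rng(1)
ret = pd.Series(rng.normal(0.0005, 0.01, 1260))

ann = 252
sharpe = ret.mean() / ret.std() * np.sqrt(ann)
downside = ret[ret < 0].std()                  # common simplification of downside deviation
sortino = ret.mean() / downside * np.sqrt(ann)
equity = (1 + ret).cumprod()
mdd = (equity / equity.cummax() - 1).min()     # negative number (peak-to-trough)
cagr = equity.iloc[-1] ** (ann / len(ret)) - 1
calmar = cagr / abs(mdd)
print(f"Sharpe {sharpe:.2f} | Sortino {sortino:.2f} | MDD {mdd:.2%} | Calmar {calmar:.2f}")
```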
Kelly Criterion Position Sizing
def kelly_fraction(win_rate, win_loss_ratio):
    """
    f* = W - (1 - W) / R
    W: win rate, R: average win/loss ratio
    """
    return win_rate - (1 - win_rate) / win_loss_ratio
# Example: 55% win rate, 1.5 win/loss ratio
f_full = kelly_fraction(0.55, 1.5)
f_half = f_full * 0.5 # fractional Kelly reduces variance
print(f"Full Kelly: {f_full:.2%}")
print(f"Half Kelly: {f_half:.2%}")
Fractional Kelly (typically 0.25–0.5x) is used in practice because win-rate and edge estimates carry significant estimation error. Full Kelly can produce catastrophic drawdowns when inputs are off, so a safety margin is essential.
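The estimation-error argument can be made concrete with a Monte Carlo sketch: the bettor sizes from an estimated 55% win rate, but the true win rate is only 45%. All parameters here are illustrative:

```python
import numpy as np

# Full vs half Kelly under estimation error: positions are sized from an
# ESTIMATED 55% win rate, but the true win rate is 45%. Illustrative setup.
rng = np.random.default_rng(3)
R = 1.5
f_full = 0.55 - (1 - 0.55) / R                     # Kelly fraction from the estimate -> 0.25
true_win_rate = 0.45
wins = rng.random((2000, 1000)) < true_win_rate    # 2000 paths x 1000 bets

def terminal_wealth(f):
    growth = np.where(wins, 1 + f * R, 1 - f)      # win: +f*R, loss: -f per bet
    return growth.prod(axis=1)

full = np.median(terminal_wealth(f_full))
half = np.median(terminal_wealth(f_full / 2))
print(f"Full Kelly median terminal wealth: x{full:.2e}")
print(f"Half Kelly median terminal wealth: x{half:.2f}")
```

With the edge overestimated, full Kelly over-bets and the median path loses money, while half Kelly still grows.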
7. Backtesting: Vectorbt Strategy Verification
Moving Average Crossover Backtest with Vectorbt
import vectorbt as vbt
import pandas as pd
import yfinance as yf
price = yf.download("SPY", start="2018-01-01", end="2026-01-01",
auto_adjust=True)["Close"].squeeze()
fast_ma = vbt.MA.run(price, 20)
slow_ma = vbt.MA.run(price, 60)
entries = fast_ma.ma_crossed_above(slow_ma)
exits = fast_ma.ma_crossed_below(slow_ma)
portfolio = vbt.Portfolio.from_signals(
price,
entries,
exits,
init_cash=100_000,
fees=0.001, # 0.1% commission
slippage=0.001, # 0.1% slippage
freq="D",
)
stats = portfolio.stats()
print(stats[["Total Return [%]", "Sharpe Ratio", "Max Drawdown [%]",
"Win Rate [%]", "Profit Factor"]])
Backtesting Bias Checklist
When a backtest produces unexpectedly strong results, always audit these failure modes:
| Bias Type | Root Cause | Mitigation |
|---|---|---|
| Look-ahead bias | Future data used to compute current-period signals | Audit shift(-1) calls; check feature timestamps |
| Survivorship bias | Delisted tickers excluded from universe | Use point-in-time universe datasets |
| Optimization bias | In-sample parameter over-fitting | Walk-forward validation, out-of-sample holdout |
| Market impact ignored | Large orders assumed to fill at mid price | Slippage model; volume-constrained sizing |
| Underestimated transaction costs | Real spreads and fees excluded | Realistic commission + slippage parameters |
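Look-ahead bias, the first row above, is easy to demonstrate: compute a signal from today's close, then compare trading it on the same bar (leaky) versus the next bar (honest). A sketch on a synthetic random walk, which by construction has no real edge:

```python
import numpy as np
import pandas as pd

# Look-ahead bias demo: a moving-average signal computed from today's close
# cannot be traded at today's close. Synthetic random walk, so any "edge"
# in the leaky version is pure leakage.
rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0, 0.01, 2000))
price = 100 * (1 + ret).cumprod()

signal = (price > price.rolling(20).mean()).astype(int)  # uses today's close
leaky = (signal * ret).sum()             # trades the same return the signal already saw
honest = (signal.shift(1) * ret).sum()   # trades the NEXT bar's return
print(f"Leaky cumulative return:  {leaky:.3f}")
print(f"Honest cumulative return: {honest:.3f}")
```

The leaky version shows a large spurious profit on pure noise; the honest version hovers around zero.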
Quiz
Q1. Why is walk-forward validation more appropriate for financial time series than k-fold cross-validation?
Answer: Financial time series have temporal dependency; randomly splitting folds allows future data to leak into training, creating look-ahead bias that inflates apparent model performance.
Explanation: Walk-forward validation always trains on past data and tests on future data, preserving temporal order. In k-fold, a training fold can contain observations that occur after some test observations, meaning the model effectively "knows the future." Financial returns also exhibit autocorrelation and regime shifts, making temporal ordering of validation essential.
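The ordering guarantee can be seen directly with scikit-learn's TimeSeriesSplit, which produces expanding walk-forward folds where every training index precedes every test index:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Expanding walk-forward folds: train always ends before test begins,
# unlike shuffled k-fold splits.
X = np.arange(100).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    assert train_idx.max() < test_idx.min()   # no future data in training
    print(f"Fold {fold}: train [0..{train_idx.max()}], "
          f"test [{test_idx.min()}..{test_idx.max()}]")
```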
Q2. What are the limitations of the Sharpe ratio, and when is the Sortino ratio more appropriate?
Answer: The Sharpe ratio treats upside and downside volatility identically. A strategy with large positive return spikes is penalized unfairly, making its Sharpe ratio look worse than it deserves.
Explanation: The Sortino ratio replaces the denominator with downside deviation, penalizing only losses that investors actually dislike. It is more appropriate for strategies with asymmetric return distributions — such as momentum, option writing, or trend-following — where upside variance is desirable and should not reduce the risk-adjusted score.
Q3. How does look-ahead bias cause backtest results to be overly optimistic?
Answer: When signals are computed using data that did not exist at the time of the trade — such as the same bar's closing price used to trigger an open-bar entry — the model implicitly knows the future and records artificially high accuracy.
Explanation: Common sources include: using the daily close to generate a same-day entry signal, failing to apply shift(-n) when labeling forward returns, rolling statistics that include the current bar, and exponential moving averages that back-propagate future information. Each instance makes the strategy appear to predict what it actually already observed.
Q4. What is the mathematical justification for Kelly Criterion in position sizing, and why use fractional Kelly?
Answer: The Kelly formula f* = W - (1 - W) / R maximizes the expected log return, which is equivalent to maximizing the long-run geometric growth rate of wealth.
Explanation: By maximizing E[log(wealth)], Kelly provably grows capital faster than any other fixed-fraction strategy over the long run. However, the formula is sensitive to estimation error in W (win rate) and R (win/loss ratio). Overestimating edge leads to over-betting and severe drawdowns. Fractional Kelly (25–50% of f*) sacrifices some asymptotic growth rate for dramatically reduced variance and drawdown, making it far more practical for live trading.
Q5. Why should numeric guidance and text tone be analyzed separately when using LLMs on earnings calls?
Answer: Management often delivers strong headline numbers while guiding conservatively for the next quarter, or presents weak results in reassuring language. Mixing the two signals causes them to cancel out, diluting alpha.
Explanation: Earnings releases contain three distinct information types: (1) realized figures such as EPS and revenue, (2) forward guidance numbers, and (3) qualitative tone in management commentary. LLMs like FinBERT accurately score tone but can misclassify a sentence like "revenue missed by 8%" as neutral or positive depending on surrounding context. Parsing numeric figures with structured extraction and scoring text tone separately — then combining them in a weighted ensemble — produces more accurate and robust natural-language alpha signals.