Python Algorithmic Trading Practical Guide: Backtesting Frameworks, Strategy Development, and Risk Management
- Introduction
- Algorithmic Trading Overview
- Backtesting Framework Comparison
- Data Collection
- Strategy Implementation
- Risk Management Metrics
- Position Sizing
- Walk-Forward Optimization
- Troubleshooting: Common Pitfalls
- Live Trading Considerations
- Operational Notes
- Production Checklist
- References

Introduction
Algorithmic Trading is a systematic investment approach that automatically executes trades based on predefined rules. While it offers the advantage of executing consistent strategies without emotional interference, poorly designed strategies can cause significant losses in live markets.
Python has established itself as the primary language for quantitative trading thanks to its rich financial library ecosystem, data analysis tools, and easy prototyping. This guide covers the entire algorithmic trading pipeline from selecting backtesting frameworks to strategy development, risk management, and live trading considerations.
Disclaimer: This article is written for educational purposes only and does not constitute investment advice. Actual trading involves significant risk.
Algorithmic Trading Overview
Systematic vs Discretionary Trading
| Category | Systematic | Discretionary |
|---|---|---|
| Decision Making | Algorithm/rule-based | Human judgment |
| Emotional Impact | None | High |
| Speed | Millisecond execution possible | Seconds to minutes |
| Scalability | Can manage thousands of instruments | Limited |
| Backtesting | Systematic verification possible | Subjective evaluation |
| Adaptability | Requires rule changes | Flexible response |
| Development Cost | High initial investment | Low |
Algorithmic Trading Pipeline
- Data Collection: Acquire market data (prices, volumes, financial data)
- Strategy Development: Design trade signal logic
- Backtesting: Validate strategy with historical data
- Optimization: Parameter tuning and walk-forward validation
- Risk Management: Position sizing, stop-loss/take-profit setup
- Live Deployment: Paper trading followed by live capital deployment
Backtesting Framework Comparison
| Framework | Speed | Ease of Use | Feature Range | Live Trading | Community |
|---|---|---|---|---|---|
| Backtesting.py | Fast | Very easy | Basic | Not supported | Moderate |
| Zipline | Moderate | Moderate | Extensive | Limited | Active |
| vectorbt | Very fast | Moderate | Advanced analytics | Not supported | Active |
| Backtrader | Moderate | Moderate | Very extensive | Supported (IB) | Active |
| QuantConnect | Fast | Easy | Very extensive | Full support | Very active |
Framework Selection Guide
- Rapid prototyping: Backtesting.py (validate strategies in a few lines of code)
- Large-scale vector operations: vectorbt (NumPy-based high-speed processing)
- Live trading integration: Backtrader (Interactive Brokers integration)
- Cloud-based integrated environment: QuantConnect (data + execution integrated)
Data Collection
Market Data Collection with yfinance
```python
import yfinance as yf
import pandas as pd
from datetime import datetime

def fetch_market_data(
    tickers: list,
    start_date: str = "2020-01-01",
    end_date: str = None,
    interval: str = "1d",
) -> dict:
    """Collect market data"""
    if end_date is None:
        end_date = datetime.now().strftime("%Y-%m-%d")
    data = {}
    for ticker in tickers:
        try:
            df = yf.download(
                ticker,
                start=start_date,
                end=end_date,
                interval=interval,
                progress=False,
            )
            if not df.empty:
                data[ticker] = df
                print(f"{ticker}: {len(df)} rows loaded ({df.index[0]} ~ {df.index[-1]})")
            else:
                print(f"{ticker}: No data available")
        except Exception as e:
            print(f"{ticker}: Error - {e}")
    return data

# Collect data
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "SPY"]
market_data = fetch_market_data(tickers, start_date="2020-01-01")

# Verify data
aapl = market_data["AAPL"]
print("\nAAPL Data Summary:")
print(f"  Period: {aapl.index[0]} ~ {aapl.index[-1]}")
print(f"  Data points: {len(aapl)}")
print(f"  Columns: {list(aapl.columns)}")
```
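Before downloaded data feeds a backtest, it is worth sanity-checking the index for duplicates, out-of-order rows, and suspicious holes. A minimal plain-Python sketch (the `check_date_index` helper and its gap threshold are illustrative, not part of yfinance):

```python
from datetime import date

def check_date_index(dates: list, max_gap_days: int = 5) -> list:
    """Flag common data-quality problems in a daily price index:
    duplicate dates, out-of-order rows, and long gaps.
    Weekend/holiday gaps are normal; multi-week holes usually are not."""
    problems = []
    if len(dates) != len(set(dates)):
        problems.append("duplicate dates")
    for prev, curr in zip(dates, dates[1:]):
        if curr <= prev:
            problems.append(f"out of order at {curr}")
        elif (curr - prev).days > max_gap_days:
            problems.append(f"gap of {(curr - prev).days} days before {curr}")
    return problems

# Example: a two-week hole between Jan 5 and Jan 20
dates = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 20)]
print(check_date_index(dates))  # → ['gap of 15 days before 2024-01-20']
```

The same idea extends to checking for NaN prices or zero-volume bars once the data is in a DataFrame.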
Using the Alpha Vantage API
```python
import requests
import pandas as pd

class AlphaVantageClient:
    """Alpha Vantage API client"""

    BASE_URL = "https://www.alphavantage.co/query"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def get_daily(self, symbol: str, outputsize: str = "full") -> pd.DataFrame:
        """Fetch daily OHLCV data

        NOTE: TIME_SERIES_DAILY_ADJUSTED may require a premium API key;
        TIME_SERIES_DAILY is the free alternative (without Adj Close,
        Dividend, and Split columns).
        """
        params = {
            "function": "TIME_SERIES_DAILY_ADJUSTED",
            "symbol": symbol,
            "outputsize": outputsize,
            "apikey": self.api_key,
        }
        response = requests.get(self.BASE_URL, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        if "Time Series (Daily)" not in data:
            raise ValueError(f"API error: {data.get('Note', data.get('Error Message', 'Unknown'))}")
        df = pd.DataFrame.from_dict(data["Time Series (Daily)"], orient="index")
        df.columns = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Dividend", "Split"]
        df = df.astype(float)
        df.index = pd.to_datetime(df.index)
        return df.sort_index()

    def get_intraday(self, symbol: str, interval: str = "5min") -> pd.DataFrame:
        """Fetch intraday data"""
        params = {
            "function": "TIME_SERIES_INTRADAY",
            "symbol": symbol,
            "interval": interval,
            "outputsize": "full",
            "apikey": self.api_key,
        }
        response = requests.get(self.BASE_URL, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        time_series_key = f"Time Series ({interval})"
        if time_series_key not in data:
            raise ValueError(f"API error: {data}")
        df = pd.DataFrame.from_dict(data[time_series_key], orient="index")
        df.columns = ["Open", "High", "Low", "Close", "Volume"]
        df = df.astype(float)
        df.index = pd.to_datetime(df.index)
        return df.sort_index()

# Usage example
# client = AlphaVantageClient(api_key="YOUR_API_KEY")
# daily_data = client.get_daily("AAPL")
```
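Free API tiers are rate-limited (Alpha Vantage's free tier is commonly cited at a handful of requests per minute), so client calls benefit from retrying with exponential backoff rather than failing on the first throttled response. A small stdlib sketch; the decorator name and parameters are illustrative:

```python
import time
from functools import wraps

def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff (base, 2x, 4x, ...)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: propagate the last error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

# Simulated flaky endpoint: fails twice, then succeeds
calls = {"n": 0}

@retry_with_backoff(max_retries=3, base_delay=0.01)
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"

print(flaky_fetch(), "after", calls["n"], "attempts")  # → ok after 3 attempts
```

In a real client the decorator would wrap methods like `get_daily`, ideally retrying only on throttling responses rather than on every exception.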
Strategy Implementation
Strategy 1: Moving Average Crossover
```python
import pandas as pd
import numpy as np
import yfinance as yf  # needed below for the data download
from backtesting import Backtest, Strategy
from backtesting.lib import crossover

class MovingAverageCrossover(Strategy):
    """Moving Average Crossover Strategy
    - Buy when the fast MA crosses above the slow MA
    - Sell when the fast MA crosses below the slow MA
    """
    fast_period = 10  # Fast MA period
    slow_period = 30  # Slow MA period

    def init(self):
        close = self.data.Close
        self.fast_ma = self.I(lambda x: pd.Series(x).rolling(self.fast_period).mean(), close)
        self.slow_ma = self.I(lambda x: pd.Series(x).rolling(self.slow_period).mean(), close)

    def next(self):
        # Golden Cross: Buy
        if crossover(self.fast_ma, self.slow_ma):
            if not self.position:
                self.buy()
        # Death Cross: Sell
        elif crossover(self.slow_ma, self.fast_ma):
            if self.position:
                self.position.close()

# Prepare data (recent yfinance versions return MultiIndex columns; flatten them)
data = yf.download("AAPL", start="2020-01-01", end="2025-12-31", progress=False)
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.droplevel(1)

# Run backtest
bt = Backtest(
    data,
    MovingAverageCrossover,
    cash=100000,
    commission=0.001,  # 0.1% commission
    exclusive_orders=True,
)
results = bt.run()
print("=== Moving Average Crossover Results ===")
print(f"Total Return: {results['Return [%]']:.2f}%")
print(f"Annual Return: {results['Return (Ann.) [%]']:.2f}%")
print(f"Sharpe Ratio: {results['Sharpe Ratio']:.2f}")
print(f"Max Drawdown: {results['Max. Drawdown [%]']:.2f}%")
print(f"Win Rate: {results['Win Rate [%]']:.2f}%")
print(f"Total Trades: {results['# Trades']}")

# Parameter optimization
optimization_results = bt.optimize(
    fast_period=range(5, 25, 5),
    slow_period=range(20, 60, 10),
    maximize="Sharpe Ratio",
    constraint=lambda p: p.fast_period < p.slow_period,
)
print(f"\nOptimal params: fast={optimization_results._strategy.fast_period}, "
      f"slow={optimization_results._strategy.slow_period}")
```
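The crossover logic above is easy to verify outside the framework with a few lines of plain Python; `sma` and `crossover_signals` here are illustrative helpers, not part of backtesting.py:

```python
def sma(prices: list, window: int) -> list:
    """Simple moving average; None until the window is full."""
    out = [None] * len(prices)
    for i in range(window - 1, len(prices)):
        out[i] = sum(prices[i - window + 1 : i + 1]) / window
    return out

def crossover_signals(prices: list, fast: int, slow: int) -> list:
    """Return (index, 'buy'|'sell') wherever the fast SMA crosses the slow SMA."""
    f, s = sma(prices, fast), sma(prices, slow)
    signals = []
    for i in range(1, len(prices)):
        if None in (f[i - 1], s[i - 1], f[i], s[i]):
            continue
        if f[i - 1] <= s[i - 1] and f[i] > s[i]:
            signals.append((i, "buy"))   # golden cross
        elif f[i - 1] >= s[i - 1] and f[i] < s[i]:
            signals.append((i, "sell"))  # death cross
    return signals

# Downtrend then recovery: the 2-bar SMA crosses above the 3-bar SMA at index 5
prices = [10, 9, 8, 7, 8, 10, 12]
print(crossover_signals(prices, fast=2, slow=3))  # → [(5, 'buy')]
```

Checking signal logic on tiny hand-computable series like this is a cheap guard against off-by-one errors before trusting backtest statistics.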
Strategy 2: RSI Mean Reversion
```python
class RSIMeanReversion(Strategy):
    """RSI Mean Reversion Strategy
    - Buy when RSI enters oversold zone (below 30)
    - Sell when RSI enters overbought zone (above 70)
    """
    rsi_period = 14
    rsi_oversold = 30
    rsi_overbought = 70

    def init(self):
        # Cutler's RSI: simple moving averages of gains/losses.
        # (Wilder's original RSI uses exponential smoothing, so values differ slightly.)
        close = pd.Series(self.data.Close)
        delta = close.diff()
        gain = delta.where(delta > 0, 0.0)
        loss = (-delta).where(delta < 0, 0.0)
        avg_gain = gain.rolling(window=self.rsi_period).mean()
        avg_loss = loss.rolling(window=self.rsi_period).mean()
        rs = avg_gain / avg_loss
        rsi = 100 - (100 / (1 + rs))
        self.rsi = self.I(lambda: rsi, name="RSI")

    def next(self):
        if self.rsi[-1] < self.rsi_oversold:
            if not self.position:
                self.buy()
        elif self.rsi[-1] > self.rsi_overbought:
            if self.position:
                self.position.close()

# Run backtest
bt_rsi = Backtest(
    data,
    RSIMeanReversion,
    cash=100000,
    commission=0.001,
    exclusive_orders=True,
)
results_rsi = bt_rsi.run()
print("=== RSI Mean Reversion Results ===")
print(f"Total Return: {results_rsi['Return [%]']:.2f}%")
print(f"Sharpe Ratio: {results_rsi['Sharpe Ratio']:.2f}")
print(f"Max Drawdown: {results_rsi['Max. Drawdown [%]']:.2f}%")
print(f"Win Rate: {results_rsi['Win Rate [%]']:.2f}%")
```
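The rolling-mean RSI used in `init()` (Cutler's variant) can be reproduced in plain Python to sanity-check its boundary behavior; `cutler_rsi` is an illustrative helper, not a library function:

```python
def cutler_rsi(prices: list, period: int = 14) -> list:
    """RSI using simple averages of gains and losses (Cutler's variant).
    Returns None until `period` price changes are available."""
    out = [None] * len(prices)
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    for i in range(period, len(prices)):
        window = deltas[i - period : i]
        avg_gain = sum(d for d in window if d > 0) / period
        avg_loss = sum(-d for d in window if d < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # all gains in the window: maximally overbought
        else:
            rs = avg_gain / avg_loss
            out[i] = 100 - 100 / (1 + rs)
    return out

rising = list(range(1, 17))   # 16 strictly rising closes
falling = rising[::-1]
print(cutler_rsi(rising)[-1], cutler_rsi(falling)[-1])  # → 100.0 0.0
```

The two extremes confirm the indicator is bounded in [0, 100] and saturates exactly as the strategy's oversold/overbought thresholds assume.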
Strategy 3: Bollinger Bands Breakout
```python
class BollingerBandsBreakout(Strategy):
    """Bollinger Bands Strategy
    - Buy when price touches the lower band (expecting mean reversion)
    - Sell when price touches the upper band
    - Stop-loss: 2% below entry price
    Note: despite the class name, this trades band-touch mean reversion;
    a true breakout variant would instead buy upper-band breaks.
    """
    bb_period = 20
    bb_std = 2.0
    stop_loss_pct = 0.02

    def init(self):
        close = pd.Series(self.data.Close)
        sma = close.rolling(self.bb_period).mean()
        std = close.rolling(self.bb_period).std()
        self.sma = self.I(lambda: sma, name="SMA")
        self.upper = self.I(lambda: sma + self.bb_std * std, name="Upper")
        self.lower = self.I(lambda: sma - self.bb_std * std, name="Lower")

    def next(self):
        price = self.data.Close[-1]
        # Touch lower band: Buy
        if price <= self.lower[-1]:
            if not self.position:
                self.buy(sl=price * (1 - self.stop_loss_pct))
        # Touch upper band: Sell
        elif price >= self.upper[-1]:
            if self.position:
                self.position.close()

# Run backtest
bt_bb = Backtest(
    data,
    BollingerBandsBreakout,
    cash=100000,
    commission=0.001,
    exclusive_orders=True,
)
results_bb = bt_bb.run()
print("=== Bollinger Bands Breakout Results ===")
print(f"Total Return: {results_bb['Return [%]']:.2f}%")
print(f"Sharpe Ratio: {results_bb['Sharpe Ratio']:.2f}")
print(f"Max Drawdown: {results_bb['Max. Drawdown [%]']:.2f}%")
print(f"Win Rate: {results_bb['Win Rate [%]']:.2f}%")
```
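The band arithmetic itself is small enough to verify by hand. A stdlib sketch for the most recent bar (the `bollinger_bands` helper is illustrative; it uses the sample standard deviation, matching pandas' `rolling(...).std()` default):

```python
import statistics

def bollinger_bands(prices: list, period: int = 20, k: float = 2.0):
    """Return (sma, upper, lower) over the most recent `period` closes."""
    window = prices[-period:]
    if len(window) < period:
        raise ValueError(f"need at least {period} prices")
    mid = sum(window) / period
    sd = statistics.stdev(window)  # sample std, ddof=1
    return mid, mid + k * sd, mid - k * sd

# With zero volatility all three bands collapse onto the mean
flat = [100.0] * 20
print(bollinger_bands(flat))  # → (100.0, 100.0, 100.0)
```

The degenerate flat-price case is worth remembering: with zero volatility every close "touches" both bands at once, which is one reason the strategy checks the lower band before the upper.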
Risk Management Metrics
Core Performance Metrics Calculation
import numpy as np
import pandas as pd
from scipy import stats
class RiskMetrics:
"""Risk management metrics calculator"""
def __init__(self, returns: pd.Series, risk_free_rate: float = 0.04):
"""
Args:
returns: Daily returns series
risk_free_rate: Risk-free rate (annualized, default 4%)
"""
self.returns = returns.dropna()
self.risk_free_rate = risk_free_rate
self.daily_rf = (1 + risk_free_rate) ** (1/252) - 1
def sharpe_ratio(self) -> float:
"""Calculate Sharpe Ratio"""
excess_returns = self.returns - self.daily_rf
if excess_returns.std() == 0:
return 0.0
return np.sqrt(252) * excess_returns.mean() / excess_returns.std()
def sortino_ratio(self) -> float:
"""Calculate Sortino Ratio (considers only downside risk)"""
excess_returns = self.returns - self.daily_rf
downside_returns = excess_returns[excess_returns < 0]
if len(downside_returns) == 0 or downside_returns.std() == 0:
return 0.0
downside_std = downside_returns.std()
return np.sqrt(252) * excess_returns.mean() / downside_std
def maximum_drawdown(self) -> float:
"""Calculate Maximum Drawdown (MDD)"""
cumulative = (1 + self.returns).cumprod()
peak = cumulative.expanding().max()
drawdown = (cumulative - peak) / peak
return drawdown.min()
def value_at_risk(self, confidence: float = 0.95) -> float:
"""Calculate VaR (Value at Risk) - Historical method"""
return np.percentile(self.returns, (1 - confidence) * 100)
def conditional_var(self, confidence: float = 0.95) -> float:
"""Calculate CVaR (Conditional VaR)"""
var = self.value_at_risk(confidence)
return self.returns[self.returns <= var].mean()
def calmar_ratio(self) -> float:
"""Calculate Calmar Ratio (Annual Return / MDD)"""
annual_return = (1 + self.returns.mean()) ** 252 - 1
mdd = abs(self.maximum_drawdown())
if mdd == 0:
return 0.0
return annual_return / mdd
def summary(self) -> dict:
"""Full risk metrics summary"""
annual_return = (1 + self.returns.mean()) ** 252 - 1
annual_volatility = self.returns.std() * np.sqrt(252)
return {
"Annual Return": f"{annual_return:.2%}",
"Annual Volatility": f"{annual_volatility:.2%}",
"Sharpe Ratio": f"{self.sharpe_ratio():.2f}",
"Sortino Ratio": f"{self.sortino_ratio():.2f}",
"Maximum Drawdown": f"{self.maximum_drawdown():.2%}",
"VaR (95%)": f"{self.value_at_risk():.2%}",
"CVaR (95%)": f"{self.conditional_var():.2%}",
"Calmar Ratio": f"{self.calmar_ratio():.2f}",
"Total Trading Days": len(self.returns),
"Positive Return Days": f"{(self.returns > 0).sum()} ({(self.returns > 0).mean():.1%})",
}
# Usage example
# Calculate daily returns for SPY
spy = yf.download("SPY", start="2020-01-01", end="2025-12-31", progress=False)
daily_returns = spy["Close"].pct_change().dropna()
metrics = RiskMetrics(daily_returns.squeeze(), risk_free_rate=0.04)
summary = metrics.summary()
print("=== SPY Risk Metrics ===")
for key, value in summary.items():
print(f" {key}: {value}")
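The `value_at_risk` method above uses the historical (empirical quantile) method. A Gaussian parametric VaR is a common cross-check; the sketch below hardcodes the z-scores to stay dependency-free and assumes roughly normal returns, so it will understate tail risk on fat-tailed distributions. The `parametric_var` name and the synthetic return parameters are illustrative, not part of the original class.

```python
import numpy as np

def parametric_var(returns: np.ndarray, confidence: float = 0.95) -> float:
    """Gaussian (variance-covariance) VaR: mu + sigma * z."""
    # z-score for the left tail; -1.645 corresponds to the 5th percentile
    z_scores = {0.95: -1.645, 0.99: -2.326}
    return returns.mean() + returns.std() * z_scores[confidence]

# Compare against the historical quantile on synthetic returns
rng = np.random.default_rng(42)
rets = rng.normal(0.0005, 0.01, 5000)  # synthetic: 0.05% daily mean, 1% daily vol
hist_var = np.percentile(rets, 5)      # historical method, as in RiskMetrics
para_var = parametric_var(rets)
print(f"Historical VaR(95%): {hist_var:.2%}")
print(f"Parametric VaR(95%): {para_var:.2%}")
```

On normally distributed data the two methods agree closely; a large gap between them on real returns is itself a signal that the tails are non-Gaussian.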
Position Sizing
Kelly Criterion
class PositionSizer:
"""Position sizing algorithms"""
@staticmethod
def kelly_criterion(win_rate: float, avg_win: float, avg_loss: float) -> float:
"""Calculate optimal position size using Kelly Criterion
Args:
win_rate: Win rate (0-1)
avg_win: Average gain rate (positive)
avg_loss: Average loss rate (positive)
Returns:
Optimal betting fraction (0-1)
"""
if avg_loss == 0:
return 0.0
# Kelly Formula: f = (bp - q) / b
# b = avg_win / avg_loss (odds ratio)
# p = win_rate, q = 1 - win_rate
b = avg_win / avg_loss
p = win_rate
q = 1 - p
kelly = (b * p - q) / b
# No bet if negative
return max(0.0, kelly)
@staticmethod
def half_kelly(win_rate: float, avg_win: float, avg_loss: float) -> float:
"""Half Kelly: Conservative approach using half the Kelly fraction"""
full_kelly = PositionSizer.kelly_criterion(win_rate, avg_win, avg_loss)
return full_kelly / 2
@staticmethod
def fixed_fractional(equity: float, risk_per_trade: float,
entry_price: float, stop_loss_price: float) -> int:
"""Fixed Fractional position sizing
Args:
equity: Current capital
risk_per_trade: Risk per trade (e.g., 0.02 = 2%)
entry_price: Entry price
stop_loss_price: Stop-loss price
Returns:
Number of shares to buy
"""
risk_amount = equity * risk_per_trade
risk_per_share = abs(entry_price - stop_loss_price)
if risk_per_share == 0:
return 0
shares = int(risk_amount / risk_per_share)
return max(0, shares)
@staticmethod
def volatility_based(equity: float, target_volatility: float,
asset_volatility: float) -> float:
"""Volatility-based position sizing
Args:
equity: Current capital
target_volatility: Target portfolio volatility (annualized)
asset_volatility: Asset volatility (annualized)
Returns:
Position weight (0-1)
"""
if asset_volatility == 0:
return 0.0
weight = target_volatility / asset_volatility
return min(weight, 1.0) # Max 100%
# Usage example
sizer = PositionSizer()
# Kelly Criterion calculation
win_rate = 0.55
avg_win = 0.03 # Average 3% gain
avg_loss = 0.02 # Average 2% loss
kelly = sizer.kelly_criterion(win_rate, avg_win, avg_loss)
half = sizer.half_kelly(win_rate, avg_win, avg_loss)
print(f"Kelly Criterion: {kelly:.2%}")
print(f"Half Kelly: {half:.2%}")
# Fixed fractional position sizing
equity = 100000
entry = 150.0
stop_loss = 147.0
shares = sizer.fixed_fractional(equity, 0.02, entry, stop_loss)
print(f"Shares to buy: {shares} shares (entry: {entry}, stop-loss: {stop_loss})")
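The `volatility_based` method is defined above but not exercised in the example. A short standalone sketch of the same logic (the volatility figures are hypothetical):

```python
# Volatility targeting: scale exposure so the position's contribution
# to portfolio volatility matches the target
target_vol = 0.10   # 10% annualized target volatility (hypothetical)
asset_vol = 0.25    # 25% annualized asset volatility (hypothetical)

if asset_vol == 0:
    weight = 0.0
else:
    weight = min(target_vol / asset_vol, 1.0)  # capped at 100%, as in PositionSizer

print(f"Position weight: {weight:.1%}")  # 0.10 / 0.25 = 40% of equity
```

A side effect worth noting: as realized volatility rises, the weight shrinks automatically, which gives volatility targeting a built-in de-risking behavior in turbulent markets.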
Walk-Forward Optimization
Walk-Forward Analysis Implementation
import pandas as pd
import numpy as np
from backtesting import Backtest
class WalkForwardOptimizer:
"""Walk-Forward Optimizer"""
def __init__(self, data: pd.DataFrame, strategy_class,
train_period: int = 252, test_period: int = 63):
"""
Args:
data: OHLCV data
strategy_class: Strategy class
train_period: Training period (trading days, default 1 year)
test_period: Testing period (trading days, default 3 months)
"""
self.data = data
self.strategy_class = strategy_class
self.train_period = train_period
self.test_period = test_period
def run(self, optimization_params: dict, maximize: str = "Sharpe Ratio") -> list:
"""Execute walk-forward analysis"""
results = []
total_days = len(self.data)
start_idx = 0
fold = 1
while start_idx + self.train_period + self.test_period <= total_days:
train_end = start_idx + self.train_period
test_end = train_end + self.test_period
train_data = self.data.iloc[start_idx:train_end]
test_data = self.data.iloc[train_end:test_end]
# Optimize parameters on training period
bt_train = Backtest(
train_data, self.strategy_class,
cash=100000, commission=0.001,
)
opt_result = bt_train.optimize(
**optimization_params,
maximize=maximize,
)
# Extract optimized parameters
best_params = {}
for param_name in optimization_params:
best_params[param_name] = getattr(opt_result._strategy, param_name)
# Validate on test period
bt_test = Backtest(
test_data, self.strategy_class,
cash=100000, commission=0.001,
)
# Run test with optimized parameters
test_result = bt_test.run(**best_params)
fold_result = {
"fold": fold,
"train_start": train_data.index[0],
"train_end": train_data.index[-1],
"test_start": test_data.index[0],
"test_end": test_data.index[-1],
"best_params": best_params,
"train_return": opt_result["Return [%]"],
"test_return": test_result["Return [%]"],
"test_sharpe": test_result["Sharpe Ratio"],
"test_mdd": test_result["Max. Drawdown [%]"],
}
results.append(fold_result)
print(f"Fold {fold}: Train Return={fold_result['train_return']:.2f}%, "
f"Test Return={fold_result['test_return']:.2f}%, "
f"Params={best_params}")
start_idx += self.test_period
fold += 1
return results
def summary(self, results: list) -> dict:
"""Walk-forward results summary"""
test_returns = [r["test_return"] for r in results]
test_sharpes = [r["test_sharpe"] for r in results]
return {
"Total Folds": len(results),
"Avg Test Return": f"{np.mean(test_returns):.2f}%",
"Test Return Std Dev": f"{np.std(test_returns):.2f}%",
"Positive Return Fold Ratio": f"{sum(1 for r in test_returns if r > 0) / len(test_returns):.1%}",
"Avg Test Sharpe": f"{np.mean(test_sharpes):.2f}",
}
# Usage example
# wfo = WalkForwardOptimizer(data, MovingAverageCrossover)
# results = wfo.run(
# optimization_params={
# "fast_period": range(5, 25, 5),
# "slow_period": range(20, 60, 10),
# },
# )
# print(wfo.summary(results))
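The windowing logic inside `run()` can be verified independently of any backtest. With three years of daily bars (756 days), a 252-day train window, and a 63-day test window advanced by the test period, the fold boundaries work out as below; this is a standalone sketch of the same loop, not a call into the class.

```python
train_period, test_period, total_days = 252, 63, 756  # ~3 years of daily bars

folds = []
start_idx = 0
while start_idx + train_period + test_period <= total_days:
    train_end = start_idx + train_period
    test_end = train_end + test_period
    folds.append((start_idx, train_end, test_end))
    start_idx += test_period  # roll both windows forward by the test period

for i, (s, tr, te) in enumerate(folds, 1):
    print(f"Fold {i}: train [{s}:{tr}), test [{tr}:{te})")
print(f"Total folds: {len(folds)}")
```

Because the step equals the test period, the test windows tile the data with no gaps and no overlap, so concatenating the out-of-sample segments gives one continuous walk-forward equity curve.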
Troubleshooting: Common Pitfalls
Overfitting
Strategies overly optimized on historical data often perform poorly in live trading.
class OverfitDetector:
"""Overfit detector"""
@staticmethod
def check_overfit(train_sharpe: float, test_sharpe: float,
threshold: float = 0.5) -> dict:
"""Determine whether overfitting has occurred
Args:
train_sharpe: Training period Sharpe ratio
test_sharpe: Testing period Sharpe ratio
threshold: Allowed performance degradation ratio
Returns:
Overfitting diagnosis results
"""
if train_sharpe <= 0:
return {"is_overfit": True, "reason": "Training period performance is negative"}
degradation = 1 - (test_sharpe / train_sharpe)
is_overfit = degradation > threshold
return {
"is_overfit": is_overfit,
"train_sharpe": train_sharpe,
"test_sharpe": test_sharpe,
"performance_degradation": f"{degradation:.1%}",
"recommendation": (
"Overfitting suspected: Reduce parameters or extend training period"
if is_overfit
else "Performance difference within acceptable range"
),
}
@staticmethod
def parameter_sensitivity(results_grid: dict) -> dict:
"""Parameter sensitivity analysis
If performance drops sharply around optimal parameters, overfitting is likely
"""
sharpe_values = list(results_grid.values())
mean_sharpe = np.mean(sharpe_values)
std_sharpe = np.std(sharpe_values)
max_sharpe = max(sharpe_values)
# Overfitting suspected if optimal is more than 2 std above mean
is_sensitive = (max_sharpe - mean_sharpe) > 2 * std_sharpe
return {
"is_sensitive": is_sensitive,
"max_sharpe": max_sharpe,
"mean_sharpe": mean_sharpe,
"std_sharpe": std_sharpe,
"recommendation": (
"High parameter sensitivity: overfitting risk"
if is_sensitive
else "Stable performance across parameters"
),
}
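The degradation check in `check_overfit` reduces to a single ratio. A worked example with hypothetical Sharpe figures, mirroring the class logic:

```python
train_sharpe = 1.8   # in-sample Sharpe (hypothetical)
test_sharpe = 0.6    # out-of-sample Sharpe (hypothetical)
threshold = 0.5      # flag if more than 50% of the Sharpe is lost out-of-sample

degradation = 1 - (test_sharpe / train_sharpe)  # 1 - 1/3 = 2/3
is_overfit = degradation > threshold

print(f"Performance degradation: {degradation:.1%}")  # 66.7%
print(f"Overfit suspected: {is_overfit}")             # True
```

Some out-of-sample decay is normal; the point of the threshold is to separate ordinary decay from a strategy that only ever worked on its training data.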
Survivorship Bias Prevention
def check_survivorship_bias(tickers: list, start_date: str) -> dict:
"""Survivorship bias check
Backtesting only with currently existing stocks creates survivorship bias
Recommended: Use datasets that include delisted/merged stocks
"""
warnings = []
# Warn if testing only with currently listed stocks
if all(yf.Ticker(t).info.get("marketCap", 0) > 0 for t in tickers[:5]):
warnings.append(
"Only currently listed stocks are included. "
"Delisted or merged stocks from the past are missing, "
"which may overestimate returns"
)
return {
"ticker_count": len(tickers),
"warnings": warnings,
"recommendation": "Use point-in-time datasets (e.g., CRSP, Sharadar)",
}
Look-Ahead Bias Prevention
def validate_no_lookahead(strategy_code: str) -> list:
"""Look-ahead bias check (static code analysis)"""
warnings = []
# Check for patterns that reference future data
dangerous_patterns = [
("shift(-", "Future data reference detected via shift(-N)"),
(".iloc[-1]", "Last row reference - may be a future reference depending on context"),
("resample", "Resampling may include future data"),
]
for pattern, description in dangerous_patterns:
if pattern in strategy_code:
warnings.append(f"Warning: {description} - '{pattern}' found")
if not warnings:
warnings.append("No explicit look-ahead bias patterns detected")
return warnings
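The most common look-ahead bug the checker above hunts for is using the same bar's close both to generate a signal and to book that bar's return. The standard fix is to lag the signal by one bar; a minimal sketch on synthetic prices (the moving-average signal is illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
returns = prices.pct_change()

# Signal computed from data up to and including bar t
signal = (prices > prices.rolling(20).mean()).astype(int)

# WRONG: books bar t's return using a signal that needs bar t's own close
biased = (signal * returns).sum()

# RIGHT: the position held during bar t must come from bar t-1's signal
unbiased = (signal.shift(1) * returns).sum()

print(f"With look-ahead:    {biased:.4f}")
print(f"Without look-ahead: {unbiased:.4f}")
```

The gap between the two numbers is pure look-ahead bias; in a real backtest it silently inflates returns, because the biased version always "knows" the close before trading it.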
Live Trading Considerations
Slippage and Transaction Costs
Backtesting assumes ideal fill prices, but in practice slippage and transaction costs occur.
class RealisticBacktestConfig:
"""Realistic backtest configuration"""
@staticmethod
def get_config(asset_type: str = "us_equity") -> dict:
"""Realistic transaction cost settings by asset type"""
configs = {
"us_equity": {
"commission": 0.001, # 0.1% commission
"slippage": 0.0005, # 0.05% slippage
"spread": 0.0001, # 0.01% spread (large caps)
"market_impact": 0.0002, # 0.02% market impact
},
"kr_equity": {
"commission": 0.00015, # 0.015% (broker commission)
"tax": 0.0018, # 0.18% (securities transaction tax, 2026)
"slippage": 0.001, # 0.1% slippage
"spread": 0.0005, # 0.05% spread
},
"crypto": {
"commission": 0.001, # 0.1% (maker fee)
"slippage": 0.002, # 0.2% slippage
"spread": 0.001, # 0.1% spread
},
}
return configs.get(asset_type, configs["us_equity"])
@staticmethod
def total_cost_per_trade(config: dict) -> float:
"""Calculate total cost per trade"""
return sum(config.values())
# Check costs
for asset_type in ["us_equity", "kr_equity", "crypto"]:
config = RealisticBacktestConfig.get_config(asset_type)
total = RealisticBacktestConfig.total_cost_per_trade(config)
print(f"{asset_type}: Total cost per trade approx. {total:.3%}")
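Per-trade costs compound with trading frequency. Using the us_equity total of roughly 0.17% and assuming (for illustration) that the full cost applies on each side of every round trip, a strategy making 100 round trips a year pays it 200 times:

```python
cost_per_side = 0.0017        # commission + slippage + spread + impact (us_equity config)
round_trips_per_year = 100    # hypothetical turnover

sides = 2 * round_trips_per_year
# Costs compound multiplicatively on the equity curve
annual_cost_drag = 1 - (1 - cost_per_side) ** sides
print(f"Annual cost drag: {annual_cost_drag:.1%}")
```

A drag of this size wipes out many strategies that look profitable with zero-cost fills, which is why transaction costs belong in the backtest, not in a post-hoc adjustment.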
Operational Notes
Pre-Live Trading Checklist
- Paper Trading: Validate strategy with simulated trading for at least 3 months
- Start Small: Begin with 5-10% of total capital and scale up gradually
- Monitoring System: Build real-time dashboard for positions, P&L, and risk metrics
- Kill Switch: Automatic trading halt logic when daily loss limit is exceeded
- Logging: Detailed logging of all orders, fills, and errors
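The kill-switch item above can be as simple as a daily P&L guard consulted before every order. A minimal sketch; the class name and the 3% threshold are illustrative, and a production version would also persist state and alert an operator:

```python
class KillSwitch:
    """Blocks new orders once the daily loss limit is breached."""

    def __init__(self, start_equity: float, max_daily_loss: float = 0.03):
        self.start_equity = start_equity
        self.max_daily_loss = max_daily_loss  # fraction of start-of-day equity
        self.tripped = False

    def check(self, current_equity: float) -> bool:
        """Return True if trading may continue."""
        daily_pnl = (current_equity - self.start_equity) / self.start_equity
        if daily_pnl <= -self.max_daily_loss:
            self.tripped = True  # stays tripped until a manual reset next day
        return not self.tripped

ks = KillSwitch(start_equity=100000)
print(ks.check(98000))   # -2% loss: still trading
print(ks.check(96500))   # -3.5% loss: trading halted
print(ks.check(99000))   # equity recovered, but the switch stays tripped
```

Keeping the switch latched after a recovery is deliberate: re-enabling trading should be a human decision, not an automatic one.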
Psychological Factor Management
- Do not manually override the algorithm even when it records losses
- Anticipate and accept the gap between backtesting and live results
- Experience the maximum drawdown scenario in advance through simulation
- Pre-define maximum operating period and retirement criteria for each strategy
Production Checklist
- [ ] Backtesting completed with at least 5 years of historical data
- [ ] Walk-forward optimization passed to verify no overfitting
- [ ] Survivorship bias and look-ahead bias checks completed
- [ ] Realistic transaction costs applied (commissions, slippage, taxes)
- [ ] Position sizing algorithm applied (Kelly Criterion or fixed fractional)
- [ ] Stop-loss/take-profit logic implemented and tested
- [ ] Paper trading completed for at least 3 months
- [ ] Kill switch logic implemented
- [ ] Real-time monitoring dashboard built
- [ ] Trade logging and performance reports auto-generated
- [ ] Network failure and API error handling logic implemented
- [ ] Tax and regulatory requirements verified