
AI Finance & Quant Trading: FinBERT, Reinforcement Learning, and Backtesting


1. Financial Data Collection and Preprocessing

The foundation of quantitative trading is data. Beyond OHLCV bars (Open, High, Low, Close, Volume), modern pipelines also ingest order book snapshots, tick data, and alternative data such as news feeds and satellite imagery.

Downloading Stock Data with yfinance

import yfinance as yf
import pandas as pd

# Download daily OHLCV for multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL", "NVDA"]
df = yf.download(tickers, start="2020-01-01", end="2026-01-01", auto_adjust=True)

# Flatten MultiIndex → per-ticker DataFrames
close  = df["Close"]
volume = df["Volume"]

# Handle missing data: forward fill then drop leading NaNs
close = close.ffill().dropna()

print(close.tail())
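With clean close prices in hand, daily returns are usually the first derived series. A minimal sketch on a toy price series (the values below are illustrative, standing in for the `close` frame above) showing why log returns are convenient for aggregation:

```python
import numpy as np
import pandas as pd

# Toy close series standing in for the yfinance `close` frame above
idx = pd.date_range("2024-01-01", periods=6, freq="B")
close = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0], index=idx)

# Log returns add across time, which makes multi-day aggregation trivial
log_ret = np.log(close / close.shift(1)).dropna()

# Sum of log returns recovers the total holding-period return
total_ret = np.exp(log_ret.sum()) - 1
print(f"Total return: {total_ret:.4%}")
```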

Fetching Crypto Order Books with ccxt

import ccxt

exchange = ccxt.binance()

symbol   = "BTC/USDT"
orderbook = exchange.fetch_order_book(symbol, limit=20)

bids = orderbook["bids"][:5]   # top-5 [price, qty] bid levels
asks = orderbook["asks"][:5]   # top-5 [price, qty] ask levels

mid_price  = (bids[0][0] + asks[0][0]) / 2
spread_bps = (asks[0][0] - bids[0][0]) / mid_price * 10000

print(f"Mid: {mid_price:.2f}, Spread: {spread_bps:.2f} bps")

Alternative Data: News Headline Collection

News and social-media data provide natural language alpha that structured price data cannot capture.

import requests
from datetime import datetime, timedelta

API_KEY  = "YOUR_NEWSAPI_KEY"
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")

url = (
    f"https://newsapi.org/v2/everything"
    f"?q=NVIDIA+earnings&from={yesterday}&sortBy=publishedAt"
    f"&language=en&apiKey={API_KEY}"
)
resp      = requests.get(url).json()
headlines = [art["title"] for art in resp.get("articles", [])]
print(headlines[:5])

2. Technical Analysis Automation

TA-Lib and pandas-ta can compute hundreds of technical indicators, each with a single Python function call.

RSI / MACD Calculation

import talib
import numpy as np
import yfinance as yf

df    = yf.download("AAPL", start="2023-01-01", end="2026-01-01", auto_adjust=True)
close = df["Close"].squeeze().values.astype(float)

# RSI (14-period)
rsi = talib.RSI(close, timeperiod=14)

# MACD
macd, signal, hist = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)

# pandas-ta alternative (no TA-Lib dependency)
import pandas_ta as ta
df_ta = df["Close"].squeeze().to_frame("close")
df_ta.ta.rsi(length=14, append=True)
df_ta.ta.macd(fast=12, slow=26, signal=9, append=True)
print(df_ta.tail())

Candlestick Pattern Recognition

open_p  = df["Open"].squeeze().values.astype(float)
high_p  = df["High"].squeeze().values.astype(float)
low_p   = df["Low"].squeeze().values.astype(float)
close_p = df["Close"].squeeze().values.astype(float)

hammer       = talib.CDLHAMMER(open_p, high_p, low_p, close_p)
engulfing    = talib.CDLENGULFING(open_p, high_p, low_p, close_p)
morning_star = talib.CDLMORNINGSTAR(open_p, high_p, low_p, close_p)

# Returns 100 (bullish) / -100 (bearish) / 0 (no pattern)
print("Hammer signals detected:", (hammer != 0).sum())

3. ML Trading Strategy: XGBoost Alpha Factors

Feature Engineering

import pandas as pd
import numpy as np
import yfinance as yf
import pandas_ta as ta

df = yf.download("SPY", start="2018-01-01", end="2026-01-01", auto_adjust=True)
df.columns = df.columns.droplevel(1) if df.columns.nlevels > 1 else df.columns
df.columns = [c.lower() for c in df.columns]

# Return features
df["ret_1d"]  = df["close"].pct_change(1)
df["ret_5d"]  = df["close"].pct_change(5)
df["ret_20d"] = df["close"].pct_change(20)

# Volatility feature
df["vol_20d"] = df["ret_1d"].rolling(20).std()

# Technical indicator features
df.ta.rsi(length=14, append=True)
df.ta.macd(fast=12, slow=26, signal=9, append=True)
df.ta.bbands(length=20, append=True)

# Volume feature
df["vol_ratio"] = df["volume"] / df["volume"].rolling(20).mean()

# Target: sign of 5-day forward return (1 = up, 0 = down)
df["target"] = (df["close"].pct_change(5).shift(-5) > 0).astype(int)

df.dropna(inplace=True)
print(df.shape)

XGBoost with Walk-Forward Validation

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import warnings
warnings.filterwarnings("ignore")

feature_cols = [
    "ret_1d", "ret_5d", "ret_20d", "vol_20d",
    "RSI_14", "MACD_12_26_9", "MACDs_12_26_9",
    "BBL_20_2.0", "BBM_20_2.0", "BBU_20_2.0",
    "vol_ratio"
]
target_col = "target"

results = []
train_years  = 2
test_months  = 3

dates      = df.index
start_year = dates[0].year + train_years

for year in range(start_year, 2026):
    for q in range(1, 5):
        test_start = pd.Timestamp(f"{year}-{(q - 1) * 3 + 1:02d}-01")
        test_end   = test_start + pd.DateOffset(months=test_months)

        # Train on the most recent 504 trading days (~2 years) before the test window
        train_df = df[df.index < test_start].tail(504)
        test_df  = df[(df.index >= test_start) & (df.index < test_end)]

        if len(train_df) < 100 or len(test_df) < 10:
            continue

        X_train, y_train = train_df[feature_cols], train_df[target_col]
        X_test,  y_test  = test_df[feature_cols],  test_df[target_col]

        model = XGBClassifier(
            n_estimators=200, max_depth=4,
            learning_rate=0.05, subsample=0.8,
            eval_metric="logloss", random_state=42
        )
        model.fit(X_train, y_train)

        preds = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]

        acc = accuracy_score(y_test, preds)
        auc = roc_auc_score(y_test, proba)
        results.append({"period": str(test_start.date()), "acc": acc, "auc": auc})

result_df = pd.DataFrame(results)
print(result_df.tail(8))
print(f"\nMean AUC: {result_df['auc'].mean():.4f}")

4. Deep Learning for Finance: LSTM and Temporal Fusion Transformer

LSTM Price Prediction

import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[["close"]].values)

SEQ_LEN = 60

def make_sequences(data, seq_len):
    X, y = [], []
    for i in range(len(data) - seq_len):
        X.append(data[i:i+seq_len])
        y.append(data[i+seq_len])
    return np.array(X), np.array(y)

X, y   = make_sequences(scaled, SEQ_LEN)
split  = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)

class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=0.2)
        self.fc   = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model     = LSTMModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

for epoch in range(30):
    model.train()
    pred = model(X_train_t)
    loss = criterion(pred, y_train_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.6f}")

FinRL Reinforcement Learning Trading Agent

FinRL builds on an OpenAI Gym-style environment to train RL agents for stock trading.

# pip install finrl
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent
import pandas as pd

# Preprocessed financial data in FinRL format
# Required columns: date, tic, open, high, low, close, volume + tech indicators
processed_df = pd.read_csv("processed_stock_data.csv")

env_kwargs = {
    "hmax": 100,               # max shares held per stock
    "initial_amount": 100000,  # starting capital ($)
    "buy_cost_pct": 0.001,     # 0.1% commission
    "sell_cost_pct": 0.001,
    "reward_scaling": 1e-4,
    "state_space": 181,
    "action_space": 30,
    "tech_indicator_list": ["macd", "rsi_30", "cci_30", "dx_30"],
}

train_env = StockTradingEnv(df=processed_df, **env_kwargs)

agent     = DRLAgent(env=train_env)
model_ppo = agent.get_model("ppo")
trained_ppo = agent.train_model(
    model=model_ppo,
    tb_log_name="ppo_stock",
    total_timesteps=50000
)

5. LLM for Finance: FinBERT Sentiment Analysis

FinBERT is a BERT model further pre-trained on financial text and fine-tuned for financial sentiment classification. It labels text as Positive, Negative, or Neutral.

from transformers import BertTokenizer, BertForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "ProsusAI/finbert"
tokenizer  = BertTokenizer.from_pretrained(model_name)
model      = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

def finbert_sentiment(texts):
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs  = F.softmax(logits, dim=-1).numpy()
    labels = ["positive", "negative", "neutral"]
    return [
        {"text": t, "label": labels[p.argmax()], "score": float(p.max())}
        for t, p in zip(texts, probs)
    ]

headlines = [
    "NVIDIA beats Q4 earnings estimates by 15%, raises guidance",
    "Fed signals higher-for-longer rates amid sticky inflation",
    "Apple reports record services revenue despite iPhone slowdown",
]
results = finbert_sentiment(headlines)
for r in results:
    print(f"[{r['label'].upper():8s}] {r['score']:.3f} | {r['text']}")

Separating Numeric Guidance from Text Tone

Earnings calls contain three distinct signals: (1) quantitative results such as EPS and revenue, (2) forward guidance figures, and (3) management tone in the spoken text. LLMs excel at capturing tone but can misread sentences that embed raw numbers. Parsing numbers via regex or structured NLP and combining them with FinBERT tone scores in an ensemble produces stronger natural-language alpha than either approach alone.
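The steps above can be sketched as: extract the figures with a regex, score tone separately, then blend. The headline, the 0.6/0.4 weights, and the tone score below are all hypothetical placeholders (the tone score stands in for a FinBERT positive-minus-negative probability):

```python
import re

headline = "Acme Q3 revenue up 12%, but Q4 guidance cut to $1.8B from $2.1B"

# Pull percentage and billion-dollar figures out of the text
pcts    = [float(m) for m in re.findall(r"(-?\d+(?:\.\d+)?)%", headline)]
dollars = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)B", headline)]

# Hypothetical ensemble: numeric guidance change plus a text-tone score
guidance_score = (dollars[0] - dollars[1]) / dollars[1] if len(dollars) == 2 else 0.0
tone_score     = 0.41  # placeholder for a FinBERT tone probability
alpha_signal   = 0.6 * guidance_score + 0.4 * tone_score  # weights are illustrative

print(pcts, dollars, round(alpha_signal, 3))
```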


6. Risk Management: VaR, CVaR, and Kelly Criterion

VaR / CVaR Computation

import numpy as np

def calculate_var_cvar(returns, confidence=0.95):
    """
    Historical simulation VaR and CVaR.
    returns: array of daily returns
    """
    sorted_returns = np.sort(returns)
    # Guard against small samples, where the tail slice would otherwise be empty
    index = max(1, int((1 - confidence) * len(sorted_returns)))
    var  = -sorted_returns[index]
    cvar = -sorted_returns[:index].mean()
    return var, cvar

daily_returns = df["close"].pct_change().dropna().values
var_95, cvar_95 = calculate_var_cvar(daily_returns, 0.95)
var_99, cvar_99 = calculate_var_cvar(daily_returns, 0.99)

print(f"VaR  95%: {var_95:.4f}  ({var_95*100:.2f}%)")
print(f"CVaR 95%: {cvar_95:.4f} ({cvar_95*100:.2f}%)")
print(f"VaR  99%: {var_99:.4f}  ({var_99*100:.2f}%)")
print(f"CVaR 99%: {cvar_99:.4f} ({cvar_99*100:.2f}%)")

Key Risk Metric Comparison

| Metric | Formula | Strength | Limitation |
|---|---|---|---|
| Sharpe Ratio | (Rp - Rf) / sigma_p | Standardized risk-adjusted return | Penalizes upside volatility equally |
| Sortino Ratio | (Rp - Rf) / sigma_d | Penalizes only downside volatility | Denominator less intuitive |
| Max Drawdown | Peak-to-trough loss | Captures extreme losses | Ignores recovery duration |
| VaR 95% | 5th-percentile loss | Regulatory standard | Underestimates tail risk |
| CVaR 95% | Expected loss beyond VaR | Captures tail risk | Sensitive to distributional assumptions |
| Calmar Ratio | CAGR / MDD | Growth vs. drawdown | Less meaningful for short periods |
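The first three metrics in the table take only a few lines to compute. Note the downside deviation below is a simplification (the textbook Sortino denominator is the root mean square of returns below a target, not the standard deviation of losing days), and the returns are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
rets = rng.normal(0.0004, 0.012, 252)  # one year of synthetic daily returns

ann_ret = rets.mean() * 252
ann_vol = rets.std() * np.sqrt(252)
# Simplified downside deviation: std of losing days only
downside = rets[rets < 0].std() * np.sqrt(252)

sharpe  = ann_ret / ann_vol   # risk-free rate assumed zero
sortino = ann_ret / downside

# Max drawdown from the cumulative equity curve
equity = np.cumprod(1 + rets)
peak   = np.maximum.accumulate(equity)
mdd    = ((equity - peak) / peak).min()

print(f"Sharpe: {sharpe:.2f}  Sortino: {sortino:.2f}  MaxDD: {mdd:.2%}")
```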

Kelly Criterion Position Sizing

def kelly_fraction(win_rate, win_loss_ratio):
    """
    f* = W - (1 - W) / R
    W: win rate, R: average win/loss ratio
    """
    return win_rate - (1 - win_rate) / win_loss_ratio

# Example: 55% win rate, 1.5 win/loss ratio
f_full = kelly_fraction(0.55, 1.5)
f_half = f_full * 0.5   # fractional Kelly reduces variance

print(f"Full Kelly:     {f_full:.2%}")
print(f"Half Kelly:     {f_half:.2%}")

Fractional Kelly (typically 0.25–0.5x) is used in practice because win-rate and edge estimates carry significant estimation error. Full Kelly can produce catastrophic drawdowns when inputs are off, so a safety margin is essential.
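A quick Monte Carlo makes the point concrete, reusing the assumed 55% win rate and 1.5 win/loss ratio from the example above: full Kelly maximizes average log growth, half Kelly gives up only a little, and double Kelly (over-betting) can turn growth negative:

```python
import numpy as np

rng = np.random.default_rng(7)
wins = rng.random(100_000) < 0.55  # assumed 55% win rate, as above

def avg_log_growth(fraction, wins, R=1.5):
    # Per-bet average log-wealth growth when risking `fraction` of capital
    return np.where(wins, np.log1p(fraction * R), np.log1p(-fraction)).mean()

f_full = 0.55 - 0.45 / 1.5  # 0.25 by the Kelly formula
for label, f in [("full", f_full), ("half", 0.5 * f_full), ("double", 2 * f_full)]:
    print(f"{label:6s} Kelly (f={f:.3f}): {avg_log_growth(f, wins):+.4f} per bet")
```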


7. Backtesting: Vectorbt Strategy Verification

Moving Average Crossover Backtest with Vectorbt

import vectorbt as vbt
import pandas as pd
import yfinance as yf

price = yf.download("SPY", start="2018-01-01", end="2026-01-01",
                    auto_adjust=True)["Close"].squeeze()

fast_ma = vbt.MA.run(price, 20)
slow_ma = vbt.MA.run(price, 60)

entries = fast_ma.ma_crossed_above(slow_ma)
exits   = fast_ma.ma_crossed_below(slow_ma)

portfolio = vbt.Portfolio.from_signals(
    price,
    entries,
    exits,
    init_cash=100_000,
    fees=0.001,        # 0.1% commission
    slippage=0.001,    # 0.1% slippage
    freq="D",
)

stats = portfolio.stats()
print(stats[["Total Return [%]", "Sharpe Ratio", "Max Drawdown [%]",
             "Win Rate [%]", "Profit Factor"]])

Backtesting Bias Checklist

When a backtest produces unexpectedly strong results, always audit these failure modes:

| Bias Type | Root Cause | Mitigation |
|---|---|---|
| Look-ahead bias | Future data used to compute current-period signals | Audit shift(-1) calls; check feature timestamps |
| Survivorship bias | Delisted tickers excluded from universe | Use point-in-time universe datasets |
| Optimization bias | In-sample parameter over-fitting | Walk-forward validation, out-of-sample holdout |
| Market impact ignored | Large orders assumed to fill at mid price | Slippage model; volume-constrained sizing |
| Underestimated transaction costs | Real spreads and fees excluded | Realistic commission + slippage parameters |
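The first row of the checklist can be demonstrated directly: trading on the same bar's return (information unavailable at entry) versus yesterday's return. The synthetic random walk below contains no real signal, yet the leaky version looks wildly profitable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 500)))  # pure random walk
ret = close.pct_change()

# Leaky signal: trades on the SAME bar's return -- unknown at entry time
leaky_pnl = (np.sign(ret) * ret).dropna()

# Honest signal: yesterday's return traded today
honest_pnl = (np.sign(ret.shift(1)) * ret).dropna()

print(f"Leaky  mean daily PnL: {leaky_pnl.mean():+.5f}")
print(f"Honest mean daily PnL: {honest_pnl.mean():+.5f}")
```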

Quiz

Q1. Why is walk-forward validation more appropriate for financial time series than k-fold cross-validation?

Answer: Financial time series have temporal dependency; randomly splitting folds allows future data to leak into training, creating look-ahead bias that inflates apparent model performance.

Explanation: Walk-forward validation always trains on past data and tests on future data, preserving temporal order. In k-fold, a training fold can contain observations that occur after some test observations, meaning the model effectively "knows the future." Financial returns also exhibit autocorrelation and regime shifts, making temporal ordering of validation essential.
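For reference, scikit-learn's TimeSeriesSplit implements exactly this constraint: every test index comes strictly after every train index, with the train window growing each fold:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered observations
splits = list(TimeSeriesSplit(n_splits=4).split(X))

for train_idx, test_idx in splits:
    print(f"train [{train_idx.min()}..{train_idx.max()}]"
          f" -> test [{test_idx.min()}..{test_idx.max()}]")
```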

Q2. What are the limitations of the Sharpe ratio, and when is the Sortino ratio more appropriate?

Answer: The Sharpe ratio treats upside and downside volatility identically. A strategy with large positive return spikes is penalized unfairly, making its Sharpe ratio look worse than it deserves.

Explanation: The Sortino ratio replaces the denominator with downside deviation, penalizing only losses that investors actually dislike. It is more appropriate for strategies with asymmetric return distributions — such as momentum, option writing, or trend-following — where upside variance is desirable and should not reduce the risk-adjusted score.

Q3. How does look-ahead bias cause backtest results to be overly optimistic?

Answer: When signals are computed using data that did not exist at the time of the trade — such as the same bar's closing price used to trigger an open-bar entry — the model implicitly knows the future and records artificially high accuracy.

Explanation: Common sources include using the daily close to generate a same-day entry signal, misaligning features and labels so that a feature row contains information from the label period, rolling statistics that include the current bar, and scalers or normalization statistics fit on the full sample, which leak future information into past observations. Each instance makes the strategy appear to predict what it had, in effect, already observed.

Q4. What is the mathematical justification for Kelly Criterion in position sizing, and why use fractional Kelly?

Answer: The Kelly formula f* = W - (1 - W) / R maximizes the expected log return, which is equivalent to maximizing the long-run geometric growth rate of wealth.

Explanation: By maximizing E[log(wealth)], Kelly provably grows capital faster than any other fixed-fraction strategy over the long run. However, the formula is sensitive to estimation error in W (win rate) and R (win/loss ratio). Overestimating edge leads to over-betting and severe drawdowns. Fractional Kelly (25–50% of f*) sacrifices some asymptotic growth rate for dramatically reduced variance and drawdown, making it far more practical for live trading.

Q5. Why should numeric guidance and text tone be analyzed separately when using LLMs on earnings calls?

Answer: Management often delivers strong headline numbers while guiding conservatively for the next quarter, or presents weak results in reassuring language. Mixing the two signals causes them to cancel out, diluting alpha.

Explanation: Earnings releases contain three distinct information types: (1) realized figures such as EPS and revenue, (2) forward guidance numbers, and (3) qualitative tone in management commentary. LLMs like FinBERT accurately score tone but can misclassify a sentence like "revenue missed by 8%" as neutral or positive depending on surrounding context. Parsing numeric figures with structured extraction and scoring text tone separately — then combining them in a weighted ensemble — produces more accurate and robust natural-language alpha signals.