AutoML Complete Guide: Automated ML Pipelines with AutoGluon, FLAML, and Optuna
- Author: Youngju Kim (@fjvbn20031)
1. AutoML Overview
What is AutoML?
AutoML (Automated Machine Learning) automates various stages of the machine learning pipeline. Tasks that data scientists previously performed manually — data preprocessing, feature engineering, model selection, hyperparameter optimization, and ensembling — are handled automatically by algorithms.
What AutoML Automates:
- Data Preprocessing Automation
  - Missing value imputation strategy selection
  - Scaling and normalization method selection
  - Outlier handling
- Feature Engineering Automation
  - Feature transformations (log, square, interactions)
  - Categorical encoding method selection
  - Feature selection and generation
- Model Selection (Algorithm Selection)
  - Searching over diverse algorithms
  - Meta-learning (leveraging experience from prior tasks)
- Hyperparameter Optimization (HPO)
  - Grid/random search
  - Bayesian optimization
  - Evolutionary algorithms
- Ensemble Automation
  - Searching for the optimal ensemble configuration
  - Automated stacking and blending
- Neural Architecture Search (NAS)
  - Automated design of optimal neural network architectures
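Each of these automated decisions can be reduced to a rule an algorithm applies per column or per model. As a toy illustration (not from any AutoML library — `auto_impute` and its skewness threshold are hypothetical), here is how a system might pick a missing-value imputation strategy automatically, preferring the median when the observed values are strongly skewed:

```python
import numpy as np

def auto_impute(col: np.ndarray) -> np.ndarray:
    """Pick an imputation strategy automatically: median for heavily
    skewed columns, mean otherwise (a toy version of what AutoML decides)."""
    observed = col[~np.isnan(col)]
    skew = np.mean((observed - observed.mean()) ** 3) / observed.std() ** 3
    fill = np.median(observed) if abs(skew) > 1.0 else observed.mean()
    return np.where(np.isnan(col), fill, col)

col = np.array([1.0, 2.0, np.nan, 3.0, 100.0])  # heavy right skew
print(auto_impute(col))  # index 2 is filled with the median, 2.5
```

A real system searches over many such strategies and keeps whichever one cross-validates best.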
AutoML Application Domains
Industry Applications:
- Finance: Credit risk models, automated fraud detection
- Healthcare: Rapid prototyping of diagnostic support systems
- Retail: Automated demand forecasting model refresh
- Manufacturing: Quality control model automation
Major Open-Source AutoML Tools:
| Tool | Developer | Strengths |
|---|---|---|
| AutoGluon | Amazon | Multimodal, tabular, image, text |
| FLAML | Microsoft | Cost-efficient, fast |
| Optuna | Preferred Networks | HPO, visualization |
| H2O AutoML | H2O.ai | Enterprise, interpretable |
| Auto-sklearn | AutoML Group | scikit-learn compatible |
| Ray Tune | Anyscale | Distributed HPO |
| NNI | Microsoft | NAS, HPO |
Pros and Cons of AutoML
Pros:
- Enables non-experts to build high-quality models
- Saves time by automating repetitive experiments
- Discovers hyperparameter combinations humans might miss
- Provides reproducible pipelines
Cons:
- Computational costs can be very high
- Limited ability to incorporate domain knowledge
- Black-box nature (internal workings difficult to understand)
- Custom solutions are more effective for specialized problems
- Risk of data leakage
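The data-leakage risk deserves a concrete example. A common failure mode is letting an automated pipeline fit preprocessing statistics on the full dataset before the train/test split, so the scores steering the search are subtly optimistic. A minimal numpy sketch of the difference:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(10.0, 2.0, size=1000)
train, test = X[:800], X[800:]

# Leaky: the scaler "sees" the test data through the global statistics
leaky_test = (test - X.mean()) / X.std()

# Correct: fit the scaler on the training split only
clean_test = (test - train.mean()) / train.std()

# The two disagree whenever the splits differ; any model selection
# scored on leaky_test is biased toward the test set.
print(np.abs(leaky_test - clean_test).max())
```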
2. Hyperparameter Optimization (HPO)
Grid Search
The simplest HPO method — exhaustively tries every combination in the search space.
from sklearn.model_selection import GridSearchCV
import pandas as pd  # used below to inspect cv_results_
import xgboost as xgb
def grid_search_example(X_train, y_train):
"""Grid Search: exhaustive (only practical for small search spaces)"""
param_grid = {
'max_depth': [3, 5, 7],
'learning_rate': [0.01, 0.1, 0.3],
'n_estimators': [100, 300, 500],
'subsample': [0.7, 0.9],
}
# Total combinations: 3 * 3 * 3 * 2 = 54; with cv=5 that is 270 model fits
model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
grid_search = GridSearchCV(
model, param_grid,
cv=5, scoring='roc_auc', n_jobs=-1, verbose=1, refit=True
)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
results = pd.DataFrame(grid_search.cv_results_)
print(results.sort_values('mean_test_score', ascending=False)[
['params', 'mean_test_score', 'std_test_score']
].head(10))
return grid_search.best_estimator_
Random Search
Proposed by Bergstra & Bengio (2012) — samples randomly from parameter distributions, which is often far more efficient than grid search.
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint, loguniform
def random_search_example(X_train, y_train, n_iter=100):
"""Random Search: sample from continuous distributions"""
param_distributions = {
'max_depth': randint(3, 10),
'learning_rate': loguniform(1e-3, 0.5), # log-uniform over [0.001, 0.5]
'n_estimators': randint(100, 1000),
'subsample': uniform(0.6, 0.4), # uniform over [0.6, 1.0]
'colsample_bytree': uniform(0.6, 0.4),
'reg_alpha': loguniform(1e-4, 10),
'reg_lambda': loguniform(1e-4, 10),
'min_child_weight': randint(1, 10),
'gamma': uniform(0, 0.5),
}
model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
random_search = RandomizedSearchCV(
model, param_distributions,
n_iter=n_iter, cv=5, scoring='roc_auc',
n_jobs=-1, verbose=1, random_state=42, refit=True
)
random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
return random_search.best_estimator_
Bayesian Optimization
Bayesian optimization uses the results of previous evaluations to intelligently select the next point to evaluate.
Core Components:
- Surrogate Model: Probabilistic approximation of the objective function (typically Gaussian Process)
- Acquisition Function: Determines the next evaluation point
- EI (Expected Improvement): Expected improvement over the current best
- UCB (Upper Confidence Bound): Balance between exploration and exploitation
- PI (Probability of Improvement): Probability of improving over the current best
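Under a Gaussian surrogate these acquisition functions have simple closed forms. A minimal sketch of Expected Improvement for a maximization problem, given the surrogate's posterior mean `mu` and standard deviation `sigma` at each candidate (the `xi` exploration margin is a common convention, not part of the definition above):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for maximization under a Gaussian surrogate posterior.
    xi > 0 nudges the search toward exploration."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    imp = mu - f_best - xi
    z = imp / np.maximum(sigma, 1e-12)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)  # no uncertainty -> no expected gain

# Candidate A: slightly better mean, low uncertainty.
# Candidate B: worse mean, high uncertainty -- EI prefers B here.
print(expected_improvement(mu=[0.81, 0.78], sigma=[0.01, 0.10], f_best=0.80))
```

This is the exploration/exploitation trade-off in action: the uncertain candidate wins despite its worse predicted mean.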
TPE (Tree-structured Parzen Estimator):
- Optuna's default sampler
- Models p(x|y) via two density functions l(x) and g(x), rather than modeling p(y|x) directly
- l(x) models the distribution of parameters that led to good results (the top gamma fraction of trials)
- g(x) models the rest
- The next point maximizes the ratio l(x)/g(x)
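The l(x)/g(x) mechanics can be illustrated in one dimension with kernel density estimates. Everything below is toy data — the score function is made up so that learning rates near 0.05 do best:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Past trials: learning rates tried and their (noisy) validation scores
lrs = rng.uniform(0.001, 0.3, size=200)
scores = -np.abs(lrs - 0.05) + rng.normal(0, 0.01, size=200)

# Split trials into "good" (top gamma fraction) and the rest
gamma = 0.25
cut = np.quantile(scores, 1 - gamma)
l = gaussian_kde(lrs[scores >= cut])  # density of good configurations
g = gaussian_kde(lrs[scores < cut])   # density of the rest

# TPE proposes the candidate maximizing l(x) / g(x)
candidates = np.linspace(0.001, 0.3, 500)
best = candidates[np.argmax(l(candidates) / g(candidates))]
print(f"next suggested lr ~ {best:.3f}")  # lands near the true optimum 0.05
```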
3. Optuna
Core Concepts
Optuna, developed by Preferred Networks, is a Python-native HPO framework known for its simplicity and flexibility.
Key concepts:
- Study: The entire optimization experiment (a collection of Trials)
- Trial: A single hyperparameter configuration attempt
- Objective Function: The function to optimize (minimize or maximize)
- Sampler: The parameter suggestion algorithm (TPE, CMA-ES, Random, etc.)
- Pruner: Early termination of unpromising Trials
pip install optuna optuna-dashboard
import optuna
from optuna.samplers import TPESampler, CmaEsSampler, RandomSampler
from optuna.pruners import MedianPruner, HyperbandPruner
import lightgbm as lgb
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import numpy as np
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective_lgbm(trial, X, y):
"""Optuna objective function for LightGBM optimization"""
params = {
'objective': 'binary',
'metric': 'auc',
'verbosity': -1,
'boosting_type': trial.suggest_categorical('boosting_type', ['gbdt', 'dart']),
'num_leaves': trial.suggest_int('num_leaves', 20, 300),
'max_depth': trial.suggest_int('max_depth', 3, 12),
'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
'learning_rate': trial.suggest_float('learning_rate', 1e-4, 0.3, log=True),
'n_estimators': trial.suggest_int('n_estimators', 100, 2000),
'subsample': trial.suggest_float('subsample', 0.5, 1.0),
'subsample_freq': trial.suggest_int('subsample_freq', 1, 7),
'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
'min_split_gain': trial.suggest_float('min_split_gain', 0, 1),
'n_jobs': -1,
}
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
train_data = lgb.Dataset(X_train, y_train)
val_data = lgb.Dataset(X_val, y_val, reference=train_data)
# 'n_estimators' is an alias for num_boost_round, so strip it from
# params to avoid passing the boosting-round count twice
model = lgb.train(
{k: v for k, v in params.items() if k != 'n_estimators'},
train_data,
num_boost_round=params['n_estimators'],
valid_sets=[val_data],
callbacks=[
lgb.early_stopping(stopping_rounds=50, verbose=False),
lgb.log_evaluation(period=-1),  # period <= 0 disables per-iteration logging
],
)
preds = model.predict(X_val)
fold_score = roc_auc_score(y_val, preds)
cv_scores.append(fold_score)
# Report intermediate results for pruning
trial.report(fold_score, fold)
if trial.should_prune():
raise optuna.exceptions.TrialPruned()
return np.mean(cv_scores)
def run_optuna_study(X, y, n_trials=100, n_jobs=1):
"""Run an Optuna study with TPE sampler and median pruning"""
sampler = TPESampler(
n_startup_trials=20,
n_ei_candidates=24,
multivariate=True,
seed=42
)
pruner = MedianPruner(
n_startup_trials=5,
n_warmup_steps=10,
interval_steps=1
)
study = optuna.create_study(
direction='maximize',
sampler=sampler,
pruner=pruner,
study_name='lgbm_optimization',
# storage='sqlite:///optuna.db', # persist results
# load_if_exists=True, # resume existing study
)
study.optimize(
lambda trial: objective_lgbm(trial, X, y),
n_trials=n_trials,
n_jobs=n_jobs,
show_progress_bar=True,
)
print(f"\nBest params:")
for key, value in study.best_params.items():
print(f" {key}: {value}")
print(f"Best AUC: {study.best_value:.4f}")
print(f"Completed trials: {len(study.trials)}")
pruned = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned trials: {len(pruned)}")
return study
def visualize_optuna_study(study):
"""Visualize Optuna optimization results"""
import optuna.visualization as vis
vis.plot_optimization_history(study).show()
vis.plot_param_importances(study).show()
vis.plot_parallel_coordinate(study).show()
vis.plot_slice(study).show()
vis.plot_contour(study, params=['learning_rate', 'num_leaves']).show()
Complete PyTorch + Optuna Example
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import optuna
def create_model(trial, input_dim):
"""Dynamically build a neural network from Optuna trial parameters"""
n_layers = trial.suggest_int('n_layers', 1, 4)
dropout = trial.suggest_float('dropout', 0.1, 0.5)
activation_name = trial.suggest_categorical('activation', ['relu', 'tanh', 'elu'])
activation_map = {'relu': nn.ReLU(), 'tanh': nn.Tanh(), 'elu': nn.ELU()}
layers = []
in_features = input_dim
for i in range(n_layers):
out_features = trial.suggest_int(f'n_units_l{i}', 32, 512)
layers.extend([
nn.Linear(in_features, out_features),
nn.BatchNorm1d(out_features),
activation_map[activation_name],
nn.Dropout(dropout),
])
in_features = out_features
layers.extend([nn.Linear(in_features, 1), nn.Sigmoid()])
return nn.Sequential(*layers)
def objective_pytorch(trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim):
"""Optuna objective function for PyTorch neural network"""
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = create_model(trial, input_dim).to(device)
lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop', 'SGD'])
batch_size = trial.suggest_categorical('batch_size', [32, 64, 128, 256])
weight_decay = trial.suggest_float('weight_decay', 1e-8, 1e-2, log=True)
if optimizer_name == 'Adam':
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
elif optimizer_name == 'RMSprop':
optimizer = optim.RMSprop(model.parameters(), lr=lr, weight_decay=weight_decay)
else:
optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay,
momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # match the 100-epoch budget
criterion = nn.BCELoss()
train_dataset = TensorDataset(
X_train_t.to(device),
y_train_t.to(device).float().unsqueeze(1)
)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
best_val_loss = float('inf')
patience_counter = 0
patience = 10
for epoch in range(100):
model.train()
for X_batch, y_batch in train_loader:
optimizer.zero_grad()
loss = criterion(model(X_batch), y_batch)
loss.backward()
optimizer.step()
scheduler.step()
model.eval()
with torch.no_grad():
val_loss = criterion(
model(X_val_t.to(device)),
y_val_t.to(device).float().unsqueeze(1)
).item()
trial.report(val_loss, epoch)
if trial.should_prune():
raise optuna.exceptions.TrialPruned()
if val_loss < best_val_loss:
best_val_loss = val_loss
patience_counter = 0
else:
patience_counter += 1
if patience_counter >= patience:
break
return best_val_loss
def run_pytorch_optuna(X_train, y_train, X_val, y_val, n_trials=50):
X_train_t = torch.FloatTensor(X_train.values if hasattr(X_train, 'values') else X_train)
y_train_t = torch.FloatTensor(y_train.values if hasattr(y_train, 'values') else y_train)
X_val_t = torch.FloatTensor(X_val.values if hasattr(X_val, 'values') else X_val)
y_val_t = torch.FloatTensor(y_val.values if hasattr(y_val, 'values') else y_val)
input_dim = X_train_t.shape[1]
study = optuna.create_study(
direction='minimize',
pruner=optuna.pruners.HyperbandPruner(
min_resource=5, max_resource=100, reduction_factor=3
),
sampler=TPESampler(seed=42)
)
study.optimize(
lambda trial: objective_pytorch(
trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim
),
n_trials=n_trials,
show_progress_bar=True,
)
print(f"Best val loss: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
return study
CMA-ES Sampler
# CMA-ES can be more efficient than TPE for purely continuous search spaces
study_cmaes = optuna.create_study(
direction='maximize',
sampler=CmaEsSampler(
n_startup_trials=10,
restart_strategy='ipop', # restart strategy for escaping local optima
seed=42
)
)
4. Ray Tune
Distributed HPO with Ray Tune
Ray Tune, developed by Anyscale, handles parallel training across multiple GPUs and nodes automatically.
pip install "ray[tune]" "ray[air]"
import ray
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler, PopulationBasedTraining
from ray.tune.search.optuna import OptunaSearch
import torch
import torch.nn as nn
import torch.nn.functional as F
ray.init(ignore_reinit_error=True)
def train_with_tune(config, data=None):
"""Training function called by Ray Tune"""
X_train, y_train, X_val, y_val = data
model = nn.Sequential(
nn.Linear(X_train.shape[1], config['hidden_size']),
nn.ReLU(),
nn.Dropout(config['dropout']),
nn.Linear(config['hidden_size'], config['hidden_size'] // 2),
nn.ReLU(),
nn.Linear(config['hidden_size'] // 2, 1),
nn.Sigmoid()
)
optimizer = torch.optim.Adam(
model.parameters(), lr=config['lr'], weight_decay=config['weight_decay']
)
criterion = nn.BCELoss()
X_train_t = torch.FloatTensor(X_train)
y_train_t = torch.FloatTensor(y_train).unsqueeze(1)
X_val_t = torch.FloatTensor(X_val)
y_val_t = torch.FloatTensor(y_val).unsqueeze(1)
for epoch in range(config['max_epochs']):
model.train()
optimizer.zero_grad()
loss = criterion(model(X_train_t), y_train_t)
loss.backward()
optimizer.step()
if epoch % 5 == 0:
model.eval()
with torch.no_grad():
val_loss = criterion(model(X_val_t), y_val_t).item()
tune.report(val_loss=val_loss, training_iteration=epoch)  # legacy API; Ray >= 2.x uses train.report({...})
def run_ray_tune(X_train, y_train, X_val, y_val, num_samples=50):
"""Run distributed HPO with Ray Tune"""
config = {
'hidden_size': tune.choice([64, 128, 256, 512]),
'dropout': tune.uniform(0.1, 0.5),
'lr': tune.loguniform(1e-5, 1e-1),
'weight_decay': tune.loguniform(1e-8, 1e-3),
'max_epochs': tune.choice([50, 100, 200]),
}
# ASHA: Asynchronous Successive Halving Algorithm
scheduler = ASHAScheduler(
metric='val_loss',
mode='min',
max_t=200, # Max epochs
grace_period=10, # Min epochs before pruning
reduction_factor=3,
)
search_alg = OptunaSearch(metric='val_loss', mode='min')
result = tune.run(
tune.with_parameters(
train_with_tune,
data=(X_train, y_train, X_val, y_val)
),
config=config,
num_samples=num_samples,
scheduler=scheduler,
search_alg=search_alg,
progress_reporter=CLIReporter(
metric_columns=['val_loss', 'training_iteration'],
max_progress_rows=10
),
verbose=1,
resources_per_trial={'cpu': 2, 'gpu': 0},
)
best_trial = result.get_best_trial('val_loss', 'min', 'last')
print(f"Best val loss: {best_trial.last_result['val_loss']:.4f}")
print(f"Best config: {best_trial.config}")
return result
def run_pbt(X_train, y_train, X_val, y_val):
"""Population Based Training: dynamically mutate hyperparameters during training"""
pbt_scheduler = PopulationBasedTraining(
time_attr='training_iteration',
metric='val_loss',
mode='min',
perturbation_interval=20,
hyperparam_mutations={
'lr': tune.loguniform(1e-5, 1e-1),
'dropout': tune.uniform(0.1, 0.5),
},
quantile_fraction=0.25, # Replace bottom 25% with top 25%
)
result = tune.run(
tune.with_parameters(
train_with_tune,
data=(X_train, y_train, X_val, y_val)
),
config={
'hidden_size': 256,
'dropout': tune.uniform(0.1, 0.5),
'lr': tune.loguniform(1e-4, 1e-1),
'weight_decay': 1e-5,
'max_epochs': 200,
},
num_samples=8,
scheduler=pbt_scheduler,
verbose=1,
)
return result
5. AutoGluon
AutoGluon Overview
AutoGluon, developed by Amazon, achieves Kaggle-level performance with minimal code — sometimes just 3 lines.
pip install autogluon
Tabular Data (TabularPredictor)
from autogluon.tabular import TabularPredictor
import pandas as pd
def autogluon_tabular_example(train_df, test_df, target_col, eval_metric='roc_auc'):
"""AutoGluon tabular training"""
predictor = TabularPredictor(
label=target_col,
eval_metric=eval_metric,
path='autogluon_models/',
problem_type='binary', # 'binary', 'multiclass', 'regression', 'softclass'
)
predictor.fit(
train_data=train_df,
time_limit=3600,
presets='best_quality', # 'best_quality', 'good_quality', 'medium_quality',
# 'optimize_for_deployment'
excluded_model_types=['KNN'],
verbosity=2,
)
leaderboard = predictor.leaderboard(test_df, silent=True)
print(leaderboard[['model', 'score_test', 'score_val', 'pred_time_test']].head(10))
predictions = predictor.predict(test_df)
pred_proba = predictor.predict_proba(test_df)
feature_importance = predictor.feature_importance(test_df)
print(feature_importance.head(20))
return predictor, predictions, pred_proba
def autogluon_advanced(train_df, test_df, target_col):
"""AutoGluon with custom hyperparameters"""
hyperparameters = {
'GBM': [
{'num_boost_round': 300, 'ag_args': {'name_suffix': 'fast'}},
{'num_boost_round': 1000, 'learning_rate': 0.03,
'ag_args': {'name_suffix': 'slow', 'priority': 0}},
],
'XGB': [{'n_estimators': 300, 'max_depth': 6}],
'CAT': [{'iterations': 500, 'depth': 6}],
'NN_TORCH': [{'num_epochs': 50, 'learning_rate': 1e-3, 'dropout_prob': 0.1}],
'RF': [{'n_estimators': 300}],
}
predictor = TabularPredictor(
label=target_col, eval_metric='roc_auc', path='autogluon_advanced/'
)
predictor.fit(
train_data=train_df,
hyperparameters=hyperparameters,
time_limit=7200,
num_stack_levels=1, # Number of stacking levels
num_bag_folds=5, # Number of CV folds for bagging
num_bag_sets=1, # Number of bagging sets
verbosity=3,
)
return predictor
Multimodal Learning
from autogluon.multimodal import MultiModalPredictor
def autogluon_image_classification(train_df, test_df, label_col):
"""AutoGluon image classification"""
predictor = MultiModalPredictor(label=label_col)
predictor.fit(
train_data=train_df,
time_limit=3600,
hyperparameters={
'model.timm_image.checkpoint_name': 'efficientnet_b4',
'optimization.learning_rate': 1e-4,
'optimization.max_epochs': 20,
}
)
return predictor
def autogluon_multimodal(train_df, test_df, target_col):
"""AutoGluon multimodal: text + tabular features together"""
predictor = MultiModalPredictor(label=target_col, problem_type='binary')
predictor.fit(
train_data=train_df,
time_limit=3600,
hyperparameters={
'model.hf_text.checkpoint_name': 'bert-base-uncased',
}
)
return predictor
6. FLAML
Microsoft FLAML
FLAML (Fast and Lightweight AutoML), developed by Microsoft Research, specializes in cost-efficient automation.
pip install flaml
from flaml import AutoML
import pandas as pd
import numpy as np
def flaml_basic_example(X_train, y_train, X_test, task='classification'):
"""FLAML basic usage"""
automl = AutoML()
automl_settings = {
'time_budget': 300,
'metric': 'roc_auc',
'task': task, # 'classification', 'regression', 'ranking'
'estimator_list': [
'lgbm', 'xgboost', 'catboost',
'rf', 'extra_tree', 'lrl1', 'lrl2', 'kneighbor'
],
'log_file_name': 'flaml_log.log',
'seed': 42,
'n_jobs': -1,
'verbose': 1,
'retrain_full': True, # Retrain final model on all data
'max_iter': 100,
'ensemble': True,
'eval_method': 'cv',
'n_splits': 5,
}
automl.fit(X_train, y_train, **automl_settings)
print(f"Best estimator: {automl.best_estimator}")
print(f"Best loss: {automl.best_loss:.4f}")
print(f"Best config: {automl.best_config}")
print(f"Time to find best model: {automl.time_to_find_best_model:.1f}s")
predictions = automl.predict(X_test)
pred_proba = automl.predict_proba(X_test)
return automl, predictions, pred_proba
def flaml_sklearn_pipeline(X_train, y_train, X_test):
"""Integrate FLAML into a scikit-learn Pipeline"""
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
automl = AutoML()
pipeline = Pipeline([
('scaler', StandardScaler()),
('automl', automl),
])
pipeline.fit(
X_train, y_train,
automl__time_budget=120,
automl__metric='roc_auc',
automl__task='classification',
)
return pipeline
def flaml_custom_objective(X_train, y_train):
"""FLAML with a custom evaluation metric"""
def custom_metric(
X_val, y_val, estimator, labels, X_train, y_train,
weight_val=None, weight_train=None, *args
):
"""Optimize F-beta score"""
from sklearn.metrics import fbeta_score
y_pred = estimator.predict(X_val)
score = fbeta_score(y_val, y_pred, beta=2, average='weighted')
return -score, {'f2_score': score} # (loss, metrics_dict)
automl = AutoML()
automl.fit(
X_train, y_train,
metric=custom_metric,
task='classification',
time_budget=120,
)
return automl
7. H2O AutoML
H2O Cluster
H2O AutoML is an enterprise-grade AutoML platform with extensive interpretability tools.
pip install h2o
import h2o
from h2o.automl import H2OAutoML
import pandas as pd
def h2o_automl_example(train_df, test_df, target_col, max_models=20):
"""H2O AutoML end-to-end example"""
h2o.init(nthreads=-1, max_mem_size='8G', port=54321)
train_h2o = h2o.H2OFrame(train_df)
test_h2o = h2o.H2OFrame(test_df)
# Mark target as factor for classification
train_h2o[target_col] = train_h2o[target_col].asfactor()
feature_cols = [col for col in train_df.columns if col != target_col]
aml = H2OAutoML(
max_models=max_models,
max_runtime_secs=3600,
seed=42,
sort_metric='AUC',
balance_classes=False,
include_algos=[
'GBM', 'GLM', 'DRF', 'DeepLearning',
'StackedEnsemble', 'XGBoost'
],
keep_cross_validation_predictions=True,
keep_cross_validation_models=True,
nfolds=5,
verbosity='info',
)
# H2OAutoML uses train(), not fit()
aml.train(
x=feature_cols, y=target_col,
training_frame=train_h2o,
leaderboard_frame=test_h2o,
)
lb = aml.leaderboard
print("H2O AutoML Leaderboard:")
print(lb.head(20))
best_model = aml.leader
print(f"\nBest model: {best_model.model_id}")
predictions = best_model.predict(test_h2o).as_data_frame()
# Save model
model_path = h2o.save_model(model=best_model, path='h2o_models/', force=True)
print(f"Model saved to: {model_path}")
return aml, best_model, predictions
def cleanup_h2o():
h2o.cluster().shutdown()
8. Neural Architecture Search (NAS)
NAS Overview
Neural Architecture Search (NAS) automatically finds optimal neural network architectures.
Three components of NAS:
- Search Space: The set of possible architectures
- Search Strategy: How to explore the space (random, evolutionary, RL, gradient-based)
- Performance Estimation: How to evaluate candidate architectures
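The three components map directly onto code. A deliberately tiny sketch with random search as the search strategy and a made-up proxy score standing in for real (partial) training — every name and number here is illustrative:

```python
import random

# 1) Search space: discrete architectural choices
SEARCH_SPACE = {
    "n_layers": [2, 4, 6, 8],
    "width": [64, 128, 256],
    "kernel": [3, 5, 7],
}

def sample_architecture(rng: random.Random) -> dict:
    """2) Search strategy: plain random search over the space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_performance(arch: dict) -> float:
    """3) Performance estimation. Real NAS trains (or partially trains)
    each candidate; this stand-in scores a fake accuracy/cost trade-off."""
    acc_proxy = 0.1 * arch["n_layers"] + 0.002 * arch["width"]
    cost = 0.01 * arch["n_layers"] * arch["kernel"]
    return acc_proxy - cost

rng = random.Random(0)
best = max((sample_architecture(rng) for _ in range(50)), key=estimate_performance)
print(best)
```

Replacing the random sampler with evolutionary, RL-based, or gradient-based strategies (as DARTS does below) is what distinguishes the major NAS families.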
DARTS (Differentiable Architecture Search)
DARTS (Liu et al., 2019) makes architecture search differentiable via continuous relaxation of discrete choices.
import torch
import torch.nn as nn
import torch.nn.functional as F
class MixedOperation(nn.Module):
"""DARTS mixed operation: weighted sum of candidate ops"""
def __init__(self, operations):
super().__init__()
self.ops = nn.ModuleList(operations)
self.alphas = nn.Parameter(torch.randn(len(operations)))
def forward(self, x):
weights = F.softmax(self.alphas, dim=0)
return sum(w * op(x) for w, op in zip(weights, self.ops))
class DARTSCell(nn.Module):
"""A single DARTS cell"""
def __init__(self, in_channels, out_channels):
super().__init__()
operations = [
nn.Conv2d(in_channels, out_channels, 3, padding=1),
nn.Conv2d(in_channels, out_channels, 5, padding=2),
nn.MaxPool2d(3, stride=1, padding=1),
nn.AvgPool2d(3, stride=1, padding=1),
nn.Identity() if in_channels == out_channels
else nn.Conv2d(in_channels, out_channels, 1),
]
self.mixed_op = MixedOperation(operations)
self.bn = nn.BatchNorm2d(out_channels)
def forward(self, x):
return F.relu(self.bn(self.mixed_op(x)))
class SimpleDARTS(nn.Module):
"""Simplified DARTS network"""
def __init__(self, num_classes=10, num_cells=6):
super().__init__()
self.stem = nn.Conv2d(3, 64, 3, padding=1)
self.cells = nn.ModuleList([DARTSCell(64, 64) for _ in range(num_cells)])
self.classifier = nn.Linear(64, num_classes)
def forward(self, x):
x = self.stem(x)
for cell in self.cells:
x = cell(x)
x = x.mean([2, 3]) # Global average pooling
return self.classifier(x)
def arch_parameters(self):
return [p for n, p in self.named_parameters() if 'alphas' in n]
def model_parameters(self):
return [p for n, p in self.named_parameters() if 'alphas' not in n]
def train_darts(model, train_loader, val_loader, epochs=50):
"""Bilevel optimization for DARTS"""
w_optimizer = torch.optim.SGD(
model.model_parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4
)
a_optimizer = torch.optim.Adam(
model.arch_parameters(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3
)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs)
for epoch in range(epochs):
model.train()
train_iter = iter(train_loader)
val_iter = iter(val_loader)
for step in range(min(len(train_loader), len(val_loader))):
# Step 1: Update architecture parameters using validation data
try:
X_val, y_val = next(val_iter)
except StopIteration:
val_iter = iter(val_loader)
X_val, y_val = next(val_iter)
a_optimizer.zero_grad()
val_loss = F.cross_entropy(model(X_val), y_val)
val_loss.backward()
a_optimizer.step()
# Step 2: Update weight parameters using training data
X_train, y_train = next(train_iter)
w_optimizer.zero_grad()
train_loss = F.cross_entropy(model(X_train), y_train)
train_loss.backward()
nn.utils.clip_grad_norm_(model.model_parameters(), 5.0)
w_optimizer.step()
w_scheduler.step()
if epoch % 10 == 0:
print(f"Epoch {epoch}: Train Loss = {train_loss.item():.4f}")
# Extract discovered architecture
for i, cell in enumerate(model.cells):
weights = F.softmax(cell.mixed_op.alphas, dim=0).detach()
best_op = weights.argmax().item()
print(f"Cell {i}: Best op index = {best_op}, weights = {weights.numpy()}")
return model
One-Shot NAS
class SuperNetwork(nn.Module):
"""One-Shot NAS: sample sub-networks from a single super-network"""
def __init__(self, num_classes=10, max_channels=256):
super().__init__()
self.max_channels = max_channels
self.channel_options = [64, 128, 256]
self.conv1 = nn.Conv2d(3, max_channels, 3, padding=1)
self.conv2 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
self.conv3 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
self.bn1 = nn.BatchNorm2d(max_channels)
self.bn2 = nn.BatchNorm2d(max_channels)
self.bn3 = nn.BatchNorm2d(max_channels)
self.classifier = nn.Linear(max_channels, num_classes)
def forward(self, x, arch_config=None):
if arch_config is None:
arch_config = {
'conv1_out': torch.randint(0, len(self.channel_options), (1,)).item(),
'conv2_out': torch.randint(0, len(self.channel_options), (1,)).item(),
}
c1 = self.channel_options[arch_config['conv1_out']]
c2 = self.channel_options[arch_config['conv2_out']]
x = F.relu(self.bn1(self.conv1(x)[:, :c1]))
x = F.relu(self.bn2(self.conv2(
F.pad(x, (0, 0, 0, 0, 0, self.max_channels - c1))
)[:, :c2]))
x = F.relu(self.bn3(self.conv3(
F.pad(x, (0, 0, 0, 0, 0, self.max_channels - c2))
)))
x = x.mean([2, 3])
return self.classifier(x)
9. Pipeline Automation
Auto-sklearn
pip install auto-sklearn
import autosklearn.classification
import autosklearn.regression
from autosklearn.metrics import roc_auc, mean_squared_error
def auto_sklearn_example(X_train, y_train, X_test, task='classification'):
"""Auto-sklearn: scikit-learn-compatible AutoML"""
if task == 'classification':
automl = autosklearn.classification.AutoSklearnClassifier(
time_left_for_this_task=3600,
per_run_time_limit=360,
n_jobs=-1,
memory_limit=8192,
ensemble_size=50,
ensemble_nbest=50,
max_models_on_disc=50,
include={
'classifier': [
'random_forest', 'gradient_boosting',
'extra_trees', 'liblinear_svc'
]
},
metric=roc_auc,
resampling_strategy='cv',
resampling_strategy_arguments={'folds': 5},
seed=42,
)
else:
automl = autosklearn.regression.AutoSklearnRegressor(
time_left_for_this_task=3600,
per_run_time_limit=360,
n_jobs=-1,
metric=mean_squared_error,
seed=42,
)
automl.fit(X_train, y_train)
print(automl.sprint_statistics())
print(automl.leaderboard())
predictions = automl.predict(X_test)
return automl, predictions
10. AutoML in the LLM Era
Leveraging LLMs for AutoML
Large Language Models (LLMs) are opening new possibilities for AutoML:
- Hyperparameter suggestion: LLMs recommend starting configurations based on dataset characteristics
- Feature engineering: LLMs use domain knowledge to suggest new feature ideas
- Code generation: Automatically generate preprocessing and training code
- Error debugging: Diagnose training failures and suggest solutions
# LLM-guided hyperparameter optimization (conceptual code)
from openai import OpenAI
def llm_hyperparameter_suggestion(dataset_description, model_type, previous_results=None):
"""Use LLM to suggest hyperparameters"""
client = OpenAI()
prompt = f"""
Dataset characteristics:
{dataset_description}
Model type: {model_type}
Previous results:
{previous_results if previous_results else 'None (first attempt)'}
Based on this information, suggest optimal hyperparameters for {model_type} in JSON format.
"""
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system",
"content": "You are a machine learning expert. Help optimize hyperparameters."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
return response.choices[0].message.content
# AutoML Agent (experimental)
class AutoMLAgent:
"""LLM-guided AutoML agent"""
def __init__(self, llm_client, X_train, y_train, X_val, y_val, max_iterations=10):
self.client = llm_client
self.X_train = X_train
self.y_train = y_train
self.X_val = X_val
self.y_val = y_val
self.max_iterations = max_iterations
self.history = []
self.best_score = 0
self.best_params = None
def get_next_config(self):
"""Ask the LLM for the next configuration to try"""
history_str = "\n".join([
f"Iteration {i+1}: params={h['params']}, score={h['score']:.4f}"
for i, h in enumerate(self.history[-5:])
])
prompt = f"""
LightGBM parameter attempts so far:
{history_str if history_str else 'None (first attempt)'}
Suggest the next parameter combination to try in JSON format.
Valid ranges: num_leaves(10-300), learning_rate(0.001-0.3),
n_estimators(100-2000), subsample(0.5-1.0), colsample_bytree(0.5-1.0)
"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are an HPO expert."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
import json
return json.loads(response.choices[0].message.content)
def evaluate(self, params):
"""Evaluate a parameter configuration"""
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
model.fit(self.X_train, self.y_train)
preds = model.predict_proba(self.X_val)[:, 1]
return roc_auc_score(self.y_val, preds)
def run(self):
"""Run the AutoML agent loop"""
for i in range(self.max_iterations):
config = self.get_next_config()
score = self.evaluate(config)
self.history.append({'params': config, 'score': score})
if score > self.best_score:
self.best_score = score
self.best_params = config
print(f"Iteration {i+1}: New best score {score:.4f}")
print(f"\nBest score: {self.best_score:.4f}")
print(f"Best params: {self.best_params}")
return self.best_params
Conclusion
This guide covered the complete AutoML ecosystem:
- Hyperparameter Optimization: From grid search to Bayesian optimization, building systematic intuition
- Optuna: The most flexible Python-native HPO framework, with pruning and visualization
- Ray Tune: Large-scale distributed HPO across multiple GPUs and nodes
- AutoGluon: Amazon's powerful multimodal AutoML for tabular, image, and text data
- FLAML: Microsoft's cost-efficient AutoML with minimal overhead
- H2O AutoML: Enterprise-grade AutoML with interpretability tooling
- NAS: Automated design of optimal neural architectures with DARTS and one-shot methods
- LLM + AutoML: The next frontier of intelligent, language-guided automation
Key Recommendations:
- Under time constraints: use FLAML, or AutoGluon with the good_quality preset
- Tuning a specific model: use Optuna
- Large-scale or distributed experiments: use Ray Tune
- Enterprise environments: leverage H2O AutoML for its interpretability tools
- LLM-based AutoML is still research-stage but is worth watching closely
AutoML is a tool, not magic. Domain knowledge, data quality, and a correct evaluation framework remain the most critical ingredients for success.
References
- Optuna Documentation
- AutoGluon Documentation
- FLAML Documentation
- H2O AutoML
- Ray Tune Documentation
- DARTS: Differentiable Architecture Search
- Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization.
- Feurer, M., et al. (2015). Efficient and Robust Automated Machine Learning (Auto-sklearn).
- He, X., et al. (2021). AutoML: A Survey of the State-of-the-Art.