1. AutoML Overview
What is AutoML?
AutoML (Automated Machine Learning) automates various stages of the machine learning pipeline. Tasks that data scientists previously performed manually — data preprocessing, feature engineering, model selection, hyperparameter optimization, and ensembling — are handled automatically by algorithms.
**What AutoML Automates:**
1. **Data Preprocessing Automation**
- Missing value imputation strategy selection
- Scaling and normalization method selection
- Outlier handling
2. **Feature Engineering Automation**
- Feature transformations (log, square, interactions)
- Categorical encoding method selection
- Feature selection and generation
3. **Model Selection (Algorithm Selection)**
- Searching over diverse algorithms
- Meta-learning (leveraging experience from prior tasks)
4. **Hyperparameter Optimization (HPO)**
- Grid/random search
- Bayesian optimization
- Evolutionary algorithms
5. **Ensemble Automation**
- Searching for the optimal ensemble configuration
- Automated stacking and blending
6. **Neural Architecture Search (NAS)**
- Automated design of optimal neural network architectures
AutoML Application Domains
**Industry Applications:**
- **Finance**: Credit risk models, automated fraud detection
- **Healthcare**: Rapid prototyping of diagnostic support systems
- **Retail**: Automated demand forecasting model refresh
- **Manufacturing**: Quality control model automation
**Major Open-Source AutoML Tools:**
| Tool | Developer | Strengths |
| ------------ | ------------------ | -------------------------------- |
| AutoGluon | Amazon | Multimodal, tabular, image, text |
| FLAML | Microsoft | Cost-efficient, fast |
| Optuna | Preferred Networks | HPO, visualization |
| H2O AutoML | H2O.ai | Enterprise, interpretable |
| Auto-sklearn | AutoML Group | scikit-learn compatible |
| Ray Tune | Anyscale | Distributed HPO |
| NNI | Microsoft | NAS, HPO |
Pros and Cons of AutoML
**Pros:**
- Enables non-experts to build high-quality models
- Saves time by automating repetitive experiments
- Discovers hyperparameter combinations humans might miss
- Provides reproducible pipelines
**Cons:**
- Computational costs can be very high
- Limited ability to incorporate domain knowledge
- Black-box nature (internal workings difficult to understand)
- Custom solutions are more effective for specialized problems
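- Risk of data leakage when preprocessing or feature selection is fit on data that later serves as validation or test data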
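One common way leakage arises is computing preprocessing statistics on the full dataset before splitting it. The following is a minimal sketch using only the standard library and made-up numbers; the helper names `fit_scaler` and `transform` are illustrative and do not come from any AutoML library.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Compute standardization statistics from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant features
    return mu, sigma

def transform(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]

# Toy feature column (hypothetical data): first 4 rows train, last 2 validation
feature = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, val = feature[:4], feature[4:]

# Leaky: the statistics see the validation rows
leaky_mu, leaky_sigma = fit_scaler(feature)

# Leak-free: the statistics come from the training rows only
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
val_scaled = transform(val, mu, sigma)
```

The same rule applies to imputation, encoding, and feature selection: fit on the training fold, then apply to the held-out fold.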
2. Hyperparameter Optimization (HPO)
Grid Search
The simplest HPO method — exhaustively tries every combination in the search space.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
def grid_search_example(X_train, y_train):
"""Grid Search: exhaustive (only practical for small search spaces)"""
param_grid = {
'max_depth': [3, 5, 7],
'learning_rate': [0.01, 0.1, 0.3],
'n_estimators': [100, 300, 500],
'subsample': [0.7, 0.9],
}
# Total combinations: 3 * 3 * 3 * 2 = 54 configurations, each fit once per CV fold (270 fits with cv=5)
model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
grid_search = GridSearchCV(
model, param_grid,
cv=5, scoring='roc_auc', n_jobs=-1, verbose=1, refit=True
)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
results = pd.DataFrame(grid_search.cv_results_)
print(results.sort_values('mean_test_score', ascending=False)[
['params', 'mean_test_score', 'std_test_score']
].head(10))
return grid_search.best_estimator_
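To make the cost of exhaustive search concrete, here is a small standard-library sketch that enumerates the same grid as the example above. The helper `iter_grid` is a name chosen for illustration; scikit-learn performs this enumeration internally.

```python
from itertools import product

param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
    "subsample": [0.7, 0.9],
}

def iter_grid(grid):
    """Yield one dict per point in the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_grid(param_grid))
n_fits = len(configs) * 5  # 5-fold CV, as in the example above
print(len(configs), n_fits)  # 54 270
```

Adding one more parameter with three values would triple both numbers, which is why grid search is only practical for small spaces.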
Random Search
Proposed by Bergstra & Bengio (2012) — samples randomly from parameter distributions, which is often far more efficient than grid search.
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint, loguniform
def random_search_example(X_train, y_train, n_iter=100):
"""Random Search: sample from continuous distributions"""
param_distributions = {
'max_depth': randint(3, 10),
'learning_rate': loguniform(1e-3, 0.5), # log-uniform over [0.001, 0.5]
'n_estimators': randint(100, 1000),
'subsample': uniform(0.6, 0.4), # uniform over [0.6, 1.0]
'colsample_bytree': uniform(0.6, 0.4),
'reg_alpha': loguniform(1e-4, 10),
'reg_lambda': loguniform(1e-4, 10),
'min_child_weight': randint(1, 10),
'gamma': uniform(0, 0.5),
}
model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
random_search = RandomizedSearchCV(
model, param_distributions,
n_iter=n_iter, cv=5, scoring='roc_auc',
n_jobs=-1, verbose=1, random_state=42, refit=True
)
random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
return random_search.best_estimator_
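The log-uniform distribution used for `learning_rate` above is worth a closer look. The sketch below reimplements the sampling with the standard library only; `sample_loguniform` and `sample_config` are illustrative helpers, not scikit-learn or SciPy functions.

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_config(rng):
    """Sample one configuration; ranges mirror part of the example above."""
    return {
        "max_depth": rng.randint(3, 9),  # inclusive bounds, like randint(3, 10) in SciPy
        "learning_rate": sample_loguniform(1e-3, 0.5, rng),
        "subsample": rng.uniform(0.6, 1.0),
    }

rng = random.Random(42)
configs = [sample_config(rng) for _ in range(100)]

# Share of sampled learning rates below 0.01
small_lr = sum(c["learning_rate"] < 0.01 for c in configs) / len(configs)
```

About 37% of the log-range lies below 0.01, whereas a plain uniform draw over [0.001, 0.5] would land there less than 2% of the time. This is why scale-like parameters are usually sampled on a log scale.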
Bayesian Optimization
Bayesian optimization uses the results of previous evaluations to intelligently select the next point to evaluate.
**Core Components:**
1. **Surrogate Model**: Probabilistic approximation of the objective function (typically Gaussian Process)
2. **Acquisition Function**: Determines the next evaluation point
- EI (Expected Improvement): Expected improvement over the current best
- UCB (Upper Confidence Bound): Balance between exploration and exploitation
- PI (Probability of Improvement): Probability of improving over the current best
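The three acquisition functions can be written down directly once the surrogate provides a predictive mean and standard deviation at a candidate point. This is a minimal sketch for a maximization problem using only the standard library; the function names and the candidate values are assumptions made for illustration.

```python
import math

def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """EI for maximization, given the surrogate's mean and std at one point."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * _norm_cdf(z) + sigma * _norm_pdf(z)

def probability_of_improvement(mu, sigma, best):
    """PI for maximization: P(f(x) > best) under the surrogate."""
    if sigma <= 0.0:
        return 1.0 if mu > best else 0.0
    return _norm_cdf((mu - best) / sigma)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB: a larger kappa favors exploring uncertain regions."""
    return mu + kappa * sigma

# Two hypothetical candidates, with the best observed score so far at 0.80
safe = expected_improvement(mu=0.81, sigma=0.01, best=0.80)
risky = expected_improvement(mu=0.78, sigma=0.10, best=0.80)
```

Here `risky` scores higher than `safe` even though its mean is lower, because its larger uncertainty leaves more room for a big improvement. This is the exploration side of the trade-off.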
**TPE (Tree-structured Parzen Estimator):**
- Default algorithm used by Optuna
- Models p(x|y) with two density functions, l(x) and g(x), instead of modeling p(y|x) directly
- l(x) models the distribution of parameters that led to good results (the best gamma fraction of trials)
- g(x) models the rest
- The next point maximizes the l(x)/g(x) ratio
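The steps above can be sketched for a single continuous parameter. This toy version uses fixed-bandwidth Gaussian kernel density estimates and only the standard library. It is a simplified illustration of the idea, not Optuna's implementation, and the helper names, bandwidth, and objective are assumptions.

```python
import math
import random

def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate over 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density

def tpe_suggest(history, gamma=0.25, n_candidates=24, bandwidth=0.1, rng=None):
    """Pick the next x for a minimization problem from (x, loss) pairs."""
    rng = rng or random.Random()
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    if not bad:  # too little history to build g(x)
        return rng.choice(good)
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates around good points, then keep the best l(x)/g(x) ratio
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# Hypothetical objective with its minimum at x = 0.7
rng = random.Random(0)
history = [(x, (x - 0.7) ** 2) for x in (rng.random() for _ in range(40))]
next_x = tpe_suggest(history, rng=rng)
```

The suggestion lands near the region where good trials cluster, because that is where l(x) is large relative to g(x).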
3. Optuna
Core Concepts
Optuna, developed by Preferred Networks, is a Python-native HPO framework known for its simplicity and flexibility.
**Key concepts:**
- **Study**: The entire optimization experiment (a collection of Trials)
- **Trial**: A single hyperparameter configuration attempt
- **Objective Function**: The function to optimize (minimize or maximize)
- **Sampler**: The parameter suggestion algorithm (TPE, CMA-ES, Random, etc.)
- **Pruner**: Early termination of unpromising Trials
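To show what a pruner decides, here is a simplified median stopping rule in plain Python. It is a sketch of the idea behind median-based pruning, not Optuna's `MedianPruner` code; the function name, arguments, and AUC curves are hypothetical.

```python
from statistics import median

def should_prune(current_value, step, completed_curves, direction="maximize",
                 n_startup_trials=5, n_warmup_steps=0):
    """Prune when the current value at `step` is worse than the median of
    earlier trials' values at the same step."""
    if len(completed_curves) < n_startup_trials or step < n_warmup_steps:
        return False
    peers = [curve[step] for curve in completed_curves if step < len(curve)]
    if not peers:
        return False
    if direction == "maximize":
        return current_value < median(peers)
    return current_value > median(peers)

# Hypothetical per-fold AUC curves from five finished trials
finished = [
    [0.80, 0.81], [0.82, 0.83], [0.78, 0.79], [0.85, 0.86], [0.81, 0.82],
]
print(should_prune(0.70, step=0, completed_curves=finished))  # True
print(should_prune(0.84, step=0, completed_curves=finished))  # False
```

Note that a warm-up longer than the number of reported steps disables pruning entirely: `should_prune(0.70, step=0, completed_curves=finished, n_warmup_steps=10)` returns `False`.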
pip install optuna optuna-dashboard
import numpy as np
import lightgbm as lgb
import optuna
from optuna.samplers import TPESampler, CmaEsSampler, RandomSampler
from optuna.pruners import MedianPruner, HyperbandPruner
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective_lgbm(trial, X, y):
"""Optuna objective function for LightGBM optimization"""
params = {
'objective': 'binary',
'metric': 'auc',
'verbosity': -1,
'boosting_type': trial.suggest_categorical('boosting_type', ['gbdt', 'dart']),
'num_leaves': trial.suggest_int('num_leaves', 20, 300),
'max_depth': trial.suggest_int('max_depth', 3, 12),
'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
'learning_rate': trial.suggest_float('learning_rate', 1e-4, 0.3, log=True),
'n_estimators': trial.suggest_int('n_estimators', 100, 2000),
'subsample': trial.suggest_float('subsample', 0.5, 1.0),
'subsample_freq': trial.suggest_int('subsample_freq', 1, 7),
'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
'min_split_gain': trial.suggest_float('min_split_gain', 0, 1),
'n_jobs': -1,
}
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
train_data = lgb.Dataset(X_train, y_train)
val_data = lgb.Dataset(X_val, y_val, reference=train_data)
model = lgb.train(
params, train_data,
num_boost_round=params['n_estimators'],
valid_sets=[val_data],
callbacks=[
lgb.early_stopping(stopping_rounds=50, verbose=False),
lgb.log_evaluation(-1),
],
)
preds = model.predict(X_val)
fold_score = roc_auc_score(y_val, preds)
cv_scores.append(fold_score)
# Report intermediate results for pruning
trial.report(fold_score, fold)
if trial.should_prune():
raise optuna.exceptions.TrialPruned()
return np.mean(cv_scores)
def run_optuna_study(X, y, n_trials=100, n_jobs=1):
"""Run an Optuna study with TPE sampler and median pruning"""
sampler = TPESampler(
n_startup_trials=20,
n_ei_candidates=24,
multivariate=True,
seed=42
)
pruner = MedianPruner(
n_startup_trials=5,
n_warmup_steps=1,  # the objective reports only 5 steps (one per fold); a warm-up of 10 would never prune
interval_steps=1
)
study = optuna.create_study(
direction='maximize',
sampler=sampler,
pruner=pruner,
study_name='lgbm_optimization',
storage='sqlite:///optuna.db', # persist results
load_if_exists=True, # resume existing study
)
study.optimize(
lambda trial: objective_lgbm(trial, X, y),
n_trials=n_trials,
n_jobs=n_jobs,
show_progress_bar=True,
)
print(f"\nBest params:")
for key, value in study.best_params.items():
print(f" {key}: {value}")
print(f"Best AUC: {study.best_value:.4f}")
print(f"Total trials (including pruned): {len(study.trials)}")
pruned = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
print(f"Pruned trials: {len(pruned)}")
return study
import optuna.visualization as vis  # plotting requires plotly

def visualize_optuna_study(study):
"""Visualize Optuna optimization results"""
vis.plot_optimization_history(study).show()
vis.plot_param_importances(study).show()
vis.plot_parallel_coordinate(study).show()
vis.plot_slice(study).show()
vis.plot_contour(study, params=['learning_rate', 'num_leaves']).show()
Complete PyTorch + Optuna Example
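import optuna
import torch
import torch.nn as nn
import torch.optim as optim
from optuna.samplers import TPESampler
from torch.utils.data import DataLoader, TensorDataset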
def create_model(trial, input_dim):
"""Dynamically build a neural network from Optuna trial parameters"""
n_layers = trial.suggest_int('n_layers', 1, 4)
dropout = trial.suggest_float('dropout', 0.1, 0.5)
activation_name = trial.suggest_categorical('activation', ['relu', 'tanh', 'elu'])
activation_map = {'relu': nn.ReLU(), 'tanh': nn.Tanh(), 'elu': nn.ELU()}
layers = []
in_features = input_dim
for i in range(n_layers):
out_features = trial.suggest_int(f'n_units_l{i}', 32, 512)
layers.extend([
nn.Linear(in_features, out_features),
nn.BatchNorm1d(out_features),
activation_map[activation_name],
nn.Dropout(dropout),
])
in_features = out_features
layers.extend([nn.Linear(in_features, 1), nn.Sigmoid()])
return nn.Sequential(*layers)
def objective_pytorch(trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim):
"""Optuna objective function for PyTorch neural network"""
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = create_model(trial, input_dim).to(device)
lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop', 'SGD'])
batch_size = trial.suggest_categorical('batch_size', [32, 64, 128, 256])
weight_decay = trial.suggest_float('weight_decay', 1e-8, 1e-2, log=True)
if optimizer_name == 'Adam':
optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
elif optimizer_name == 'RMSprop':
optimizer = optim.RMSprop(model.parameters(), lr=lr, weight_decay=weight_decay)
else:
optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay,
momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # matches the 100-epoch loop
criterion = nn.BCELoss()
train_dataset = TensorDataset(
X_train_t.to(device),
y_train_t.to(device).float().unsqueeze(1)
)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
best_val_loss = float('inf')
patience_counter = 0
patience = 10
for epoch in range(100):
model.train()
for X_batch, y_batch in train_loader:
optimizer.zero_grad()
loss = criterion(model(X_batch), y_batch)
loss.backward()
optimizer.step()
scheduler.step()
model.eval()
with torch.no_grad():
val_loss = criterion(
model(X_val_t.to(device)),
y_val_t.to(device).float().unsqueeze(1)
).item()
trial.report(val_loss, epoch)
if trial.should_prune():
raise optuna.exceptions.TrialPruned()
if val_loss < best_val_loss:
best_val_loss = val_loss
patience_counter = 0
else:
patience_counter += 1
if patience_counter >= patience:
break
return best_val_loss
def run_pytorch_optuna(X_train, y_train, X_val, y_val, n_trials=50):
X_train_t = torch.FloatTensor(X_train.values if hasattr(X_train, 'values') else X_train)
y_train_t = torch.FloatTensor(y_train.values if hasattr(y_train, 'values') else y_train)
X_val_t = torch.FloatTensor(X_val.values if hasattr(X_val, 'values') else X_val)
y_val_t = torch.FloatTensor(y_val.values if hasattr(y_val, 'values') else y_val)
input_dim = X_train_t.shape[1]
study = optuna.create_study(
direction='minimize',
pruner=optuna.pruners.HyperbandPruner(
min_resource=5, max_resource=100, reduction_factor=3
),
sampler=TPESampler(seed=42)
)
study.optimize(
lambda trial: objective_pytorch(
trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim
),
n_trials=n_trials,
show_progress_bar=True,
)
print(f"Best val loss: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
return study
CMA-ES Sampler
# CMA-ES is often more efficient for continuous hyperparameter spaces
study_cmaes = optuna.create_study(
direction='maximize',
sampler=CmaEsSampler(
n_startup_trials=10,
restart_strategy='ipop', # restart strategy for escaping local optima
seed=42
)
)
4. Ray Tune
Distributed HPO with Ray Tune
Ray Tune, developed by Anyscale, handles parallel training across multiple GPUs and nodes automatically.
pip install "ray[tune]" optuna
import ray
import torch
import torch.nn as nn
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler, PopulationBasedTraining
from ray.tune.search.optuna import OptunaSearch
ray.init(ignore_reinit_error=True)
def train_with_tune(config, data=None):
"""Training function called by Ray Tune"""
X_train, y_train, X_val, y_val = data
model = nn.Sequential(
nn.Linear(X_train.shape[1], config['hidden_size']),
nn.ReLU(),
nn.Dropout(config['dropout']),
nn.Linear(config['hidden_size'], config['hidden_size'] // 2),
nn.ReLU(),
nn.Linear(config['hidden_size'] // 2, 1),
nn.Sigmoid()
)
optimizer = torch.optim.Adam(
model.parameters(), lr=config['lr'], weight_decay=config['weight_decay']
)
criterion = nn.BCELoss()
X_train_t = torch.FloatTensor(X_train)
y_train_t = torch.FloatTensor(y_train).unsqueeze(1)
X_val_t = torch.FloatTensor(X_val)
y_val_t = torch.FloatTensor(y_val).unsqueeze(1)
for epoch in range(config['max_epochs']):
model.train()
optimizer.zero_grad()
loss = criterion(model(X_train_t), y_train_t)
loss.backward()
optimizer.step()
if epoch % 5 == 0:
model.eval()
with torch.no_grad():
val_loss = criterion(model(X_val_t), y_val_t).item()
tune.report(val_loss=val_loss, training_iteration=epoch)
def run_ray_tune(X_train, y_train, X_val, y_val, num_samples=50):
"""Run distributed HPO with Ray Tune"""
config = {
'hidden_size': tune.choice([64, 128, 256, 512]),
'dropout': tune.uniform(0.1, 0.5),
'lr': tune.loguniform(1e-5, 1e-1),
'weight_decay': tune.loguniform(1e-8, 1e-3),
'max_epochs': tune.choice([50, 100, 200]),
}
# ASHA: Asynchronous Successive Halving Algorithm
scheduler = ASHAScheduler(
metric='val_loss',
mode='min',
max_t=200, # Max epochs
grace_period=10, # Min epochs before pruning
reduction_factor=3,
)
search_alg = OptunaSearch(metric='val_loss', mode='min')
result = tune.run(
tune.with_parameters(
train_with_tune,
data=(X_train, y_train, X_val, y_val)
),
config=config,
num_samples=num_samples,
scheduler=scheduler,
search_alg=search_alg,
progress_reporter=CLIReporter(
metric_columns=['val_loss', 'training_iteration'],
max_progress_rows=10
),
verbose=1,
resources_per_trial={'cpu': 2, 'gpu': 0},
)
best_trial = result.get_best_trial('val_loss', 'min', 'last')
print(f"Best val loss: {best_trial.last_result['val_loss']:.4f}")
print(f"Best config: {best_trial.config}")
return result
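The `grace_period`, `reduction_factor`, and `max_t` settings above determine when trials are compared and how many survive. The sketch below shows a synchronous simplification of successive halving in plain Python. Real ASHA makes these decisions asynchronously as results arrive, and the helper names here are illustrative only.

```python
def asha_rungs(grace_period, max_t, reduction_factor):
    """Milestones (in training iterations) at which trials are compared."""
    rungs = []
    t = grace_period
    while t < max_t:
        rungs.append(t)
        t *= reduction_factor
    return rungs

def survivors(scores, reduction_factor, mode="min"):
    """Keep the top 1/reduction_factor of trials at a rung (at least one)."""
    keep = max(1, len(scores) // reduction_factor)
    ranked = sorted(scores, key=scores.get, reverse=(mode == "max"))
    return ranked[:keep]

print(asha_rungs(grace_period=10, max_t=200, reduction_factor=3))  # [10, 30, 90]

# Hypothetical validation losses of six trials at the first rung
rung_scores = {"a": 0.5, "b": 0.2, "c": 0.9, "d": 0.4, "e": 0.3, "f": 0.8}
print(survivors(rung_scores, reduction_factor=3))  # ['b', 'e']
```

With a reduction factor of 3, roughly two thirds of the trials stop at each rung, so most of the compute budget goes to the configurations that look best early on.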
def run_pbt(X_train, y_train, X_val, y_val):
"""Population Based Training: dynamically mutate hyperparameters during training"""
pbt_scheduler = PopulationBasedTraining(
time_attr='training_iteration',
metric='val_loss',
mode='min',
perturbation_interval=20,
hyperparam_mutations={
'lr': tune.loguniform(1e-5, 1e-1),
'dropout': tune.uniform(0.1, 0.5),
},
quantile_fraction=0.25, # Replace bottom 25% with top 25%
)
result = tune.run(
tune.with_parameters(
train_with_tune,
data=(X_train, y_train, X_val, y_val)
),
config={
'hidden_size': 256,
'dropout': tune.uniform(0.1, 0.5),
'lr': tune.loguniform(1e-4, 1e-1),
'weight_decay': 1e-5,
'max_epochs': 200,
},
num_samples=8,
scheduler=pbt_scheduler,
verbose=1,
)
return result
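The exploit/explore step behind Population Based Training can be sketched in a few lines. This is a simplified illustration for a single hyperparameter and a minimized metric. The perturbation factors 0.8 and 1.2 and the function name `pbt_step` are assumptions for the example, and a real run would also copy model weights from the donor.

```python
import random

def pbt_step(population, quantile_fraction=0.25, rng=None):
    """One exploit/explore step for a minimized metric.

    `population` is a list of dicts with keys 'lr' and 'val_loss'.
    """
    rng = rng or random.Random()
    ranked = sorted(population, key=lambda m: m["val_loss"])
    k = max(1, int(len(ranked) * quantile_fraction))
    top, bottom = ranked[:k], ranked[-k:]
    for member in bottom:
        donor = rng.choice(top)
        # Exploit: copy the donor's hyperparameter (and weights, in a real run)
        # Explore: perturb the copied value by a factor of 0.8 or 1.2
        member["lr"] = donor["lr"] * rng.choice([0.8, 1.2])
    return population

population = [
    {"lr": 1e-2, "val_loss": 0.30},
    {"lr": 1e-3, "val_loss": 0.45},
    {"lr": 1e-1, "val_loss": 0.90},
    {"lr": 3e-3, "val_loss": 0.40},
]
pbt_step(population, rng=random.Random(0))
```

After the step, the worst member (originally `lr=1e-1`) carries a perturbed copy of the best member's learning rate, while the other members are unchanged.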
5. AutoGluon
AutoGluon Overview
AutoGluon, developed by Amazon, is designed to train strong ensembles with very little code; a basic tabular model needs only a few lines.
pip install autogluon
Tabular Data (TabularPredictor)
from autogluon.tabular import TabularPredictor
def autogluon_tabular_example(train_df, test_df, target_col, eval_metric='roc_auc'):
"""AutoGluon tabular training"""
predictor = TabularPredictor(
label=target_col,
eval_metric=eval_metric,
path='autogluon_models/',
problem_type='binary', # 'binary', 'multiclass', 'regression', 'softclass'
)
predictor.fit(
train_data=train_df,
time_limit=3600,
presets='best_quality',  # alternatives include 'good_quality', 'medium_quality', 'optimize_for_deployment'
excluded_model_types=['KNN'],
verbosity=2,
)
leaderboard = predictor.leaderboard(test_df, silent=True)
print(leaderboard[['model', 'score_test', 'score_val', 'pred_time_test']].head(10))
predictions = predictor.predict(test_df)
pred_proba = predictor.predict_proba(test_df)
feature_importance = predictor.feature_importance(test_df)
print(feature_importance.head(20))
return predictor, predictions, pred_proba
def autogluon_advanced(train_df, test_df, target_col):
"""AutoGluon with custom hyperparameters"""
hyperparameters = {
'GBM': [
{'num_boost_round': 300, 'ag_args': {'name_suffix': 'fast'}},
{'num_boost_round': 1000, 'learning_rate': 0.03,
'ag_args': {'name_suffix': 'slow', 'priority': 0}},
],
'XGB': [{'n_estimators': 300, 'max_depth': 6}],
'CAT': [{'iterations': 500, 'depth': 6}],
'NN_TORCH': [{'num_epochs': 50, 'learning_rate': 1e-3, 'dropout_prob': 0.1}],
'RF': [{'n_estimators': 300}],
}
predictor = TabularPredictor(
label=target_col, eval_metric='roc_auc', path='autogluon_advanced/'
)
predictor.fit(
train_data=train_df,
hyperparameters=hyperparameters,
time_limit=7200,
num_stack_levels=1, # Number of stacking levels
num_bag_folds=5, # Number of CV folds for bagging
num_bag_sets=1, # Number of bagging sets
verbosity=3,
)
return predictor
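The bagging and stacking options above rely on out-of-fold predictions: each row is predicted by a model that did not train on it, and those predictions become inputs for the next stacking level. Here is a minimal plain-Python sketch of that mechanism with a trivial stand-in model. All names are hypothetical, and AutoGluon's internals are considerably more elaborate.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

def out_of_fold_predictions(y, n_folds, fit_predict):
    """Predict every row with a model that never saw that row during training."""
    oof = [None] * len(y)
    for train_idx, val_idx in kfold_indices(len(y), n_folds):
        preds = fit_predict([y[i] for i in train_idx], len(val_idx))
        for i, p in zip(val_idx, preds):
            oof[i] = p
    return oof

def mean_predictor(y_train, n_val):
    """Stand-in base model (hypothetical): predicts the training mean."""
    return [sum(y_train) / len(y_train)] * n_val

y = [0, 1, 1, 0, 1, 1]
oof = out_of_fold_predictions(y, n_folds=3, fit_predict=mean_predictor)
print(oof)  # [0.75, 0.75, 0.75, 0.75, 0.5, 0.5]
```

Because no prediction comes from a model that saw its own label, a stacker trained on `oof` is not rewarded for base models that merely memorize the training data.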
Multimodal Learning
from autogluon.multimodal import MultiModalPredictor
def autogluon_image_classification(train_df, test_df, label_col):
"""AutoGluon image classification"""
predictor = MultiModalPredictor(label=label_col)
predictor.fit(
train_data=train_df,
time_limit=3600,
hyperparameters={
'model.timm_image.checkpoint_name': 'efficientnet_b4',
'optimization.learning_rate': 1e-4,
'optimization.max_epochs': 20,
}
)
return predictor
def autogluon_multimodal(train_df, test_df, target_col):
"""AutoGluon multimodal: text + tabular features together"""
predictor = MultiModalPredictor(label=target_col, problem_type='binary')
predictor.fit(
train_data=train_df,
time_limit=3600,
hyperparameters={
'model.hf_text.checkpoint_name': 'bert-base-uncased',
}
)
return predictor
6. FLAML
Microsoft FLAML
FLAML (Fast and Lightweight AutoML), developed by Microsoft Research, specializes in cost-efficient automation.
pip install flaml
from flaml import AutoML
def flaml_basic_example(X_train, y_train, X_test, task='classification'):
"""FLAML basic usage"""
automl = AutoML()
automl_settings = {
'time_budget': 300,
'metric': 'roc_auc',
'task': task, # 'classification', 'regression', 'ranking'
'estimator_list': [
'lgbm', 'xgboost', 'catboost',
'rf', 'extra_tree', 'lrl1', 'lrl2', 'kneighbor'
],
'log_file_name': 'flaml_log.log',
'seed': 42,
'n_jobs': -1,
'verbose': 1,
'retrain_full': True, # Retrain final model on all data
'max_iter': 100,
'ensemble': True,
'eval_method': 'cv',
'n_splits': 5,
}
automl.fit(X_train, y_train, **automl_settings)
print(f"Best estimator: {automl.best_estimator}")
print(f"Best loss: {automl.best_loss:.4f}")
print(f"Best config: {automl.best_config}")
print(f"Time to find best model: {automl.time_to_find_best_model:.1f}s")
predictions = automl.predict(X_test)
pred_proba = automl.predict_proba(X_test)
return automl, predictions, pred_proba
def flaml_sklearn_pipeline(X_train, y_train, X_test):
"""Integrate FLAML into a scikit-learn Pipeline"""
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
automl = AutoML()
pipeline = Pipeline([
('scaler', StandardScaler()),
('automl', automl),
])
pipeline.fit(
X_train, y_train,
automl__time_budget=120,
automl__metric='roc_auc',
automl__task='classification',
)
return pipeline
def flaml_custom_objective(X_train, y_train):
"""FLAML with a custom evaluation metric"""
def custom_metric(
X_val, y_val, estimator, labels, X_train, y_train,
weight_val=None, weight_train=None, *args
):
"""Optimize F-beta score"""
from sklearn.metrics import fbeta_score
y_pred = estimator.predict(X_val)
score = fbeta_score(y_val, y_pred, beta=2, average='weighted')
return -score, {'f2_score': score} # (loss, metrics_dict)
automl = AutoML()
automl.fit(
X_train, y_train,
metric=custom_metric,
task='classification',
time_budget=120,
)
return automl
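The custom metric returns a negated score because the search minimizes a loss. The sketch below shows the underlying F-beta arithmetic from confusion counts for the binary case. Unlike the `average='weighted'` call above, it scores only the positive class, and the function names are illustrative.

```python
def fbeta_from_counts(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def loss_and_metrics(tp, fp, fn, beta=2.0):
    """Return (loss, metrics) so that a minimizer maximizes F-beta."""
    score = fbeta_from_counts(tp, fp, fn, beta)
    return -score, {"f2_score": score}

loss, metrics = loss_and_metrics(tp=8, fp=2, fn=4)
print(round(metrics["f2_score"], 4))  # 0.6897
```

With beta = 2, the four missed positives cost more than the two false alarms, which suits problems such as fraud detection where recall matters most.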
7. H2O AutoML
H2O Cluster
H2O AutoML is an enterprise-grade AutoML platform with extensive interpretability tools.
pip install h2o
import h2o
from h2o.automl import H2OAutoML
def h2o_automl_example(train_df, test_df, target_col, max_models=20):
"""H2O AutoML end-to-end example"""
h2o.init(nthreads=-1, max_mem_size='8G', port=54321)
train_h2o = h2o.H2OFrame(train_df)
test_h2o = h2o.H2OFrame(test_df)
# Mark the target as a factor (categorical) so H2O treats the task as classification
train_h2o[target_col] = train_h2o[target_col].asfactor()
test_h2o[target_col] = test_h2o[target_col].asfactor()
feature_cols = [col for col in train_df.columns if col != target_col]
aml = H2OAutoML(
max_models=max_models,
max_runtime_secs=3600,
seed=42,
sort_metric='AUC',
balance_classes=False,
include_algos=[
'GBM', 'GLM', 'DRF', 'DeepLearning',
'StackedEnsemble', 'XGBoost'
],
keep_cross_validation_predictions=True,
keep_cross_validation_models=True,
nfolds=5,
verbosity='info',
)
aml.train(
x=feature_cols, y=target_col,
training_frame=train_h2o,
leaderboard_frame=test_h2o,
)
lb = aml.leaderboard
print("H2O AutoML Leaderboard:")
print(lb.head(20))
best_model = aml.leader
print(f"\nBest model: {best_model.model_id}")
predictions = best_model.predict(test_h2o).as_data_frame()
# Save model
model_path = h2o.save_model(model=best_model, path='h2o_models/', force=True)
print(f"Model saved to: {model_path}")
return aml, best_model, predictions
def cleanup_h2o():
h2o.cluster().shutdown()
8. Neural Architecture Search (NAS)
NAS Overview
Neural Architecture Search (NAS) automatically finds optimal neural network architectures.
**Three components of NAS:**
1. **Search Space**: The set of possible architectures
2. **Search Strategy**: How to explore the space (random, evolutionary, RL, gradient-based)
3. **Performance Estimation**: How to evaluate candidate architectures
DARTS (Differentiable Architecture Search)
DARTS (Liu et al., 2019) makes architecture search differentiable via continuous relaxation of discrete choices.
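Written out, the relaxation replaces the discrete choice of one operation on an edge (i, j) with a softmax-weighted mixture over the candidate set, and the search becomes a bilevel problem over the architecture parameters and the network weights. The notation below follows the DARTS paper; the `alphas` tensor in the `MixedOperation` class plays the role of the architecture parameters for a single edge.

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha_o^{(i,j)}\big)}
       {\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)} \, o(x)

\min_{\alpha} \; \mathcal{L}_{\mathrm{val}}\big(w^{*}(\alpha), \alpha\big)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \alpha)
```

After the search, each mixture is discretized by keeping the operation with the largest weight.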
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOperation(nn.Module):
"""DARTS mixed operation: weighted sum of candidate ops"""
def __init__(self, operations):
super().__init__()
self.ops = nn.ModuleList(operations)
self.alphas = nn.Parameter(torch.randn(len(operations)))
def forward(self, x):
weights = F.softmax(self.alphas, dim=0)
return sum(w * op(x) for w, op in zip(weights, self.ops))
class DARTSCell(nn.Module):
"""A single DARTS cell"""
def __init__(self, in_channels, out_channels):
super().__init__()
operations = [
nn.Conv2d(in_channels, out_channels, 3, padding=1),
nn.Conv2d(in_channels, out_channels, 5, padding=2),
nn.MaxPool2d(3, stride=1, padding=1),
nn.AvgPool2d(3, stride=1, padding=1),
nn.Identity() if in_channels == out_channels
else nn.Conv2d(in_channels, out_channels, 1),
]
self.mixed_op = MixedOperation(operations)
self.bn = nn.BatchNorm2d(out_channels)
def forward(self, x):
return F.relu(self.bn(self.mixed_op(x)))
class SimpleDARTS(nn.Module):
"""Simplified DARTS network"""
def __init__(self, num_classes=10, num_cells=6):
super().__init__()
self.stem = nn.Conv2d(3, 64, 3, padding=1)
self.cells = nn.ModuleList([DARTSCell(64, 64) for _ in range(num_cells)])
self.classifier = nn.Linear(64, num_classes)
def forward(self, x):
x = self.stem(x)
for cell in self.cells:
x = cell(x)
x = x.mean([2, 3]) # Global average pooling
return self.classifier(x)
def arch_parameters(self):
return [p for n, p in self.named_parameters() if 'alphas' in n]
def model_parameters(self):
return [p for n, p in self.named_parameters() if 'alphas' not in n]
def train_darts(model, train_loader, val_loader, epochs=50):
"""First-order DARTS: alternate architecture and weight updates to approximate bilevel optimization"""
w_optimizer = torch.optim.SGD(
model.model_parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4
)
a_optimizer = torch.optim.Adam(
model.arch_parameters(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3
)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs)
for epoch in range(epochs):
model.train()
train_iter = iter(train_loader)
val_iter = iter(val_loader)
for step in range(min(len(train_loader), len(val_loader))):
# Step 1: Update architecture parameters using validation data
try:
X_val, y_val = next(val_iter)
except StopIteration:
val_iter = iter(val_loader)
X_val, y_val = next(val_iter)
a_optimizer.zero_grad()
val_loss = F.cross_entropy(model(X_val), y_val)
val_loss.backward()
a_optimizer.step()
# Step 2: Update weight parameters using training data
X_train, y_train = next(train_iter)
w_optimizer.zero_grad()
train_loss = F.cross_entropy(model(X_train), y_train)
train_loss.backward()
nn.utils.clip_grad_norm_(model.model_parameters(), 5.0)
w_optimizer.step()
w_scheduler.step()
if epoch % 10 == 0:
print(f"Epoch {epoch}: Train Loss = {train_loss.item():.4f}")
# Extract the discovered architecture: the highest-weight operation in each cell
for i, cell in enumerate(model.cells):
weights = F.softmax(cell.mixed_op.alphas, dim=0).detach().cpu()
best_op = weights.argmax().item()
print(f"Cell {i}: Best op index = {best_op}, weights = {weights.numpy()}")
return model
One-Shot NAS
class SuperNetwork(nn.Module):
"""One-Shot NAS: sample sub-networks from a single super-network"""
def __init__(self, num_classes=10, max_channels=256):
super().__init__()
self.max_channels = max_channels
self.channel_options = [64, 128, 256]
self.conv1 = nn.Conv2d(3, max_channels, 3, padding=1)
self.conv2 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
self.conv3 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
self.bn1 = nn.BatchNorm2d(max_channels)
self.bn2 = nn.BatchNorm2d(max_channels)
self.bn3 = nn.BatchNorm2d(max_channels)
self.classifier = nn.Linear(max_channels, num_classes)
def forward(self, x, arch_config=None):
if arch_config is None:
arch_config = {
'conv1_out': torch.randint(0, len(self.channel_options), (1,)).item(),
'conv2_out': torch.randint(0, len(self.channel_options), (1,)).item(),
}
c1 = self.channel_options[arch_config['conv1_out']]
c2 = self.channel_options[arch_config['conv2_out']]
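# Emulate narrower layers by zeroing channels beyond the sampled width;
# slicing before BatchNorm2d(max_channels) would raise a shape error
channels = torch.arange(self.max_channels, device=x.device).view(1, -1, 1, 1)
x = F.relu(self.bn1(self.conv1(x))) * (channels < c1)
x = F.relu(self.bn2(self.conv2(x))) * (channels < c2)
x = F.relu(self.bn3(self.conv3(x)))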
x = x.mean([2, 3])
return self.classifier(x)
9. Pipeline Automation
Auto-sklearn
pip install auto-sklearn
import autosklearn.classification
import autosklearn.regression
from autosklearn.metrics import roc_auc, mean_squared_error
def auto_sklearn_example(X_train, y_train, X_test, task='classification'):
"""Auto-sklearn: scikit-learn-compatible AutoML"""
if task == 'classification':
automl = autosklearn.classification.AutoSklearnClassifier(
time_left_for_this_task=3600,
per_run_time_limit=360,
n_jobs=-1,
memory_limit=8192,
ensemble_size=50,
ensemble_nbest=50,
max_models_on_disc=50,
include={
'classifier': [
'random_forest', 'gradient_boosting',
'extra_trees', 'liblinear_svc'
]
},
metric=roc_auc,
resampling_strategy='cv',
resampling_strategy_arguments={'folds': 5},
seed=42,
)
else:
automl = autosklearn.regression.AutoSklearnRegressor(
time_left_for_this_task=3600,
per_run_time_limit=360,
n_jobs=-1,
metric=mean_squared_error,
seed=42,
)
automl.fit(X_train, y_train)
print(automl.sprint_statistics())
print(automl.leaderboard())
predictions = automl.predict(X_test)
return automl, predictions
10. AutoML in the LLM Era
Leveraging LLMs for AutoML
Large Language Models (LLMs) are opening new possibilities for AutoML:
1. **Hyperparameter suggestion**: LLMs recommend starting configurations based on dataset characteristics
2. **Feature engineering**: LLMs use domain knowledge to suggest new feature ideas
3. **Code generation**: Automatically generate preprocessing and training code
4. **Error debugging**: Diagnose training failures and suggest solutions
# LLM-guided hyperparameter optimization (conceptual code)
from openai import OpenAI
def llm_hyperparameter_suggestion(dataset_description, model_type, previous_results=None):
"""Use LLM to suggest hyperparameters"""
client = OpenAI()
prompt = f"""
Dataset characteristics:
{dataset_description}
Model type: {model_type}
Previous results:
{previous_results if previous_results else 'None (first attempt)'}
Based on this information, suggest optimal hyperparameters for {model_type} in JSON format.
"""
response = client.chat.completions.create(
model="gpt-4o",  # JSON mode (response_format) requires a model that supports it
messages=[
{"role": "system",
"content": "You are a machine learning expert. Help optimize hyperparameters."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
return response.choices[0].message.content
# AutoML Agent (experimental)
import json
import lightgbm as lgb
class AutoMLAgent:
"""LLM-guided AutoML agent"""
def __init__(self, llm_client, X_train, y_train, X_val, y_val, max_iterations=10):
self.client = llm_client
self.X_train = X_train
self.y_train = y_train
self.X_val = X_val
self.y_val = y_val
self.max_iterations = max_iterations
self.history = []
self.best_score = 0
self.best_params = None
def get_next_config(self):
"""Ask the LLM for the next configuration to try"""
history_str = "\n".join([
f"Iteration {i+1}: params={h['params']}, score={h['score']:.4f}"
for i, h in enumerate(self.history[-5:])
])
prompt = f"""
LightGBM parameter attempts so far:
{history_str if history_str else 'None (first attempt)'}
Suggest the next parameter combination to try in JSON format.
Valid ranges: num_leaves(10-300), learning_rate(0.001-0.3),
n_estimators(100-2000), subsample(0.5-1.0), colsample_bytree(0.5-1.0)
"""
response = self.client.chat.completions.create(
model="gpt-4o",  # JSON mode (response_format) requires a model that supports it
messages=[
{"role": "system", "content": "You are an HPO expert."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
return json.loads(response.choices[0].message.content)
def evaluate(self, params):
"""Evaluate a parameter configuration"""
from sklearn.metrics import roc_auc_score
model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
model.fit(self.X_train, self.y_train)
preds = model.predict_proba(self.X_val)[:, 1]
return roc_auc_score(self.y_val, preds)
def run(self):
"""Run the AutoML agent loop"""
for i in range(self.max_iterations):
config = self.get_next_config()
score = self.evaluate(config)
self.history.append({'params': config, 'score': score})
if score > self.best_score:
self.best_score = score
self.best_params = config
print(f"Iteration {i+1}: New best score {score:.4f}")
print(f"\nBest score: {self.best_score:.4f}")
print(f"Best params: {self.best_params}")
return self.best_params
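Since an LLM reply is untrusted input, it is worth validating it before passing it to a model constructor. The following standard-library sketch parses a JSON reply, drops unknown keys, and clamps values into the ranges stated in the agent prompt above. The helper `parse_llm_config` and the sample reply are hypothetical.

```python
import json

# Ranges copied from the agent prompt above: (type, low, high)
VALID_RANGES = {
    "num_leaves": (int, 10, 300),
    "learning_rate": (float, 0.001, 0.3),
    "n_estimators": (int, 100, 2000),
    "subsample": (float, 0.5, 1.0),
    "colsample_bytree": (float, 0.5, 1.0),
}

def parse_llm_config(raw_text):
    """Parse a JSON reply, drop unknown keys, and clamp values into range."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    config = {}
    for name, (kind, low, high) in VALID_RANGES.items():
        if name not in data:
            continue
        try:
            value = kind(data[name])
        except (TypeError, ValueError):
            continue  # skip values that cannot be converted
        config[name] = min(max(value, low), high)
    return config

reply = '{"num_leaves": 5000, "learning_rate": "0.05", "objective": "rm -rf"}'
print(parse_llm_config(reply))  # {'num_leaves': 300, 'learning_rate': 0.05}
```

An empty result can be treated as a failed iteration, for example by falling back to a random configuration instead of crashing the loop.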
Conclusion
This guide covered the main building blocks of the AutoML ecosystem:
1. **Hyperparameter Optimization**: From grid search to Bayesian optimization, building systematic intuition
2. **Optuna**: The most flexible Python-native HPO framework, with pruning and visualization
3. **Ray Tune**: Large-scale distributed HPO across multiple GPUs and nodes
4. **AutoGluon**: Amazon's powerful multimodal AutoML for tabular, image, and text data
5. **FLAML**: Microsoft's cost-efficient AutoML with minimal overhead
6. **H2O AutoML**: Enterprise-grade AutoML with interpretability tooling
7. **NAS**: Automated design of optimal neural architectures with DARTS and one-shot methods
8. **LLM + AutoML**: The next frontier of intelligent, language-guided automation
**Key Recommendations:**
- Under time constraints: use FLAML or AutoGluon with the `good_quality` preset
- Tuning a specific model: use Optuna
- Large-scale or distributed experiments: use Ray Tune
- Enterprise environments: leverage H2O AutoML for its interpretability tools
- LLM-based AutoML is still research-stage but is worth watching closely
AutoML is a tool, not magic. Domain knowledge, data quality, and a correct evaluation framework remain the most critical ingredients for success.
References
- [Optuna Documentation](https://optuna.org/)
- [AutoGluon Documentation](https://auto.gluon.ai/)
- [FLAML Documentation](https://microsoft.github.io/FLAML/)
- [H2O AutoML](https://h2o.ai/products/h2o-automl/)
- [Ray Tune Documentation](https://docs.ray.io/en/latest/tune/index.html)
- [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055)
- Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization.
- Feurer, M., et al. (2015). Efficient and Robust Automated Machine Learning (Auto-sklearn).
- He, X., et al. (2021). AutoML: A Survey of the State-of-the-Art.