1. AutoML Overview

What is AutoML?

AutoML (Automated Machine Learning) automates various stages of the machine learning pipeline. Tasks that data scientists previously performed manually — data preprocessing, feature engineering, model selection, hyperparameter optimization, and ensembling — are handled automatically by algorithms.

**What AutoML Automates:**

1. **Data Preprocessing Automation**

- Missing value imputation strategy selection

- Scaling and normalization method selection

- Outlier handling

2. **Feature Engineering Automation**

- Feature transformations (log, square, interactions)

- Categorical encoding method selection

- Feature selection and generation

3. **Model Selection (Algorithm Selection)**

- Searching over diverse algorithms

- Meta-learning (leveraging experience from prior tasks)

4. **Hyperparameter Optimization (HPO)**

- Grid/random search

- Bayesian optimization

- Evolutionary algorithms

5. **Ensemble Automation**

- Searching for the optimal ensemble configuration

- Automated stacking and blending

6. **Neural Architecture Search (NAS)**

- Automated design of optimal neural network architectures

AutoML Application Domains

**Industry Applications:**

- **Finance**: Credit risk models, automated fraud detection

- **Healthcare**: Rapid prototyping of diagnostic support systems

- **Retail**: Automated demand forecasting model refresh

- **Manufacturing**: Quality control model automation

**Major Open-Source AutoML Tools:**

| Tool | Developer | Strengths |
| ------------ | ------------------ | -------------------------------- |
| AutoGluon | Amazon | Multimodal, tabular, image, text |
| FLAML | Microsoft | Cost-efficient, fast |
| Optuna | Preferred Networks | HPO, visualization |
| H2O AutoML | H2O.ai | Enterprise, interpretable |
| Auto-sklearn | AutoML Group | scikit-learn compatible |
| Ray Tune | Anyscale | Distributed HPO |
| NNI | Microsoft | NAS, HPO |

Pros and Cons of AutoML

**Pros:**

- Enables non-experts to build high-quality models

- Saves time by automating repetitive experiments

- Discovers hyperparameter combinations humans might miss

- Provides reproducible pipelines

**Cons:**

- Computational costs can be very high

- Limited ability to incorporate domain knowledge

- Black-box nature (internal workings difficult to understand)

- Custom solutions are more effective for specialized problems

- Risk of data leakage

2. Hyperparameter Optimization (HPO)

Grid Search

Grid search is the simplest HPO method: it exhaustively tries every combination in the search space, so its cost grows multiplicatively with each added hyperparameter.

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import GridSearchCV


def grid_search_example(X_train, y_train):
    """Grid Search: exhaustive (only practical for small search spaces)"""
    param_grid = {
        'max_depth': [3, 5, 7],
        'learning_rate': [0.01, 0.1, 0.3],
        'n_estimators': [100, 300, 500],
        'subsample': [0.7, 0.9],
    }
    # Total combinations: 3 * 3 * 3 * 2 = 54, each fitted once per CV fold

    model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
    grid_search = GridSearchCV(
        model, param_grid,
        cv=5, scoring='roc_auc', n_jobs=-1, verbose=1, refit=True
    )
    grid_search.fit(X_train, y_train)

    print(f"Best params: {grid_search.best_params_}")
    print(f"Best CV score: {grid_search.best_score_:.4f}")

    # Show the ten best configurations by mean validation score
    results = pd.DataFrame(grid_search.cv_results_)
    print(results.sort_values('mean_test_score', ascending=False)[
        ['params', 'mean_test_score', 'std_test_score']
    ].head(10))

    return grid_search.best_estimator_
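To make the cost of that grid concrete, the sketch below enumerates the same `param_grid` with the standard library only. The `cv_folds` variable is an assumption that mirrors `cv=5` in the example, and the fit count ignores the final refit on the full training set.

```python
from itertools import product

param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
    "subsample": [0.7, 0.9],
}
cv_folds = 5  # mirrors cv=5 in the GridSearchCV call

# The Cartesian product of all value lists is every configuration grid search tries
names = list(param_grid)
configs = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_configs = len(configs)
n_fits = n_configs * cv_folds  # one fit per configuration per fold
print(n_configs, n_fits)
```

Adding one more hyperparameter with three values would triple both numbers, which is why grid search is usually reserved for small spaces.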

Random Search

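Proposed by Bergstra & Bengio (2012), random search samples configurations from parameter distributions. It is often far more efficient than grid search because only a few hyperparameters usually matter, and for the same budget random sampling tries many more distinct values of each one.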

import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint, loguniform


def random_search_example(X_train, y_train, n_iter=100):
    """Random Search: sample from continuous distributions"""
    param_distributions = {
        'max_depth': randint(3, 10),             # integers 3..9 (upper bound exclusive)
        'learning_rate': loguniform(1e-3, 0.5),  # log-uniform over [0.001, 0.5]
        'n_estimators': randint(100, 1000),
        'subsample': uniform(0.6, 0.4),          # uniform over [0.6, 1.0] (loc, scale)
        'colsample_bytree': uniform(0.6, 0.4),
        'reg_alpha': loguniform(1e-4, 10),
        'reg_lambda': loguniform(1e-4, 10),
        'min_child_weight': randint(1, 10),
        'gamma': uniform(0, 0.5),
    }

    model = xgb.XGBClassifier(random_state=42, n_jobs=-1)
    random_search = RandomizedSearchCV(
        model, param_distributions,
        n_iter=n_iter, cv=5, scoring='roc_auc',
        n_jobs=-1, verbose=1, random_state=42, refit=True
    )
    random_search.fit(X_train, y_train)

    print(f"Best params: {random_search.best_params_}")
    print(f"Best CV score: {random_search.best_score_:.4f}")

    return random_search.best_estimator_
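The `loguniform` distributions used above spread samples evenly across orders of magnitude, which suits parameters such as the learning rate. The following is a minimal standard-library sketch of that idea; `sample_loguniform` is a hypothetical helper written for illustration, not SciPy's implementation.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_loguniform(rng, 1e-3, 0.5) for _ in range(10_000)]

# Share of samples below 0.01 and below 0.1: each full decade receives
# the same probability mass, unlike a plain uniform distribution.
below_001 = sum(s < 1e-2 for s in samples) / len(samples)
below_01 = sum(s < 1e-1 for s in samples) / len(samples)
print(round(below_001, 2), round(below_01, 2))
```

With a plain uniform distribution over the same range, only about 2% of the samples would fall below 0.01, so small learning rates would rarely be explored.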

Bayesian Optimization

Bayesian optimization uses the results of previous evaluations to intelligently select the next point to evaluate.

**Core Components:**

1. **Surrogate Model**: Probabilistic approximation of the objective function (typically Gaussian Process)

2. **Acquisition Function**: Determines the next evaluation point

- EI (Expected Improvement): Expected improvement over the current best

- UCB (Upper Confidence Bound): Balance between exploration and exploitation

- PI (Probability of Improvement): Probability of improving over the current best
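The three acquisition functions have closed forms when the surrogate's prediction at a point is Gaussian with mean `mu` and standard deviation `sigma`. The sketch below assumes a maximization problem; the function names and the default `kappa=2.0` are illustrative choices, not taken from a particular library.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * norm_cdf(z) + sigma * norm_pdf(z)


def probability_of_improvement(mu, sigma, best):
    """Probability that the point beats the current best value."""
    if sigma <= 0.0:
        return 1.0 if mu > best else 0.0
    return norm_cdf((mu - best) / sigma)


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate: larger kappa favors exploration."""
    return mu + kappa * sigma


# Two candidates with the same predicted mean but different uncertainty
best_so_far = 0.80
ei_certain = expected_improvement(0.79, 0.01, best_so_far)
ei_uncertain = expected_improvement(0.79, 0.10, best_so_far)
print(ei_certain < ei_uncertain)
```

Both candidates are predicted to be slightly worse than the current best, yet the uncertain one has the larger expected improvement. This is how the acquisition function trades exploitation against exploration.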

**TPE (Tree-structured Parzen Estimator):**

- Default algorithm used by Optuna

- Rather than modeling p(y|x) directly as a Gaussian process does, TPE models p(x|y) with two density functions, l(x) and g(x)

- l(x) models the distribution of parameters that led to good results (the top γ fraction of trials)

- g(x) models the rest

- The next point is the candidate that maximizes the ratio l(x)/g(x)
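The TPE steps can be sketched in one dimension with plain kernel density estimates. This is a toy illustration under several assumptions (a fixed bandwidth of 0.1, 24 candidates, evenly spaced past trials, a made-up objective) and is much simpler than Optuna's actual sampler.

```python
import math
import random


def kde(x, points, bandwidth=0.1):
    """Average of Gaussian kernels centered on the observed points."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (norm * len(points))


def objective(x):
    return -(x - 0.7) ** 2  # toy function to maximize; optimum at x = 0.7


# Past trials: 30 evenly spaced points on [0, 1] and their scores
history = [(i / 29, objective(i / 29)) for i in range(30)]

gamma = 0.25  # fraction of trials treated as "good"
ranked = sorted(history, key=lambda t: t[1], reverse=True)
n_good = max(1, int(gamma * len(ranked)))
good = [x for x, _ in ranked[:n_good]]  # observations behind l(x)
bad = [x for x, _ in ranked[n_good:]]   # observations behind g(x)

# Draw candidates from l(x): pick a good point, add kernel noise, clip to bounds
rng = random.Random(0)
candidates = [
    min(1.0, max(0.0, rng.choice(good) + rng.gauss(0.0, 0.1)))
    for _ in range(24)
]

# The next trial is the candidate with the largest density ratio l(x) / g(x)
next_x = max(candidates, key=lambda x: kde(x, good) / kde(x, bad))
print(round(next_x, 3))
```

The selected point lands near the region where good trials are dense and poor trials are sparse, which here is around the optimum at 0.7.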

3. Optuna

Core Concepts

Optuna, developed by Preferred Networks, is a Python-native HPO framework known for its simplicity and flexibility.

**Key concepts:**

- **Study**: The entire optimization experiment (a collection of Trials)

- **Trial**: A single hyperparameter configuration attempt

- **Objective Function**: The function to optimize (minimize or maximize)

- **Sampler**: The parameter suggestion algorithm (TPE, CMA-ES, Random, etc.)

- **Pruner**: Early termination of unpromising Trials
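To show what a pruner decides, here is a simplified median stopping rule in plain Python. It is a sketch of the general idea only: `should_prune` and its arguments are hypothetical names, the learning curves are toy values, and Optuna's own `MedianPruner` differs in its details.

```python
from statistics import median


def should_prune(current_value, step, finished_curves,
                 n_warmup_steps=0, direction="maximize"):
    """Stop a trial that is worse than the median of earlier trials at this step."""
    if step < n_warmup_steps:
        return False  # never prune during warm-up
    peers = [curve[step] for curve in finished_curves if len(curve) > step]
    if not peers:
        return False  # nothing to compare against yet
    m = median(peers)
    return current_value < m if direction == "maximize" else current_value > m


# Validation AUC per step for three earlier trials (toy values)
finished = [
    [0.70, 0.75, 0.80],
    [0.65, 0.72, 0.78],
    [0.60, 0.66, 0.70],
]

# At step 1 the median of the earlier trials is 0.72
print(should_prune(0.62, step=1, finished_curves=finished))
print(should_prune(0.74, step=1, finished_curves=finished))
```

The warm-up setting matters in practice: if it is larger than the number of values a trial ever reports, the rule never fires.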

pip install optuna optuna-dashboard

import lightgbm as lgb
import numpy as np
import optuna
import optuna.visualization as vis
from optuna.samplers import TPESampler, CmaEsSampler, RandomSampler
from optuna.pruners import MedianPruner, HyperbandPruner
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

optuna.logging.set_verbosity(optuna.logging.WARNING)


def objective_lgbm(trial, X, y):
    """Optuna objective function for LightGBM optimization"""
    params = {
        'objective': 'binary',
        'metric': 'auc',
        'verbosity': -1,
        'boosting_type': trial.suggest_categorical('boosting_type', ['gbdt', 'dart']),
        'num_leaves': trial.suggest_int('num_leaves', 20, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 12),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'learning_rate': trial.suggest_float('learning_rate', 1e-4, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 2000),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'subsample_freq': trial.suggest_int('subsample_freq', 1, 7),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
        'min_split_gain': trial.suggest_float('min_split_gain', 0, 1),
        'n_jobs': -1,
    }

    # X and y are expected to be a pandas DataFrame and Series (uses .iloc)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    cv_scores = []

    for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
        X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

        train_data = lgb.Dataset(X_train, y_train)
        val_data = lgb.Dataset(X_val, y_val, reference=train_data)

        model = lgb.train(
            params, train_data,
            num_boost_round=params['n_estimators'],
            valid_sets=[val_data],
            callbacks=[
                lgb.early_stopping(stopping_rounds=50, verbose=False),
                lgb.log_evaluation(-1),
            ],
        )

        preds = model.predict(X_val)
        fold_score = roc_auc_score(y_val, preds)
        cv_scores.append(fold_score)

        # Report intermediate results for pruning (one value per fold, steps 0..4)
        trial.report(fold_score, fold)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return np.mean(cv_scores)


def run_optuna_study(X, y, n_trials=100, n_jobs=1):
    """Run an Optuna study with TPE sampler and median pruning"""
    sampler = TPESampler(
        n_startup_trials=20,
        n_ei_candidates=24,
        multivariate=True,
        seed=42
    )
    pruner = MedianPruner(
        n_startup_trials=5,
        # The objective reports only 5 steps (one per fold), so the warm-up
        # must be shorter than that or no trial is ever pruned.
        n_warmup_steps=1,
        interval_steps=1
    )

    study = optuna.create_study(
        direction='maximize',
        sampler=sampler,
        pruner=pruner,
        study_name='lgbm_optimization',
        storage='sqlite:///optuna.db',  # persist results
        load_if_exists=True,            # resume existing study
    )
    study.optimize(
        lambda trial: objective_lgbm(trial, X, y),
        n_trials=n_trials,
        n_jobs=n_jobs,
        show_progress_bar=True,
    )

    print("\nBest params:")
    for key, value in study.best_params.items():
        print(f"  {key}: {value}")
    print(f"Best AUC: {study.best_value:.4f}")
    print(f"Total trials: {len(study.trials)}")
    pruned = [t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED]
    print(f"Pruned trials: {len(pruned)}")

    return study


def visualize_optuna_study(study):
    """Visualize Optuna optimization results"""
    vis.plot_optimization_history(study).show()
    vis.plot_param_importances(study).show()
    vis.plot_parallel_coordinate(study).show()
    vis.plot_slice(study).show()
    vis.plot_contour(study, params=['learning_rate', 'num_leaves']).show()

Complete PyTorch + Optuna Example

import optuna
import torch
import torch.nn as nn
import torch.optim as optim
from optuna.samplers import TPESampler
from torch.utils.data import DataLoader, TensorDataset


def create_model(trial, input_dim):
    """Dynamically build a neural network from Optuna trial parameters"""
    n_layers = trial.suggest_int('n_layers', 1, 4)
    dropout = trial.suggest_float('dropout', 0.1, 0.5)
    activation_name = trial.suggest_categorical('activation', ['relu', 'tanh', 'elu'])
    activation_map = {'relu': nn.ReLU(), 'tanh': nn.Tanh(), 'elu': nn.ELU()}

    layers = []
    in_features = input_dim
    for i in range(n_layers):
        out_features = trial.suggest_int(f'n_units_l{i}', 32, 512)
        layers.extend([
            nn.Linear(in_features, out_features),
            nn.BatchNorm1d(out_features),
            activation_map[activation_name],
            nn.Dropout(dropout),
        ])
        in_features = out_features

    layers.extend([nn.Linear(in_features, 1), nn.Sigmoid()])
    return nn.Sequential(*layers)


def objective_pytorch(trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim):
    """Optuna objective function for PyTorch neural network"""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = create_model(trial, input_dim).to(device)

    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'RMSprop', 'SGD'])
    batch_size = trial.suggest_categorical('batch_size', [32, 64, 128, 256])
    weight_decay = trial.suggest_float('weight_decay', 1e-8, 1e-2, log=True)

    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    elif optimizer_name == 'RMSprop':
        optimizer = optim.RMSprop(model.parameters(), lr=lr, weight_decay=weight_decay)
    else:
        optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay,
                              momentum=0.9)

    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
    criterion = nn.BCELoss()

    train_dataset = TensorDataset(
        X_train_t.to(device),
        y_train_t.to(device).float().unsqueeze(1)
    )
    # drop_last avoids a final batch of size 1, which BatchNorm1d rejects in training mode
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                              drop_last=True)

    best_val_loss = float('inf')
    patience_counter = 0
    patience = 10

    for epoch in range(100):
        model.train()
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(X_batch), y_batch)
            loss.backward()
            optimizer.step()
        scheduler.step()

        model.eval()
        with torch.no_grad():
            val_loss = criterion(
                model(X_val_t.to(device)),
                y_val_t.to(device).float().unsqueeze(1)
            ).item()

        # Report the validation loss so the pruner can stop unpromising trials
        trial.report(val_loss, epoch)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

        # Early stopping on validation loss
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= patience:
                break

    return best_val_loss


def run_pytorch_optuna(X_train, y_train, X_val, y_val, n_trials=50):
    # Accept either pandas objects or NumPy arrays
    X_train_t = torch.FloatTensor(X_train.values if hasattr(X_train, 'values') else X_train)
    y_train_t = torch.FloatTensor(y_train.values if hasattr(y_train, 'values') else y_train)
    X_val_t = torch.FloatTensor(X_val.values if hasattr(X_val, 'values') else X_val)
    y_val_t = torch.FloatTensor(y_val.values if hasattr(y_val, 'values') else y_val)
    input_dim = X_train_t.shape[1]

    study = optuna.create_study(
        direction='minimize',
        pruner=optuna.pruners.HyperbandPruner(
            min_resource=5, max_resource=100, reduction_factor=3
        ),
        sampler=TPESampler(seed=42),
    )
    study.optimize(
        lambda trial: objective_pytorch(
            trial, X_train_t, y_train_t, X_val_t, y_val_t, input_dim
        ),
        n_trials=n_trials,
        show_progress_bar=True,
    )

    print(f"Best val loss: {study.best_value:.4f}")
    print(f"Best params: {study.best_params}")
    return study

CMA-ES Sampler

# CMA-ES is often more efficient for continuous hyperparameter spaces
study_cmaes = optuna.create_study(
    direction='maximize',
    sampler=CmaEsSampler(
        n_startup_trials=10,
        restart_strategy='ipop',  # restart strategy for escaping local optima
        seed=42,
    ),
)

4. Ray Tune

Distributed HPO with Ray Tune

Ray Tune, developed by Anyscale, handles parallel training across multiple GPUs and nodes automatically.

pip install ray[tune] ray[air]

import ray
import torch
import torch.nn as nn
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler, PopulationBasedTraining
from ray.tune.search.optuna import OptunaSearch

ray.init(ignore_reinit_error=True)


def train_with_tune(config, data=None):
    """Training function called by Ray Tune"""
    X_train, y_train, X_val, y_val = data  # NumPy arrays

    model = nn.Sequential(
        nn.Linear(X_train.shape[1], config['hidden_size']),
        nn.ReLU(),
        nn.Dropout(config['dropout']),
        nn.Linear(config['hidden_size'], config['hidden_size'] // 2),
        nn.ReLU(),
        nn.Linear(config['hidden_size'] // 2, 1),
        nn.Sigmoid()
    )
    optimizer = torch.optim.Adam(
        model.parameters(), lr=config['lr'], weight_decay=config['weight_decay']
    )
    criterion = nn.BCELoss()

    X_train_t = torch.FloatTensor(X_train)
    y_train_t = torch.FloatTensor(y_train).unsqueeze(1)
    X_val_t = torch.FloatTensor(X_val)
    y_val_t = torch.FloatTensor(y_val).unsqueeze(1)

    for epoch in range(config['max_epochs']):
        # Full-batch training step
        model.train()
        optimizer.zero_grad()
        loss = criterion(model(X_train_t), y_train_t)
        loss.backward()
        optimizer.step()

        if epoch % 5 == 0:
            model.eval()
            with torch.no_grad():
                val_loss = criterion(model(X_val_t), y_val_t).item()
            # Keyword-style reporting is the legacy function API; newer Ray
            # releases may expect a dict of metrics instead.
            tune.report(val_loss=val_loss, training_iteration=epoch)


def run_ray_tune(X_train, y_train, X_val, y_val, num_samples=50):
    """Run distributed HPO with Ray Tune"""
    config = {
        'hidden_size': tune.choice([64, 128, 256, 512]),
        'dropout': tune.uniform(0.1, 0.5),
        'lr': tune.loguniform(1e-5, 1e-1),
        'weight_decay': tune.loguniform(1e-8, 1e-3),
        'max_epochs': tune.choice([50, 100, 200]),
    }

    # ASHA: Asynchronous Successive Halving Algorithm
    scheduler = ASHAScheduler(
        metric='val_loss',
        mode='min',
        max_t=200,        # Max epochs
        grace_period=10,  # Min epochs before pruning
        reduction_factor=3,
    )
    search_alg = OptunaSearch(metric='val_loss', mode='min')

    result = tune.run(
        tune.with_parameters(
            train_with_tune,
            data=(X_train, y_train, X_val, y_val)
        ),
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        search_alg=search_alg,
        progress_reporter=CLIReporter(
            metric_columns=['val_loss', 'training_iteration'],
            max_progress_rows=10
        ),
        verbose=1,
        resources_per_trial={'cpu': 2, 'gpu': 0},
    )

    best_trial = result.get_best_trial('val_loss', 'min', 'last')
    print(f"Best val loss: {best_trial.last_result['val_loss']:.4f}")
    print(f"Best config: {best_trial.config}")
    return result


def run_pbt(X_train, y_train, X_val, y_val):
    """Population Based Training: dynamically mutate hyperparameters during training"""
    pbt_scheduler = PopulationBasedTraining(
        time_attr='training_iteration',
        metric='val_loss',
        mode='min',
        perturbation_interval=20,
        hyperparam_mutations={
            'lr': tune.loguniform(1e-5, 1e-1),
            'dropout': tune.uniform(0.1, 0.5),
        },
        quantile_fraction=0.25,  # Replace bottom 25% with top 25%
    )

    result = tune.run(
        tune.with_parameters(
            train_with_tune,
            data=(X_train, y_train, X_val, y_val)
        ),
        config={
            'hidden_size': 256,
            'dropout': tune.uniform(0.1, 0.5),
            'lr': tune.loguniform(1e-4, 1e-1),
            'weight_decay': 1e-5,
            'max_epochs': 200,
        },
        num_samples=8,
        scheduler=pbt_scheduler,
        verbose=1,
    )
    return result
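The `ASHAScheduler` used above builds on successive halving. The sketch below shows the synchronous version of that idea in plain Python, reusing the `grace_period`, `reduction_factor` and `max_t` names from the example; `toy_loss` is a hypothetical stand-in for real training, and Ray's asynchronous implementation is more involved.

```python
import random


def rung_budgets(grace_period, reduction_factor, max_t):
    """Budgets at which trials are compared: grace_period * reduction_factor**k."""
    budgets = []
    t = grace_period
    while t <= max_t:
        budgets.append(t)
        t *= reduction_factor
    return budgets


def successive_halving(configs, evaluate, grace_period=10, reduction_factor=3, max_t=200):
    """Keep the best 1/reduction_factor of the survivors at each rung (lower is better)."""
    survivors = list(configs)
    for budget in rung_budgets(grace_period, reduction_factor, max_t):
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(scored) // reduction_factor)
        survivors = scored[:keep]
        if len(survivors) == 1:
            break
    return survivors[0]


def toy_loss(cfg, budget):
    # Hypothetical loss: best near lr = 0.01, improving with more budget
    return abs(cfg["lr"] - 0.01) + 1.0 / budget


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, -1)} for _ in range(27)]
best = successive_halving(configs, toy_loss)
print(rung_budgets(10, 3, 200), best)
```

With 27 configurations and a reduction factor of 3, the population shrinks to 9, then 3, then 1, so most of the budget is spent on the configurations that looked best early on.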

5. AutoGluon

AutoGluon Overview

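AutoGluon, developed by Amazon, trains and ensembles many models behind a small API and often reaches competitive accuracy with minimal code. A basic tabular workflow can be as short as three lines: construct a `TabularPredictor`, call `fit`, and call `predict`.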

pip install autogluon

Tabular Data (TabularPredictor)

from autogluon.tabular import TabularPredictor


def autogluon_tabular_example(train_df, test_df, target_col, eval_metric='roc_auc'):
    """AutoGluon tabular training"""
    predictor = TabularPredictor(
        label=target_col,
        eval_metric=eval_metric,
        path='autogluon_models/',
        problem_type='binary',  # 'binary', 'multiclass', 'regression', 'softclass'
    )
    predictor.fit(
        train_data=train_df,
        time_limit=3600,
        # Other presets: 'good_quality', 'medium_quality', 'optimize_for_deployment'
        presets='best_quality',
        excluded_model_types=['KNN'],
        verbosity=2,
    )

    # test_df must contain the label column for the leaderboard and feature importance
    leaderboard = predictor.leaderboard(test_df, silent=True)
    print(leaderboard[['model', 'score_test', 'score_val', 'pred_time_test']].head(10))

    predictions = predictor.predict(test_df)
    pred_proba = predictor.predict_proba(test_df)

    feature_importance = predictor.feature_importance(test_df)
    print(feature_importance.head(20))

    return predictor, predictions, pred_proba


def autogluon_advanced(train_df, test_df, target_col):
    """AutoGluon with custom hyperparameters"""
    hyperparameters = {
        'GBM': [
            {'num_boost_round': 300, 'ag_args': {'name_suffix': 'fast'}},
            {'num_boost_round': 1000, 'learning_rate': 0.03,
             'ag_args': {'name_suffix': 'slow', 'priority': 0}},
        ],
        'XGB': [{'n_estimators': 300, 'max_depth': 6}],
        'CAT': [{'iterations': 500, 'depth': 6}],
        'NN_TORCH': [{'num_epochs': 50, 'learning_rate': 1e-3, 'dropout_prob': 0.1}],
        'RF': [{'n_estimators': 300}],
    }

    predictor = TabularPredictor(
        label=target_col, eval_metric='roc_auc', path='autogluon_advanced/'
    )
    predictor.fit(
        train_data=train_df,
        hyperparameters=hyperparameters,
        time_limit=7200,
        num_stack_levels=1,  # Number of stacking levels
        num_bag_folds=5,     # Number of CV folds for bagging
        num_bag_sets=1,      # Number of bagging sets
        verbosity=3,
    )
    return predictor
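The `num_bag_folds` and `num_stack_levels` settings rely on out-of-fold predictions: every training row is predicted by a model that did not see it, and those predictions become inputs for the next stacking level. The sketch below illustrates the general technique in plain Python; the helper names and the mean-predicting toy learner are hypothetical, and this is not AutoGluon's internal implementation.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Predict each row with a model trained on the other folds only."""
    oof = [None] * len(X)
    for val_idx in kfold_indices(len(X), n_folds):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = predict(model, X[i])
    return oof


def fit_mean(X_train, y_train):
    # Toy base learner: remembers the mean of its training targets
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(10)]
y = [float(i) for i in range(10)]
oof = out_of_fold_predictions(X, y, fit_mean, predict_mean, n_folds=5)
print(oof)
```

Because no row's own target influences its out-of-fold prediction, a second-level model trained on these values is not rewarded for the base model's overfitting.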

Multimodal Learning

from autogluon.multimodal import MultiModalPredictor


def autogluon_image_classification(train_df, test_df, label_col):
    """AutoGluon image classification"""
    predictor = MultiModalPredictor(label=label_col)
    predictor.fit(
        train_data=train_df,
        time_limit=3600,
        hyperparameters={
            'model.timm_image.checkpoint_name': 'efficientnet_b4',
            'optimization.learning_rate': 1e-4,
            'optimization.max_epochs': 20,
        }
    )
    return predictor


def autogluon_multimodal(train_df, test_df, target_col):
    """AutoGluon multimodal: text + tabular features together"""
    predictor = MultiModalPredictor(label=target_col, problem_type='binary')
    predictor.fit(
        train_data=train_df,
        time_limit=3600,
        hyperparameters={
            'model.hf_text.checkpoint_name': 'bert-base-uncased',
        }
    )
    return predictor

6. FLAML

Microsoft FLAML

FLAML (Fast and Lightweight AutoML), developed by Microsoft Research, specializes in cost-efficient automation.

pip install flaml

from flaml import AutoML


def flaml_basic_example(X_train, y_train, X_test, task='classification'):
    """FLAML basic usage"""
    automl = AutoML()
    automl_settings = {
        'time_budget': 300,  # seconds
        'metric': 'roc_auc',
        'task': task,  # 'classification', 'regression', 'ranking'
        'estimator_list': [
            'lgbm', 'xgboost', 'catboost',
            'rf', 'extra_tree', 'lrl1', 'lrl2', 'kneighbor'
        ],
        'log_file_name': 'flaml_log.log',
        'seed': 42,
        'n_jobs': -1,
        'verbose': 1,
        'retrain_full': True,  # Retrain final model on all data
        'max_iter': 100,
        'ensemble': True,
        'eval_method': 'cv',
        'n_splits': 5,
    }
    automl.fit(X_train, y_train, **automl_settings)

    print(f"Best estimator: {automl.best_estimator}")
    print(f"Best loss: {automl.best_loss:.4f}")
    print(f"Best config: {automl.best_config}")
    print(f"Time to find best model: {automl.time_to_find_best_model:.1f}s")

    predictions = automl.predict(X_test)
    pred_proba = automl.predict_proba(X_test)
    return automl, predictions, pred_proba


def flaml_sklearn_pipeline(X_train, y_train, X_test):
    """Integrate FLAML into a scikit-learn Pipeline"""
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    automl = AutoML()
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('automl', automl),
    ])
    # Parameters prefixed with "automl__" are forwarded to AutoML.fit
    pipeline.fit(
        X_train, y_train,
        automl__time_budget=120,
        automl__metric='roc_auc',
        automl__task='classification',
    )
    return pipeline


def flaml_custom_objective(X_train, y_train):
    """FLAML with a custom evaluation metric"""
    def custom_metric(
        X_val, y_val, estimator, labels, X_train, y_train,
        weight_val=None, weight_train=None, *args
    ):
        """Optimize F-beta score"""
        from sklearn.metrics import fbeta_score
        y_pred = estimator.predict(X_val)
        score = fbeta_score(y_val, y_pred, beta=2, average='weighted')
        # FLAML minimizes the first return value, so negate the score
        return -score, {'f2_score': score}  # (loss, metrics_dict)

    automl = AutoML()
    automl.fit(
        X_train, y_train,
        metric=custom_metric,
        task='classification',
        time_budget=120,
    )
    return automl
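The custom metric above optimizes an F-beta score with `beta=2`, which weights recall more heavily than precision. As a small worked example, the function below computes F-beta for the positive class of a binary problem in plain Python. Note the assumption: the FLAML example uses scikit-learn's weighted average across classes, whereas this sketch only scores the positive class, and the labels are toy values.

```python
def fbeta_binary(y_true, y_pred, beta=2.0):
    """F-beta for the positive class: (1 + b^2) * P * R / (b^2 * P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels: 3 true positives, 1 false negative, 2 false positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

f_half = fbeta_binary(y_true, y_pred, beta=0.5)
f1 = fbeta_binary(y_true, y_pred, beta=1.0)
f2 = fbeta_binary(y_true, y_pred, beta=2.0)
loss = -f2  # a minimizing optimizer would receive the negated score
print(f_half, f1, f2)
```

Here recall (0.75) is higher than precision (0.6), so the score rises as `beta` increases.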

7. H2O AutoML

H2O Cluster

H2O AutoML is an enterprise-grade AutoML platform with extensive interpretability tools.

pip install h2o

import h2o
from h2o.automl import H2OAutoML


def h2o_automl_example(train_df, test_df, target_col, max_models=20):
    """H2O AutoML end-to-end example"""
    h2o.init(nthreads=-1, max_mem_size='8G', port=54321)

    train_h2o = h2o.H2OFrame(train_df)
    test_h2o = h2o.H2OFrame(test_df)

    # Mark target as factor for classification (in both frames, so the
    # leaderboard frame is scored with the same column type)
    train_h2o[target_col] = train_h2o[target_col].asfactor()
    if target_col in test_h2o.columns:
        test_h2o[target_col] = test_h2o[target_col].asfactor()

    feature_cols = [col for col in train_df.columns if col != target_col]

    aml = H2OAutoML(
        max_models=max_models,
        max_runtime_secs=3600,
        seed=42,
        sort_metric='AUC',
        balance_classes=False,
        include_algos=[
            'GBM', 'GLM', 'DRF', 'DeepLearning',
            'StackedEnsemble', 'XGBoost'
        ],
        keep_cross_validation_predictions=True,
        keep_cross_validation_models=True,
        nfolds=5,
        verbosity='info',
    )
    # H2OAutoML is trained with train(), not fit()
    aml.train(
        x=feature_cols, y=target_col,
        training_frame=train_h2o,
        leaderboard_frame=test_h2o,
    )

    lb = aml.leaderboard
    print("H2O AutoML Leaderboard:")
    print(lb.head(20))

    best_model = aml.leader
    print(f"\nBest model: {best_model.model_id}")

    predictions = best_model.predict(test_h2o).as_data_frame()

    # Save model
    model_path = h2o.save_model(model=best_model, path='h2o_models/', force=True)
    print(f"Model saved to: {model_path}")

    return aml, best_model, predictions


def cleanup_h2o():
    h2o.cluster().shutdown()

8. Neural Architecture Search (NAS)

NAS Overview

Neural Architecture Search (NAS) automatically finds optimal neural network architectures.

**Three components of NAS:**

1. **Search Space**: The set of possible architectures

2. **Search Strategy**: How to explore the space (random, evolutionary, RL, gradient-based)

3. **Performance Estimation**: How to evaluate candidate architectures

DARTS (Differentiable Architecture Search)

DARTS (Liu et al., 2019) makes architecture search differentiable via continuous relaxation of discrete choices.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOperation(nn.Module):
    """DARTS mixed operation: weighted sum of candidate ops"""
    def __init__(self, operations):
        super().__init__()
        self.ops = nn.ModuleList(operations)
        # One architecture parameter (alpha) per candidate operation
        self.alphas = nn.Parameter(torch.randn(len(operations)))

    def forward(self, x):
        weights = F.softmax(self.alphas, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


class DARTSCell(nn.Module):
    """A single DARTS cell"""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # The pooling ops keep the channel count, so this simplified cell
        # assumes in_channels == out_channels.
        operations = [
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.Conv2d(in_channels, out_channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.Identity() if in_channels == out_channels
            else nn.Conv2d(in_channels, out_channels, 1),
        ]
        self.mixed_op = MixedOperation(operations)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return F.relu(self.bn(self.mixed_op(x)))


class SimpleDARTS(nn.Module):
    """Simplified DARTS network"""
    def __init__(self, num_classes=10, num_cells=6):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 3, padding=1)
        self.cells = nn.ModuleList([DARTSCell(64, 64) for _ in range(num_cells)])
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.stem(x)
        for cell in self.cells:
            x = cell(x)
        x = x.mean([2, 3])  # Global average pooling
        return self.classifier(x)

    def arch_parameters(self):
        return [p for n, p in self.named_parameters() if 'alphas' in n]

    def model_parameters(self):
        return [p for n, p in self.named_parameters() if 'alphas' not in n]


def train_darts(model, train_loader, val_loader, epochs=50):
    """Bilevel optimization for DARTS (alternating, first-order updates)"""
    w_optimizer = torch.optim.SGD(
        model.model_parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4
    )
    a_optimizer = torch.optim.Adam(
        model.arch_parameters(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3
    )
    w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs)

    for epoch in range(epochs):
        model.train()
        train_iter = iter(train_loader)
        val_iter = iter(val_loader)

        for step in range(min(len(train_loader), len(val_loader))):
            # Step 1: Update architecture parameters using validation data
            try:
                X_val, y_val = next(val_iter)
            except StopIteration:
                val_iter = iter(val_loader)
                X_val, y_val = next(val_iter)

            a_optimizer.zero_grad()
            val_loss = F.cross_entropy(model(X_val), y_val)
            val_loss.backward()
            a_optimizer.step()

            # Step 2: Update weight parameters using training data
            X_train, y_train = next(train_iter)
            w_optimizer.zero_grad()
            train_loss = F.cross_entropy(model(X_train), y_train)
            train_loss.backward()
            nn.utils.clip_grad_norm_(model.model_parameters(), 5.0)
            w_optimizer.step()

        w_scheduler.step()

        if epoch % 10 == 0:
            print(f"Epoch {epoch}: Train Loss = {train_loss.item():.4f}")

    # Extract discovered architecture: keep the highest-weighted op in each cell
    for i, cell in enumerate(model.cells):
        weights = F.softmax(cell.mixed_op.alphas, dim=0).detach().cpu()
        best_op = weights.argmax().item()
        print(f"Cell {i}: Best op index = {best_op}, weights = {weights.numpy()}")

    return model
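The continuous relaxation in `MixedOperation` and the final discretization step can be seen in isolation with scalars instead of tensors. In this minimal sketch the three operations and the `alphas` values are made up for illustration; they stand in for the convolution, pooling and identity candidates.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Candidate operations on a scalar input (stand-ins for conv / pooling / identity)
ops = {
    "double": lambda x: 2.0 * x,
    "negate": lambda x: -x,
    "identity": lambda x: x,
}
alphas = [1.5, -0.5, 0.2]  # architecture parameters, one per operation


def mixed_op(x, arch_params):
    """Relaxed choice: softmax-weighted sum of every candidate's output."""
    weights = softmax(arch_params)
    return sum(w * op(x) for w, op in zip(weights, ops.values()))


def discretize(arch_params):
    """After the search, keep only the operation with the largest weight."""
    weights = softmax(arch_params)
    best = max(range(len(weights)), key=weights.__getitem__)
    return list(ops)[best]


print(mixed_op(1.0, alphas), discretize(alphas))
```

Because the mixed output is a smooth function of `alphas`, gradients of the validation loss can flow into the architecture parameters, which is what makes the search differentiable.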

One-Shot NAS

import torch
import torch.nn as nn
import torch.nn.functional as F


class SuperNetwork(nn.Module):
    """One-Shot NAS: sample sub-networks from a single super-network"""
    def __init__(self, num_classes=10, max_channels=256):
        super().__init__()
        self.max_channels = max_channels
        self.channel_options = [64, 128, 256]

        self.conv1 = nn.Conv2d(3, max_channels, 3, padding=1)
        self.conv2 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
        self.conv3 = nn.Conv2d(max_channels, max_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(max_channels)
        self.bn2 = nn.BatchNorm2d(max_channels)
        self.bn3 = nn.BatchNorm2d(max_channels)
        self.classifier = nn.Linear(max_channels, num_classes)

    def _mask_channels(self, x, num_active):
        """Zero out channels beyond num_active so shared layers keep a fixed width"""
        mask = torch.zeros(1, self.max_channels, 1, 1, device=x.device, dtype=x.dtype)
        mask[:, :num_active] = 1.0
        return x * mask

    def forward(self, x, arch_config=None):
        if arch_config is None:
            # Sample a random sub-network (indices into channel_options)
            arch_config = {
                'conv1_out': torch.randint(0, len(self.channel_options), (1,)).item(),
                'conv2_out': torch.randint(0, len(self.channel_options), (1,)).item(),
            }

        c1 = self.channel_options[arch_config['conv1_out']]
        c2 = self.channel_options[arch_config['conv2_out']]

        # Run each layer at full width, then mask the inactive channels.
        # Slicing to c1 channels before BatchNorm2d(max_channels) would
        # raise a shape error whenever c1 < max_channels.
        x = self._mask_channels(F.relu(self.bn1(self.conv1(x))), c1)
        x = self._mask_channels(F.relu(self.bn2(self.conv2(x))), c2)
        x = F.relu(self.bn3(self.conv3(x)))

        x = x.mean([2, 3])
        return self.classifier(x)

9. Pipeline Automation

Auto-sklearn

pip install auto-sklearn

import autosklearn.classification
import autosklearn.regression
from autosklearn.metrics import roc_auc, mean_squared_error


def auto_sklearn_example(X_train, y_train, X_test, task='classification'):
    """Auto-sklearn: scikit-learn-compatible AutoML"""
    if task == 'classification':
        automl = autosklearn.classification.AutoSklearnClassifier(
            time_left_for_this_task=3600,
            per_run_time_limit=360,
            n_jobs=-1,
            memory_limit=8192,
            ensemble_size=50,
            ensemble_nbest=50,
            max_models_on_disc=50,
            include={
                'classifier': [
                    'random_forest', 'gradient_boosting',
                    'extra_trees', 'liblinear_svc'
                ]
            },
            metric=roc_auc,
            resampling_strategy='cv',
            resampling_strategy_arguments={'folds': 5},
            seed=42,
        )
    else:
        automl = autosklearn.regression.AutoSklearnRegressor(
            time_left_for_this_task=3600,
            per_run_time_limit=360,
            n_jobs=-1,
            metric=mean_squared_error,
            seed=42,
        )

    automl.fit(X_train, y_train)
    if task == 'classification':
        # With resampling_strategy='cv', the selected models must be refit
        # on the full training data before predict() can be used.
        automl.refit(X_train.copy(), y_train.copy())

    print(automl.sprint_statistics())
    print(automl.leaderboard())

    predictions = automl.predict(X_test)
    return automl, predictions
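The `ensemble_size` setting above controls a greedy ensemble-selection step that combines already-trained models. The sketch below shows the general technique (forward selection with replacement on validation predictions) in plain Python. It is an illustration under assumptions: the model names and predictions are toy values, mean squared error is used as the score, and auto-sklearn's own implementation differs in detail.

```python
def ensemble_selection(val_preds, y_val, ensemble_size=5):
    """Greedy forward selection with replacement, minimizing mean squared error."""
    def mse(pred):
        return sum((p - t) ** 2 for p, t in zip(pred, y_val)) / len(y_val)

    chosen = []                        # selected model names, repeats allowed
    running_sum = [0.0] * len(y_val)   # sum of the chosen models' predictions
    for _ in range(ensemble_size):
        best_name, best_err = None, float("inf")
        k = len(chosen) + 1
        for name, preds in val_preds.items():
            # Error of the ensemble average if this model were added next
            candidate = [(s + p) / k for s, p in zip(running_sum, preds)]
            err = mse(candidate)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]

    # A model's weight is the share of selection rounds it won
    return {name: chosen.count(name) / len(chosen) for name in val_preds}


y_val = [1.0, 0.0, 1.0, 0.0]
val_preds = {
    "model_a": [0.9, 0.4, 0.6, 0.1],
    "model_b": [0.6, 0.1, 0.9, 0.4],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
weights = ensemble_selection(val_preds, y_val, ensemble_size=4)
print(weights)
```

The two complementary models share the weight while the uninformative one is never selected, which is the behavior this selection scheme is designed to produce.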

10. AutoML in the LLM Era

Leveraging LLMs for AutoML

Large Language Models (LLMs) are opening new possibilities for AutoML:

1. **Hyperparameter suggestion**: LLMs recommend starting configurations based on dataset characteristics

2. **Feature engineering**: LLMs use domain knowledge to suggest new feature ideas

3. **Code generation**: Automatically generate preprocessing and training code

4. **Error debugging**: Diagnose training failures and suggest solutions

# LLM-guided hyperparameter optimization (conceptual code)
from openai import OpenAI


def llm_hyperparameter_suggestion(dataset_description, model_type, previous_results=None):
    """Use LLM to suggest hyperparameters"""
    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

    prompt = f"""
    Dataset characteristics:
    {dataset_description}

    Model type: {model_type}

    Previous results:
    {previous_results if previous_results else 'None (first attempt)'}

    Based on this information, suggest optimal hyperparameters for {model_type} in JSON format.
    """

    response = client.chat.completions.create(
        # JSON mode needs a model that supports response_format; the original
        # "gpt-4" snapshot does not, so a newer model is used here.
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a machine learning expert. Help optimize hyperparameters."},
            {"role": "user", "content": prompt}
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content

# AutoML Agent (experimental)
import json

import lightgbm as lgb


class AutoMLAgent:
    """LLM-guided AutoML agent"""

    def __init__(self, llm_client, X_train, y_train, X_val, y_val, max_iterations=10):
        self.client = llm_client
        self.X_train = X_train
        self.y_train = y_train
        self.X_val = X_val
        self.y_val = y_val
        self.max_iterations = max_iterations
        self.history = []
        self.best_score = 0
        self.best_params = None

    def get_next_config(self):
        """Ask the LLM for the next configuration to try"""
        # Only the five most recent attempts are included in the prompt
        history_str = "\n".join([
            f"Iteration {i+1}: params={h['params']}, score={h['score']:.4f}"
            for i, h in enumerate(self.history[-5:])
        ])

        prompt = f"""
        LightGBM parameter attempts so far:
        {history_str if history_str else 'None (first attempt)'}

        Suggest the next parameter combination to try in JSON format.
        Valid ranges: num_leaves(10-300), learning_rate(0.001-0.3),
        n_estimators(100-2000), subsample(0.5-1.0), colsample_bytree(0.5-1.0)
        """

        response = self.client.chat.completions.create(
            # JSON mode needs a model that supports response_format
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are an HPO expert."},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def evaluate(self, params):
        """Evaluate a parameter configuration"""
        from sklearn.metrics import roc_auc_score
        model = lgb.LGBMClassifier(**params, random_state=42, verbose=-1)
        model.fit(self.X_train, self.y_train)
        preds = model.predict_proba(self.X_val)[:, 1]
        return roc_auc_score(self.y_val, preds)

    def run(self):
        """Run the AutoML agent loop"""
        for i in range(self.max_iterations):
            config = self.get_next_config()
            score = self.evaluate(config)
            self.history.append({'params': config, 'score': score})

            if score > self.best_score:
                self.best_score = score
                self.best_params = config
                print(f"Iteration {i+1}: New best score {score:.4f}")

        print(f"\nBest score: {self.best_score:.4f}")
        print(f"Best params: {self.best_params}")
        return self.best_params
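An LLM reply is untrusted input, so it is worth checking a suggested configuration before passing it to a model. The following sketch validates a JSON reply against the ranges listed in the agent's prompt. `VALID_RANGES` and `sanitize_config` are hypothetical names, and treating `num_leaves` and `n_estimators` as integers is an assumption.

```python
import json

# Ranges copied from the agent prompt; the int/float split is an assumption
VALID_RANGES = {
    "num_leaves": (10, 300, int),
    "learning_rate": (0.001, 0.3, float),
    "n_estimators": (100, 2000, int),
    "subsample": (0.5, 1.0, float),
    "colsample_bytree": (0.5, 1.0, float),
}


def sanitize_config(raw_json):
    """Parse an LLM reply and keep only known keys, cast and clamped to range."""
    try:
        suggested = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    if not isinstance(suggested, dict):
        return {}

    clean = {}
    for name, (low, high, cast) in VALID_RANGES.items():
        if name not in suggested:
            continue
        try:
            value = cast(float(suggested[name]))
        except (TypeError, ValueError, OverflowError):
            continue  # skip values that are not usable numbers
        clean[name] = min(high, max(low, value))
    return clean


reply = '{"num_leaves": 512.0, "learning_rate": "0.05", "max_bin": 255, "subsample": 0.2}'
print(sanitize_config(reply))
```

Unknown keys such as `max_bin` are dropped and out-of-range values are clamped, so a malformed suggestion degrades into a valid configuration instead of crashing the loop.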

Conclusion

This guide covered the main parts of the AutoML ecosystem:

1. **Hyperparameter Optimization**: From grid search to Bayesian optimization, building systematic intuition

2. **Optuna**: The most flexible Python-native HPO framework, with pruning and visualization

3. **Ray Tune**: Large-scale distributed HPO across multiple GPUs and nodes

4. **AutoGluon**: Amazon's powerful multimodal AutoML for tabular, image, and text data

5. **FLAML**: Microsoft's cost-efficient AutoML with minimal overhead

6. **H2O AutoML**: Enterprise-grade AutoML with interpretability tooling

7. **NAS**: Automated design of optimal neural architectures with DARTS and one-shot methods

8. **LLM + AutoML**: The next frontier of intelligent, language-guided automation

**Key Recommendations:**

- Under time constraints: use FLAML or AutoGluon with the `good_quality` preset

- Tuning a specific model: use Optuna

- Large-scale or distributed experiments: use Ray Tune

- Enterprise environments: leverage H2O AutoML for its interpretability tools

- LLM-based AutoML is still research-stage but is worth watching closely

AutoML is a tool, not magic. Domain knowledge, data quality, and a correct evaluation framework remain the most critical ingredients for success.

References

- [Optuna Documentation](https://optuna.org/)

- [AutoGluon Documentation](https://auto.gluon.ai/)

- [FLAML Documentation](https://microsoft.github.io/FLAML/)

- [H2O AutoML](https://h2o.ai/products/h2o-automl/)

- [Ray Tune Documentation](https://docs.ray.io/en/latest/tune/index.html)

- [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055)

- Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization.

- Feurer, M., et al. (2015). Efficient and Robust Automated Machine Learning (Auto-sklearn).

- He, X., et al. (2021). AutoML: A Survey of the State-of-the-Art.