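1. Overview of Cloud AI Platforms

For AI engineers, cloud computing is no longer optional. Hundreds of GPUs can be provisioned on demand, and managed services cover the whole workflow from model training through serving (inference delivery).

IaaS, PaaS, and SaaS

Cloud services fall into three layers.

- **IaaS (Infrastructure as a Service)**: Provides virtual machines, storage, and networking. EC2 GPU instances are a typical example. You get maximum control, at the cost of a heavy infrastructure management burden.

- **PaaS (Platform as a Service)**: Also manages the runtime and middleware for you. Amazon SageMaker, GCP Vertex AI, and Azure ML belong here. You can concentrate on model code.

- **SaaS (Software as a Service)**: Delivers finished AI capabilities as an API. Amazon Bedrock, the GCP Gemini API, and Azure OpenAI Service are representative examples.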

Comparison of Major Cloud AI Services

| Feature | AWS | GCP | Azure |
| ---------------------- | ------------------------ | -------------------------- | ------------------------------- |

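GPU Instance Types

**AWS GPU instances**:

- `p3.2xlarge`: 1× V100, 61 GiB RAM. Small-scale training.

- `p4d.24xlarge`: 8× A100, 320 GiB total GPU memory (1,152 GiB system RAM). Large-scale distributed training.

- `p5.48xlarge`: 8× H100, 2 TiB RAM. Training the latest LLMs.

**GCP GPU instances**:

- `n1-standard-8` + V100: Cost-efficient training.

- `a2-highgpu-8g`: 8× A100. Standard training on Vertex AI.

- `a3-highgpu-8g`: 8× H100. The latest large-scale models.

**Azure GPU instances**:

- `NC6s_v3`: 1× V100. Development and testing.

- `ND96asr_v4`: 8× A100. Large-scale training.

- `ND96amsr_A100_v4`: 8× A100 80GB. Maximum performance.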

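Cost Optimization Strategies

Compute often accounts for more than 70% of cloud AI spend. The main ways to reduce it are:

- **Spot/Preemptible instances**: Up to 90% cheaper than On-Demand. Checkpointing is mandatory because instances can be interrupted.

- **Reserved Instances / Committed Use**: Roughly 40-60% savings in exchange for a 1-3 year commitment. Suited to long-running projects.

- **Auto Scaling**: Automatically adjusts the number of instances to match inference traffic.

- **Savings Plans (AWS)**: Discounts in exchange for a committed amount of compute usage, with the flexibility to change instance types.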
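To make the effect of these discounts concrete, here is a minimal sketch that compares the cost of one training run under each pricing model. The hourly rate, run length, and node count are placeholder assumptions chosen for round numbers, not published prices, and the discount figures are simply the upper or middle values quoted in the list above.

```python
def estimate_cost(hourly_rate, hours, instance_count=1, discount=0.0):
    """Return the estimated cost after applying a fractional discount (0.0-1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0.0 and 1.0")
    return hourly_rate * hours * instance_count * (1.0 - discount)


# Hypothetical inputs for illustration only (not real list prices).
HOURLY_RATE = 30.0  # assumed On-Demand USD per hour for one GPU instance
HOURS = 100         # assumed length of the training run
NODES = 4           # assumed number of instances

scenarios = {
    "on_demand": 0.0,
    "reserved_or_committed": 0.5,  # middle of the 40-60% range
    "spot_best_case": 0.9,         # the "up to 90%" upper bound
}

costs = {
    name: estimate_cost(HOURLY_RATE, HOURS, NODES, discount)
    for name, discount in scenarios.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
```

Real Spot savings vary by region, instance type, and time, so the last figure is a best case rather than an expectation.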

2. AWS AI/ML Services

Key SageMaker Features

Amazon SageMaker is AWS's integrated ML platform. It handles the entire ML lifecycle, from data preparation through model deployment and monitoring, in a single service.

    import sagemaker
    from sagemaker import get_execution_role
    from sagemaker.pytorch import PyTorch

    role = get_execution_role()
    sess = sagemaker.Session()

    # SageMaker training job
    estimator = PyTorch(
        entry_point='train.py',
        source_dir='./src',
        role=role,
        sagemaker_session=sess,
        instance_type='ml.p4d.24xlarge',
        instance_count=4,
        framework_version='2.1.0',
        py_version='py310',
        # Passed to train.py as command-line arguments
        hyperparameters={
            'epochs': 10,
            'batch-size': 32,
            'learning-rate': 0.001,
        },
        # Launch train.py with torchrun on every node
        distribution={
            'torch_distributed': {'enabled': True}
        },
    )

    # Each key becomes an input channel inside the training container
    estimator.fit({'train': 's3://bucket/train', 'val': 's3://bucket/val'})

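The `distribution` parameter makes it straightforward to set up PyTorch DDP-based distributed training. Specifying `instance_count=4` together with the `torch_distributed` option has SageMaker launch the training script across four nodes and wire up the communication between them; the script itself still initializes the process group and wraps the model in DDP to perform data-parallel training.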
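As a rough guide to what this configuration means in practice, the sketch below works out the number of worker processes and the effective global batch size. It assumes one process per GPU (the usual `torchrun` layout) and assumes that `train.py` treats `batch-size` as a per-process value; the source does not show the script, so both are assumptions.

```python
def ddp_layout(instance_count, gpus_per_instance, per_gpu_batch_size):
    """Return the DDP world size and global batch size for a data-parallel job."""
    world_size = instance_count * gpus_per_instance
    return {
        "world_size": world_size,
        "global_batch_size": world_size * per_gpu_batch_size,
    }


# 4 x ml.p4d.24xlarge, each with 8 A100 GPUs, and batch-size=32 per process (assumed)
layout = ddp_layout(instance_count=4, gpus_per_instance=8, per_gpu_batch_size=32)
print(layout)
```

If the global batch size grows this way, the learning rate usually needs to be revisited as well, since it was presumably tuned for a smaller batch.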

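Deploying a SageMaker Model

    from sagemaker.deserializers import JSONDeserializer
    from sagemaker.pytorch import PyTorchModel
    from sagemaker.serializers import JSONSerializer

    # `role` is the execution role obtained in the training example
    model = PyTorchModel(
        model_data='s3://bucket/model.tar.gz',
        role=role,
        framework_version='2.1.0',
        py_version='py310',
        entry_point='inference.py'
    )

    predictor = model.deploy(
        initial_instance_count=2,
        instance_type='ml.g4dn.xlarge',
        endpoint_name='my-pytorch-endpoint',
        # Send and receive JSON, since the request below is a dict
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer()
    )

    # Invoke the endpoint
    result = predictor.predict({'inputs': 'Hello, cloud AI!'})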

Amazon Bedrock (LLM API)

Amazon Bedrock exposes multiple foundation models, including Anthropic Claude, Meta Llama, and Amazon Titan, through a single API.

    import json

    import boto3

    bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

    # The request body follows the Anthropic Messages format
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [{'role': 'user', 'content': 'Explain current AI trends'}]
        })
    )

    # The response body is a stream: read it, then parse the JSON
    result = json.loads(response['body'].read())
    print(result['content'][0]['text'])

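Serverless Inference with AWS Lambda

Lightweight models can be deployed serverlessly on Lambda.

    import json

    # The model is loaded outside the handler (global scope), so warm invocations reuse it.
    # `run_inference` stands for your own inference function and is not defined here.

    def lambda_handler(event, context):
        # API Gateway delivers the request payload as a JSON string in event['body']
        body = json.loads(event['body'])
        input_data = body['input']

        prediction = run_inference(input_data)

        return {
            'statusCode': 200,
            'body': json.dumps({'prediction': prediction})
        }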
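A handler of this shape can be exercised locally before it is deployed. The following sketch is self-contained: `run_inference` is a hypothetical stand-in that just returns the input length, and the event is a hand-built dictionary that mimics the `body` field an API Gateway proxy integration would send.

```python
import json


def run_inference(input_data):
    """Hypothetical stand-in for a real model call; returns the input length."""
    return len(str(input_data))


def lambda_handler(event, context):
    # Parse the JSON payload carried in the 'body' field
    body = json.loads(event["body"])
    prediction = run_inference(body["input"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }


# Simulate an invocation; the context object is unused, so None is enough here
fake_event = {"body": json.dumps({"input": "hello"})}
response = lambda_handler(fake_event, None)
print(response)
```

Testing this way catches payload-parsing mistakes early, although it says nothing about cold-start time or memory limits in the real runtime.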

3. GCP AI/ML Services

Vertex AI Training

Google Cloud's Vertex AI is an integrated ML platform whose strength is tight integration with BigQuery.

    from google.cloud import aiplatform

    aiplatform.init(project='my-project', location='us-central1')

    # Package train.py and run it in a prebuilt PyTorch GPU container
    job = aiplatform.CustomTrainingJob(
        display_name='pytorch-training',
        script_path='train.py',
        container_uri='us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-0:latest',
        requirements=['transformers', 'datasets']
    )

    model = job.run(
        dataset=None,
        machine_type='a2-highgpu-8g',
        accelerator_type='NVIDIA_TESLA_A100',
        accelerator_count=8,
        # Forwarded to train.py as command-line arguments
        args=['--epochs=10', '--batch_size=32']
    )

Deploying a Vertex AI Model

    from google.cloud import aiplatform

    # Upload the model
    model = aiplatform.Model.upload(
        display_name='my-pytorch-model',
        artifact_uri='gs://bucket/model/',
        serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.2-0:latest'
    )

    # Create an endpoint and deploy to it
    endpoint = aiplatform.Endpoint.create(display_name='my-endpoint')

    model.deploy(
        endpoint=endpoint,
        machine_type='n1-standard-4',
        accelerator_type='NVIDIA_TESLA_T4',
        accelerator_count=1,
        # Autoscale between 1 and 5 replicas
        min_replica_count=1,
        max_replica_count=5
    )

BigQuery ML

BigQuery ML lets you train models and run predictions using SQL syntax.

    -- Train a classification model with BigQuery ML
    CREATE OR REPLACE MODEL `dataset.fraud_model`
    OPTIONS(
      model_type = 'BOOSTED_TREE_CLASSIFIER',
      num_parallel_tree = 1,
      max_iterations = 50,
      input_label_cols = ['is_fraud']
    ) AS
    SELECT * FROM `dataset.transactions_train`;

    -- Evaluate the model
    SELECT *
    FROM ML.EVALUATE(MODEL `dataset.fraud_model`,
      (SELECT * FROM `dataset.transactions_test`)
    );

    -- Run predictions
    SELECT *
    FROM ML.PREDICT(MODEL `dataset.fraud_model`,
      (SELECT * FROM `dataset.new_transactions`)
    );

4. Azure AI Services

Azure Machine Learning

Azure ML is Microsoft's enterprise-grade ML platform. Its strengths are integration with Active Directory and support for hybrid cloud.

    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import AmlCompute
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="YOUR_SUBSCRIPTION",
        resource_group_name="rg-ai",
        workspace_name="ai-workspace"
    )

    # Create a GPU compute cluster
    compute_config = AmlCompute(
        name="gpu-cluster",
        type="amlcompute",
        size="Standard_ND96asr_v4",
        min_instances=0,   # scale to zero when idle
        max_instances=4,
        idle_time_before_scale_down=120   # seconds
    )

    ml_client.compute.begin_create_or_update(compute_config).result()

Running an Azure ML Training Job

    from azure.ai.ml import Input, command

    # `ml_client` is the MLClient created in the previous example
    job = command(
        code="./src",
        command="python train.py --epochs 10 --learning_rate 0.001",
        environment="AzureML-pytorch-2.0-ubuntu20.04-py38-cuda11-gpu@latest",
        compute="gpu-cluster",
        # Reference this in the command as ${{inputs.train_data}} if train.py takes a data path
        inputs={
            "train_data": Input(type="uri_folder", path="azureml://datastores/mydata/paths/train/")
        },
        display_name="pytorch-training-job"
    )

    returned_job = ml_client.jobs.create_or_update(job)
    print(f"Job URL: {returned_job.studio_url}")

Azure OpenAI Service

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_key="YOUR_API_KEY",
        api_version="2024-02-01"
    )

    response = client.chat.completions.create(
        model="gpt-4o",   # the name of your Azure OpenAI deployment
        messages=[
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": "What are the benefits of cloud AI?"}
        ]
    )

    print(response.choices[0].message.content)

5. Kubernetes for AI (EKS/GKE/AKS)

Kubernetes has become the standard for orchestrating large-scale AI workloads.

Configuring GPU Node Pools

    # GKE GPU node pool (Terraform)
    resource "google_container_node_pool" "gpu_pool" {
      name       = "gpu-pool"
      cluster    = google_container_cluster.primary.name
      node_count = 2

      node_config {
        machine_type = "a2-highgpu-1g"

        guest_accelerator {
          type  = "nvidia-tesla-a100"
          count = 1
        }

        oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
      }
    }

    # NVIDIA device plugin
    kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

Defining a Kubeflow Pipeline

    from kfp import dsl

    @dsl.component(
        base_image='python:3.10',
        packages_to_install=['scikit-learn', 'pandas']
    )
    def train_model(
        data_path: str,
        model_path: dsl.OutputPath(str)
    ):
        # Imports live inside the component body because it runs in its own container
        import pickle

        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier

        # Reading a gs:// path with pandas also needs gcsfs in packages_to_install
        df = pd.read_csv(data_path)
        X = df.drop('label', axis=1)
        y = df['label']

        model = RandomForestClassifier(n_estimators=100)
        model.fit(X, y)

        # Persist the trained model to the output path managed by the pipeline
        with open(model_path, 'wb') as f:
            pickle.dump(model, f)

    @dsl.pipeline(name='ml-pipeline')
    def ml_pipeline(data_path: str = 'gs://bucket/data.csv'):
        train_task = train_model(data_path=data_path)

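Autoscaling with KEDA

KEDA (Kubernetes Event-driven Autoscaling) automatically scales AI inference pods based on signals such as queue depth or HTTP request count.

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      # name: <scaled-object-name>   (required)
    spec:
      scaleTargetRef:
        # name: <target-deployment-name>   (required)
      minReplicaCount: 1
      maxReplicaCount: 20
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789/inference-queue
            queueLength: '5'        # target number of messages per replica
            awsRegion: us-east-1    # region of the queue above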
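To see how `queueLength`, `minReplicaCount`, and `maxReplicaCount` interact, the sketch below approximates the replica count this configuration would converge on for a given queue depth. It is a simplification that assumes the usual average-value rule (queue depth divided by the per-replica target, rounded up); the real autoscaler also applies tolerances, polling intervals, and stabilization windows that are ignored here.

```python
import math


def desired_replicas(queue_depth, target_per_replica, min_replicas, max_replicas):
    """Approximate the replica count for a queue-length trigger."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # One replica per `target_per_replica` messages, rounded up
    raw = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured bounds
    return max(min_replicas, min(max_replicas, raw))


# Values taken from the ScaledObject: queueLength=5, min=1, max=20
for depth in (0, 23, 500):
    print(depth, "messages ->", desired_replicas(depth, 5, 1, 20), "replicas")
```

With these settings the deployment never drops below one pod; setting the minimum to zero is what allows scale-to-zero during idle periods.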

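6. Serverless AI Inference

Serverless is a cost-efficient choice for AI services with intermittent or unpredictable traffic.

Optimizing Cold Starts

Cold starts for ML models can take anywhere from a few seconds to tens of seconds. Ways to reduce them:

1. **Provisioned concurrency (AWS Lambda)**: Keeps pre-warmed instances available.

2. **Container image optimization**: Remove unnecessary packages and use multi-stage builds.

3. **Model quantization**: Relative to FP32 weights, FP16 halves the model size and INT8 cuts it to about a quarter.

4. **Global-scope model loading**: Initialize the model outside the handler function (in a global variable) so warm invocations reuse it.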
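The size reduction from quantization follows directly from the number of bytes stored per parameter. Here is a minimal sketch of that arithmetic; the parameter count is an approximate figure for a DistilBERT-sized model used purely for illustration, and the estimate covers weights only (no quantization scales, tokenizer files, or runtime overhead).

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def model_size_mb(num_params, dtype):
    """Estimate the weight size in MiB for a given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


NUM_PARAMS = 66_000_000  # assumed, roughly DistilBERT-sized

sizes = {dtype: model_size_mb(NUM_PARAMS, dtype) for dtype in BYTES_PER_PARAM}
for dtype, size in sizes.items():
    print(f"{dtype}: {size:.1f} MiB")
```

A smaller artifact shortens both the image pull and the time spent deserializing weights, which is where most of a cold start goes for model-serving functions.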

Lambda Cold-Start Optimization Pattern

    import json

    from transformers import pipeline

    # Load the model outside the handler (global scope).
    # When the container is reused, this code does not run again.
    classifier = pipeline(
        'sentiment-analysis',
        model='distilbert-base-uncased-finetuned-sst-2-english'
    )

    def lambda_handler(event, context):
        body = json.loads(event['body'])
        text = body['text']

        # Only inference runs per request; the model is already in memory
        result = classifier(text)

        return {
            'statusCode': 200,
            'body': json.dumps(result)
        }

Container-Based Serverless with AWS Fargate

Fargate runs containers without any server management. Because it is not bound by Lambda's memory and execution-time limits, it can serve larger models.

    {
      "family": "ai-inference-task",
      "networkMode": "awsvpc",
      "requiresCompatibilities": ["FARGATE"],
      "cpu": "4096",
      "memory": "16384",
      "containerDefinitions": [
        {
          "name": "inference-container",
          "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
          "portMappings": [{ "containerPort": 8080 }],
          "environment": [{ "name": "MODEL_PATH", "value": "/opt/ml/model" }]
        }
      ]
    }

7. Data Storage for AI

Object Storage Comparison

| Service | Provider | Key features |
| -------------------- | ------------ | ---------------------------- |
| Amazon S3 | AWS | 11 nines (99.999999999%) of durability, rich SDK support |
| Google Cloud Storage | GCP | Native integration with BigQuery |
| Azure Blob Storage | Azure | Azure Data Lake Gen2 support |

Data Lake Architecture

An efficient lakehouse pattern for AI data pipelines:

    Raw Layer (Bronze)
    └── Source data stored as-is
    └── Partitioned by year/month/day/

    Processed Layer (Silver)
    └── Cleaning, deduplication, schema enforcement
    └── Parquet / Delta Lake format

    Feature Layer (Gold)
    └── Feature engineering complete
    └── Registered in the Feature Store
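The date-based partitioning of the Bronze layer can be sketched as a small helper that builds object keys. The `bronze/` prefix, the source name, and the file name are hypothetical choices for illustration; only the year/month/day layout comes from the diagram above.

```python
from datetime import date


def bronze_key(source, day, filename):
    """Build an object key partitioned by year/month/day for the raw layer."""
    return (
        f"bronze/{source}/"
        f"{day.year:04d}/{day.month:02d}/{day.day:02d}/"
        f"{filename}"
    )


key = bronze_key("transactions", date(2024, 3, 5), "part-0001.json")
print(key)
```

Zero-padding the month and day keeps keys in chronological order when they are listed lexicographically, which makes range scans over a date window straightforward.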

Managing Large Model Checkpoints

    import boto3
    import torch

    def save_checkpoint_to_s3(model, optimizer, epoch, loss, bucket, prefix):
        """Save a model checkpoint to S3."""
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }

        # Write locally first, then upload the file
        local_path = f'/tmp/checkpoint_epoch_{epoch}.pt'
        torch.save(checkpoint, local_path)

        s3 = boto3.client('s3')
        s3_key = f'{prefix}/checkpoint_epoch_{epoch}.pt'
        s3.upload_file(local_path, bucket, s3_key)

        print(f'Saved checkpoint: s3://{bucket}/{s3_key}')
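Saving checkpoints is only half of the recovery story; after an interruption the job also has to find the most recent one. The sketch below picks the latest checkpoint from a list of object keys that follow the `checkpoint_epoch_{epoch}.pt` naming used above. In a real job the keys would come from listing the S3 prefix (for example with `list_objects_v2`), which is left out here so the example runs without AWS access.

```python
import re

# Matches the naming scheme used when saving: checkpoint_epoch_<N>.pt
_CKPT_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pt$")


def latest_checkpoint_key(keys):
    """Return the key with the highest epoch number, or None if none match."""
    best_key, best_epoch = None, -1
    for key in keys:
        match = _CKPT_PATTERN.search(key)
        if match and int(match.group(1)) > best_epoch:
            best_key, best_epoch = key, int(match.group(1))
    return best_key


keys = [
    "ckpt/checkpoint_epoch_2.pt",
    "ckpt/checkpoint_epoch_10.pt",
    "ckpt/readme.txt",
]
print(latest_checkpoint_key(keys))
```

Comparing the parsed epoch as an integer matters: a plain string sort would rank `epoch_2` after `epoch_10`.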

8. Monitoring Cloud AI

Detecting Model Performance Drift

Production models can degrade over time. An example with SageMaker Model Monitor:

    from sagemaker.model_monitor import DataCaptureConfig, DefaultModelMonitor
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    # Configure data capture (pass this to model.deploy to record requests)
    data_capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=20,
        destination_s3_uri='s3://bucket/capture'
    )

    # Create the model monitor; `role` is the SageMaker execution role
    monitor = DefaultModelMonitor(
        role=role,
        instance_count=1,
        instance_type='ml.m5.xlarge',
        volume_size_in_gb=20,
        max_runtime_in_seconds=3600
    )

    # Build a baseline from the training data
    monitor.suggest_baseline(
        baseline_dataset='s3://bucket/train_data.csv',
        dataset_format=DatasetFormat.csv(header=True)
    )
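To give a feel for what comparing live data against a baseline involves, here is a small, dependency-free sketch of the Population Stability Index (PSI), a common drift score. This is a stand-in technique chosen for illustration, not the statistic SageMaker Model Monitor computes; the bin count, the epsilon floor, and the sample data are all assumptions.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(baseline), max(baseline)
    # Fall back to a unit width when every baseline value is identical
    width = (hi - lo) / bins or 1.0

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins
            idx = max(0, min(bins - 1, idx))
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so the logarithm is always defined
        return [max(c / total, eps) for c in counts]

    p, q = histogram(baseline), histogram(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = [i / 100 for i in range(100)]       # spread evenly over [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]  # squeezed into [0.5, 1)

print(f"PSI (same data): {psi(baseline, baseline):.3f}")
print(f"PSI (shifted):   {psi(baseline, shifted):.3f}")
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a significant shift, but the threshold should be tuned per feature rather than taken as fixed.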

CloudWatch Custom Metrics and Alarms

    import boto3

    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

    # Publish a custom metric
    cloudwatch.put_metric_data(
        Namespace='MLOps/ModelPerformance',
        MetricData=[
            {
                'MetricName': 'PredictionAccuracy',
                'Value': 0.94,
                'Unit': 'None',
                # Dimensions let you filter by model and environment
                'Dimensions': [
                    {'Name': 'ModelName', 'Value': 'fraud-detector-v2'},
                    {'Name': 'Environment', 'Value': 'production'}
                ]
            }
        ]
    )

9. MLOps in the Cloud

GitHub Actions + AWS CodePipeline CI/CD

    on:
      push:
        branches: [main]

    jobs:
      train-and-deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
              aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
              aws-region: us-east-1

          - name: Run SageMaker training
            run: |
              python scripts/run_training.py \
                --instance-type ml.p3.2xlarge \
                --output-path s3://bucket/models/

          - name: Deploy to staging
            run: |
              python scripts/deploy_model.py \
                --endpoint-name staging-endpoint \
                --instance-type ml.g4dn.xlarge

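MLflow + S3 Model Registry

    import mlflow
    import mlflow.pytorch

    # The tracking URI must point at a tracking backend (server, database, or local path);
    # an s3:// URI is only valid as an artifact location. The address below is a placeholder
    # for a tracking server started with S3 as its artifact root, for example:
    #   mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root s3://bucket/mlflow
    mlflow.set_tracking_uri('http://mlflow.example.internal:5000')
    mlflow.set_experiment('fraud-detection')

    # model, optimizer, train_loader, val_loader, train_one_epoch, and evaluate are defined elsewhere
    with mlflow.start_run():
        mlflow.log_params({
            'learning_rate': 0.001,
            'batch_size': 32,
            'epochs': 10
        })

        for epoch in range(10):
            train_loss = train_one_epoch(model, train_loader, optimizer)
            val_accuracy = evaluate(model, val_loader)

            mlflow.log_metrics({
                'train_loss': train_loss,
                'val_accuracy': val_accuracy
            }, step=epoch)

        # Store the model as a run artifact, then register it by run ID
        mlflow.pytorch.log_model(model, 'model')
        mlflow.register_model(
            model_uri=f'runs:/{mlflow.active_run().info.run_id}/model',
            name='fraud-detector'
        )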

Canary Deployment

    import boto3

    sm = boto3.client('sagemaker')

    # Shift 10% of traffic to the new model (both variants must already exist on the endpoint)
    sm.update_endpoint_weights_and_capacities(
        EndpointName='production-endpoint',
        DesiredWeightsAndCapacities=[
            {
                'VariantName': 'current-model',
                'DesiredWeight': 90,
                'DesiredInstanceCount': 4
            },
            {
                'VariantName': 'new-model',
                'DesiredWeight': 10,
                'DesiredInstanceCount': 1
            }
        ]
    )
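Variant weights are relative: each variant receives its weight divided by the sum of all weights. The sketch below computes those shares and generates a staged ramp for the two variants. The step percentages are an arbitrary example schedule, not a recommendation from the source, and each stage would be applied with a separate `update_endpoint_weights_and_capacities` call after checking metrics.

```python
def traffic_shares(weights):
    """Convert variant weights into fractions of total traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


def ramp_schedule(steps=(10, 25, 50, 100)):
    """Return the weight pairs for each stage of a gradual rollout."""
    return [{"current-model": 100 - s, "new-model": s} for s in steps]


print(traffic_shares({"current-model": 90, "new-model": 10}))
for stage in ramp_schedule():
    print(stage)
```

Keeping the weights summing to 100 is a convention that makes them read as percentages; the routing itself only depends on the ratio.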

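10. Quiz

**Answer**: Set the `distribution` parameter to `{'torch_distributed': {'enabled': True}}` and set `instance_count` to 2 or more.

**Explanation**: The SageMaker PyTorch Estimator supports several distributed training modes through the `distribution` option. `torch_distributed` uses PyTorch's native distributed training framework, and SageMaker configures the communication between nodes automatically.

**Answer**: Spot instances use AWS's spare capacity and can be up to 90% cheaper than On-Demand, but AWS may reclaim them with a two-minute notice when it needs the capacity back. On-Demand instances are available whenever you request them, at the full list price.

**Explanation**: Checkpointing is mandatory when using Spot for ML training. SageMaker supports automatic checkpointing to S3 through `CheckpointConfig` and can also restart the job automatically after an interruption.

**Answer**: `input_label_cols` specifies the target column (label) that the model should predict. The specified column is automatically excluded from the features.

**Explanation**: BigQuery ML uses the result of a SQL query directly as training data. If `input_label_cols` points at the wrong column, the real target stays among the features, which causes data leakage and an artificially high model accuracy.

**Answer**: KEDA provides event-driven autoscaling. It scales pods based on external event sources such as SQS queue depth, Kafka consumer lag, or HTTP request count. This differs from the standard HPA, which by default reacts to CPU and memory utilization.

**Explanation**: For AI inference services, scaling on the actual workload queue responds faster than scaling on CPU. KEDA can also scale pods down to zero during idle periods, which eliminates the cost of idle compute.

**Answer**: Use the S3 URI format, for example `s3://my-mlflow-bucket/mlflow`.

**Explanation**: MLflow can use S3 as its artifact store, so model artifacts are managed centrally in a bucket. Run metadata (parameters and metrics) is kept by the tracking backend, such as a tracking server, a database, or a local file store, which is configured separately from the artifact location. On EC2 or SageMaker, authentication to S3 can be handled automatically through an IAM role.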
