Split View: AI 거버넌스 & 책임감 있는 AI: EU AI Act, XAI, 편향 감지, AI 안전 기술까지

AI 거버넌스 & 책임감 있는 AI: EU AI Act, XAI, 편향 감지, AI 안전 기술까지

AI 거버넌스 프레임워크 개요

AI 시스템이 사회 전반에 배포됨에 따라, 거버넌스 체계의 필요성이 급속히 대두되고 있습니다. AI 거버넌스(AI Governance)란 AI 시스템의 개발·배포·운영 전 과정에서 위험을 관리하고, 사회적 가치와 법적 요건을 충족하도록 만드는 정책·절차·기술의 총체입니다.

주요 글로벌 프레임워크:

프레임워크	주체	핵심 특징
EU AI Act	유럽연합	법적 구속력, 위험 기반 접근
NIST AI RMF	미국 NIST	자발적 지침, 리스크 관리
ISO 42001	ISO/IEC	인증 가능한 AI 경영 시스템
G7 AI 원칙	G7 국가	국제 협력, 비구속적 원칙
UNESCO AI 윤리 권고	UNESCO	인권 중심, 전 세계 적용

EU AI Act: 위험 등급 분류 체계

2024년 발효된 EU AI Act는 세계 최초의 포괄적 AI 법령입니다. 위험 기반(risk-based) 접근법을 채택하여 AI 시스템을 4가지 위험 등급으로 분류합니다.

위험 등급 분류

1. 용납 불가 위험 (Unacceptable Risk) — 금지

실시간 원격 생체 인식(공공장소 CCTV 안면 인식)
사회 점수제(Social Scoring) 시스템
취약 계층 조작을 위한 잠재의식 기술
법 집행 예측 폴리싱(개인 단위)

2. 고위험 (High-Risk) — 엄격한 의무

의료 진단 보조, 의료기기 소프트웨어
자율주행 및 핵심 인프라
채용·인사 평가 시스템
신용 평가·보험 심사
사법·법 집행 지원 도구
교육 평가 시스템

3. 제한적 위험 (Limited Risk) — 투명성 의무

챗봇: AI임을 고지해야 함
딥페이크 콘텐츠: 합성물임을 표시해야 함
감정 인식 시스템: 사용 사실 고지

4. 최소 위험 (Minimal Risk) — 자율 규제

스팸 필터, 게임 AI
AI 기반 재고 관리 등

EU AI Act 위험 분류기 구현

from dataclasses import dataclass
from enum import Enum
from typing import List

class RiskLevel(Enum):
    UNACCEPTABLE = "용납 불가 (금지)"
    HIGH = "고위험 (엄격 규제)"
    LIMITED = "제한적 위험 (투명성 의무)"
    MINIMAL = "최소 위험 (자율 규제)"

@dataclass
class AISystemProfile:
    name: str
    uses_biometric: bool
    is_real_time: bool
    public_space: bool
    domain: str  # healthcare, hiring, credit, education, judiciary, infrastructure
    interacts_with_humans: bool
    generates_synthetic_content: bool

def classify_eu_ai_act_risk(system: AISystemProfile) -> tuple[RiskLevel, List[str]]:
    """
    EU AI Act 위험 등급 분류기.
    Returns (RiskLevel, list_of_applicable_obligations)
    """
    obligations = []

    # 1단계: 용납 불가 위험 확인
    if (system.uses_biometric and system.is_real_time and system.public_space):
        return RiskLevel.UNACCEPTABLE, ["즉시 사용 중단", "법적 금지 대상"]

    # 2단계: 고위험 도메인 확인
    HIGH_RISK_DOMAINS = {
        "healthcare", "hiring", "credit",
        "education", "judiciary", "critical_infrastructure"
    }
    if system.domain in HIGH_RISK_DOMAINS:
        obligations = [
            "적합성 평가(Conformity Assessment) 필수",
            "기술 문서화 의무",
            "인간 감독(Human Oversight) 체계 구축",
            "투명성 및 로깅 요건",
            "편향 테스트 및 데이터 거버넌스",
            "EU 데이터베이스 등록",
        ]
        return RiskLevel.HIGH, obligations

    # 3단계: 제한적 위험
    if system.interacts_with_humans or system.generates_synthetic_content:
        obligations = [
            "AI 시스템임을 사용자에게 고지",
            "합성 콘텐츠 워터마킹 또는 표시",
        ]
        return RiskLevel.LIMITED, obligations

    # 4단계: 최소 위험
    return RiskLevel.MINIMAL, ["자발적 행동 강령 권장"]


# 사용 예시
credit_scoring_system = AISystemProfile(
    name="자동 신용 평가 AI",
    uses_biometric=False,
    is_real_time=False,
    public_space=False,
    domain="credit",
    interacts_with_humans=False,
    generates_synthetic_content=False,
)

risk_level, obligations = classify_eu_ai_act_risk(credit_scoring_system)
print(f"시스템: {credit_scoring_system.name}")
print(f"위험 등급: {risk_level.value}")
print("의무 사항:")
for ob in obligations:
    print(f"  - {ob}")

NIST AI RMF & ISO 42001

NIST AI Risk Management Framework

NIST AI RMF(2023)는 4개의 핵심 기능으로 구성됩니다.

GOVERN: AI 위험 관리 문화와 정책 수립
MAP: AI 위험 컨텍스트 파악 및 분류
MEASURE: 위험 분석, 평가, 측정
MANAGE: 우선순위 기반 위험 대응

ISO/IEC 42001: AI 경영 시스템

ISO 42001은 조직이 AI를 책임감 있게 개발·배포하기 위한 경영 시스템 표준입니다. ISO 9001(품질)이나 ISO 27001(보안)처럼 제3자 인증을 받을 수 있습니다.

핵심 요구사항:

AI 정책 및 목표 수립
리더십 책임 명확화
위험 및 기회 평가
AI 영향 평가 수행
내부 감사 및 지속적 개선

책임감 있는 AI 개발 원칙

FATE 프레임워크

Fairness (공정성): 유사한 상황의 사람들을 유사하게 처우. 특정 집단에 불이익을 주지 않음.

Accountability (책임): 의사결정에 대한 책임 소재 명확화. "누가 이 결정에 책임지는가?"

Transparency (투명성): AI 시스템의 작동 방식, 훈련 데이터, 한계를 이해관계자에게 공개.

Explainability (설명 가능성): 개별 예측의 근거를 인간이 이해할 수 있는 언어로 설명.

G7 히로시마 AI 원칙 (2023)

법의 지배 및 인권 존중
투명성 및 설명 가능성
공정성과 비차별
인간 감독 및 통제
개인정보 보호
사이버보안
정보 공유 및 신고

편향 감지 & 완화 기법

AI 모델의 편향은 훈련 데이터의 역사적 불평등, 피처 선택 오류, 레이블링 오류 등에서 발생합니다.

주요 공정성 지표

Demographic Parity (통계적 평등): 보호 집단 간 양성 예측률이 동일해야 함. P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)

Equal Opportunity (동등 기회): 보호 집단 간 참 양성률(TPR)이 동일해야 함. P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)

Calibration (교정): 예측 확률이 실제 양성률과 일치해야 함 (집단별로).

Individual Fairness: 유사한 개인은 유사하게 처우받아야 함.

AIF360을 이용한 편향 감지

import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 1. 데이터 준비 (대출 승인 시나리오)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'income': np.random.normal(50000, 20000, n).clip(10000, 150000),
    'credit_score': np.random.normal(680, 100, n).clip(300, 850),
    'age': np.random.randint(20, 70, n),
    'gender': np.random.choice([0, 1], n, p=[0.5, 0.5]),  # 0=female, 1=male
    'loan_approved': np.zeros(n, dtype=int)
})
# 인위적 편향 주입: 남성이 승인될 확률이 더 높게 설정
prob = 0.3 + 0.2 * data['gender'] + 0.3 * (data['credit_score'] > 700).astype(int)
data['loan_approved'] = (np.random.random(n) < prob).astype(int)

# 2. AIF360 데이터셋 생성
aif_dataset = BinaryLabelDataset(
    df=data,
    label_names=['loan_approved'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# 3. 편향 측정
privileged_groups = [{'gender': 1}]    # 남성
unprivileged_groups = [{'gender': 0}]  # 여성

dataset_metric = BinaryLabelDatasetMetric(
    aif_dataset,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print("=== 원본 데이터 편향 분석 ===")
print(f"Disparate Impact: {dataset_metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {dataset_metric.statistical_parity_difference():.4f}")
# Disparate Impact < 0.8 → 80% 규칙 위반 (편향 존재)

# 4. Reweighing으로 전처리 편향 완화
rw = Reweighing(
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
dataset_reweighed = rw.fit_transform(aif_dataset)

# 완화 후 측정
metric_reweighed = BinaryLabelDatasetMetric(
    dataset_reweighed,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print("\n=== Reweighing 후 편향 분석 ===")
print(f"Disparate Impact: {metric_reweighed.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric_reweighed.statistical_parity_difference():.4f}")

Fairlearn을 이용한 후처리 완화

from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from sklearn.ensemble import GradientBoostingClassifier

# 모델 훈련
X = data[['income', 'credit_score', 'age']].values
y = data['loan_approved'].values
sensitive = data['gender'].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

base_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
base_model.fit(X_scaled, y)

# ThresholdOptimizer: 집단별 결정 임계값 최적화
postprocess_est = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    predict_method="predict_proba",
    objective="balanced_accuracy_score",
)
postprocess_est.fit(X_scaled, y, sensitive_features=sensitive)

y_pred_fair = postprocess_est.predict(X_scaled, sensitive_features=sensitive)

# 공정성 지표 측정
mf = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y,
    y_pred=y_pred_fair,
    sensitive_features=sensitive,
)
print("\n=== Fairlearn 후처리 결과 ===")
print(f"집단별 선택률:\n{mf.by_group}")
print(f"Demographic Parity Difference: {demographic_parity_difference(y, y_pred_fair, sensitive_features=sensitive):.4f}")

설명 가능한 AI (XAI)

SHAP: SHapley Additive exPlanations

SHAP는 게임 이론의 Shapley 값을 활용해 각 피처가 예측값에 기여하는 정도를 계산합니다. 모든 피처 조합에서 한 피처를 추가했을 때의 한계 기여도 평균을 측정합니다.

import shap
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# 모델 훈련
X_train, y_train = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)
feature_names = [
    'income', 'credit_score', 'age', 'debt_ratio',
    'employment_years', 'num_accounts', 'late_payments', 'loan_amount'
]

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# SHAP TreeExplainer (트리 기반 모델 전용, 빠름)
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_train)

# 개별 예측 설명 (Waterfall Plot)
sample_idx = 0
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[1][sample_idx],
        base_values=explainer.expected_value[1],
        data=X_train[sample_idx],
        feature_names=feature_names,
    )
)

# 전역 중요도 (Summary Plot)
shap.summary_plot(shap_values[1], X_train, feature_names=feature_names)

# SHAP 상호작용 효과
shap_interaction = explainer.shap_interaction_values(X_train[:100])
print(f"Income-CreditScore 상호작용 SHAP: {shap_interaction[0, 0, 1]:.4f}")

LIME: Local Interpretable Model-agnostic Explanations

import lime
import lime.lime_tabular
import numpy as np

# LIME 설명기 생성
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['거절', '승인'],
    mode='classification',
    discretize_continuous=True,
)

# 개별 샘플 설명
explanation = lime_explainer.explain_instance(
    data_row=X_train[0],
    predict_fn=rf_model.predict_proba,
    num_features=6,
    num_samples=1000,
)

print("=== LIME 설명 (샘플 #0) ===")
for feature, weight in explanation.as_list():
    direction = "증가" if weight > 0 else "감소"
    print(f"  {feature}: {weight:+.4f} ({direction}에 기여)")

explanation.show_in_notebook(show_table=True)

모델 카드(Model Card) 생성

import json
from datetime import datetime

def generate_model_card(
    model_name: str,
    version: str,
    intended_use: str,
    out_of_scope_uses: list,
    training_data: dict,
    evaluation_results: dict,
    fairness_analysis: dict,
    limitations: list,
    ethical_considerations: list,
) -> dict:
    """표준 모델 카드 생성기 (Mitchell et al. 2019 기반)."""
    model_card = {
        "model_details": {
            "name": model_name,
            "version": version,
            "date": datetime.now().strftime("%Y-%m-%d"),
            "type": "Binary Classifier",
            "paper": "https://arxiv.org/abs/1810.03993",
        },
        "intended_use": {
            "primary_uses": intended_use,
            "primary_users": ["신용 심사 담당자", "금융 규제 기관"],
            "out_of_scope_uses": out_of_scope_uses,
        },
        "factors": {
            "relevant_factors": ["gender", "age_group", "income_bracket"],
            "evaluation_factors": ["demographic_parity", "equal_opportunity"],
        },
        "metrics": {
            "performance_measures": evaluation_results,
            "decision_thresholds": {"default": 0.5, "high_precision": 0.7},
        },
        "training_data": training_data,
        "fairness_analysis": fairness_analysis,
        "limitations": limitations,
        "ethical_considerations": ethical_considerations,
        "caveats_recommendations": [
            "정기적인 드리프트 모니터링 권장",
            "분기별 편향 재평가 필수",
            "고위험 결정 시 인간 검토 병행",
        ],
    }
    return model_card

card = generate_model_card(
    model_name="신용 대출 승인 모델 v2.1",
    version="2.1.0",
    intended_use="개인 신용 대출 신청 초기 심사 자동화",
    out_of_scope_uses=["기업 대출 심사", "보험료 산정", "고용 결정"],
    training_data={"size": 50000, "period": "2020-2024", "source": "내부 대출 이력"},
    evaluation_results={"accuracy": 0.84, "AUC": 0.91, "F1": 0.82},
    fairness_analysis={
        "demographic_parity_diff": 0.03,
        "equal_opportunity_diff": 0.02,
        "disparate_impact": 0.96,
    },
    limitations=["2020년 이전 데이터 미포함", "농촌 지역 대표성 부족"],
    ethical_considerations=["최종 결정은 반드시 인간 담당자가 검토", "거절 사유 의무 고지"],
)
print(json.dumps(card, ensure_ascii=False, indent=2))

AI 안전 기술

Constitutional AI (Anthropic)

Constitutional AI는 AI 모델이 명시된 원칙 집합("헌법")에 따라 자신의 응답을 비평하고 수정하도록 훈련하는 방법입니다.

작동 방식:

모델이 잠재적으로 유해한 응답 생성
헌법 원칙 기반으로 자기 비평(self-critique) 수행
원칙을 준수하는 방향으로 응답 수정
수정된 응답으로 RLHF 훈련

RLHF (Reinforcement Learning from Human Feedback)

1. SFT (Supervised Fine-Tuning): 고품질 데모 데이터로 기반 모델 미세 조정
2. Reward Modeling: 인간 선호도 쌍(preferred vs rejected)으로 보상 모델 훈련
3. RL 최적화: PPO 알고리즘으로 보상 극대화 (KL 발산 제약 포함)

탈옥(Jailbreak) 방어 기법

입력 필터링: 유해 패턴 사전 감지 및 차단
프롬프트 인젝션 방어: 시스템 프롬프트와 사용자 입력 격리
출력 모니터링: 생성 텍스트의 실시간 안전성 검사
레드팀잉(Red Teaming): 공격자 역할의 전문가 팀이 취약점 탐색

AI 워터마킹

텍스트 워터마킹은 LLM 생성 텍스트에 통계적으로 감지 가능한 패턴을 삽입합니다.

import hashlib
import random

def green_red_watermark(text: str, key: str, gamma: float = 0.25) -> dict:
    """
    Kirchenbauer et al. (2023) 방식의 그린/레드 리스트 워터마킹.
    토큰 시퀀스에서 이전 토큰을 시드로 사용해 토큰을 그린/레드로 분류,
    그린 토큰을 선호하여 생성 시 워터마크를 삽입.
    """
    words = text.split()
    green_count = 0
    total = len(words)

    for i, word in enumerate(words):
        prev_token = words[i - 1] if i > 0 else "<s>"
        seed = int(hashlib.sha256(f"{key}{prev_token}".encode()).hexdigest(), 16) % (2**32)
        random.seed(seed)
        is_green = random.random() > (1 - gamma)
        if is_green:
            green_count += 1

    z_score = (green_count - gamma * total) / ((gamma * (1 - gamma) * total) ** 0.5 + 1e-9)
    return {
        "green_token_ratio": green_count / total,
        "z_score": z_score,
        "is_watermarked": z_score > 4.0,
    }

데이터 프라이버시 기술

차분 프라이버시 (Differential Privacy)

차분 프라이버시는 데이터베이스에 노이즈를 추가해 개별 레코드의 포함 여부를 통계적으로 숨깁니다. 엡실론(epsilon)이 작을수록 더 강한 프라이버시 보장을 제공합니다.

import torch
import torch.nn as nn
from opacus import PrivacyEngine
from torch.utils.data import DataLoader, TensorDataset

# 모델 정의
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )
    def forward(self, x):
        return self.fc(x)

# 합성 데이터
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = SimpleNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Opacus PrivacyEngine 적용
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=1.0,   # 엡실론: 작을수록 강한 프라이버시
    target_delta=1e-5,    # 델타: 엡실론 위반 확률
    max_grad_norm=1.0,    # 클리핑 임계값
    epochs=10,
)

# 훈련 루프
criterion = nn.CrossEntropyLoss()
for epoch in range(3):
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"훈련 완료: epsilon = {epsilon:.2f}, delta = 1e-5")
print(f"프라이버시 보장: 개별 데이터 기여도가 e^{epsilon:.2f}배 이내로 제한")

연합 학습 (Federated Learning)

연합 학습은 원시 데이터를 중앙 서버로 전송하지 않고, 각 클라이언트가 로컬에서 훈련한 모델 가중치(그래디언트)만 공유합니다.

import copy
import numpy as np

def federated_averaging(global_model_weights, client_updates, client_data_sizes):
    """
    FedAvg 알고리즘: 데이터 크기 가중치 기반 평균 집계.
    """
    total_data = sum(client_data_sizes)
    averaged_weights = {}

    for key in global_model_weights.keys():
        weighted_sum = sum(
            client_updates[i][key] * (client_data_sizes[i] / total_data)
            for i in range(len(client_updates))
        )
        averaged_weights[key] = weighted_sum

    return averaged_weights

# GDPR AI 적용 체크리스트
gdpr_ai_checklist = {
    "데이터 최소화": "모델 훈련에 필요한 최소한의 데이터만 수집",
    "목적 제한": "훈련 데이터를 명시된 목적 외에 사용 금지",
    "정보 주체 권리": "자동화 결정에 대한 설명 요구권 보장 (Article 22)",
    "프로파일링 제한": "자동화 프로파일링 기반 중요 결정에 인간 검토 의무",
    "데이터 이동권": "개인이 자신의 데이터를 포터블 형식으로 받을 권리",
    "삭제권": "모델에서 개인 데이터 영향 제거 (Machine Unlearning)",
}

for right, description in gdpr_ai_checklist.items():
    print(f"[GDPR] {right}: {description}")

AI 규제 실무

모델 감사 프로세스

감사 범위 정의: 감사 대상 모델, 기간, 사용 사례 명확화
문서 검토: 훈련 데이터 출처, 모델 카드, 시스템 카드 검토
기술적 테스트: 편향 측정, 견고성 테스트, 적대적 공격 시뮬레이션
이해관계자 인터뷰: 운영팀, 피영향 집단 대표, 규제 담당자
감사 보고서 작성: 발견 사항, 위험 등급, 권고 조치

AI 윤리 위원회 구성

효과적인 AI 윤리 위원회는 다음 역할을 포함해야 합니다.

역할	필요 역량
AI/ML 기술 전문가	모델 작동 원리 이해
법률/컴플라이언스	규제 요건 해석
윤리학자/철학자	가치 충돌 조정
도메인 전문가	적용 분야 맥락 제공
피영향 집단 대표	실제 영향 반영
사이버보안 전문가	보안 위험 평가

위험 평가 템플릿

from dataclasses import dataclass, field
from typing import List
from enum import IntEnum

class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

class Likelihood(IntEnum):
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4

@dataclass
class AIRisk:
    risk_id: str
    description: str
    severity: Severity
    likelihood: Likelihood
    affected_groups: List[str]
    mitigation: str
    owner: str
    residual_risk: str = "TBD"

    @property
    def risk_score(self) -> int:
        return self.severity * self.likelihood

    @property
    def risk_level(self) -> str:
        score = self.risk_score
        if score >= 12:
            return "CRITICAL"
        elif score >= 8:
            return "HIGH"
        elif score >= 4:
            return "MEDIUM"
        return "LOW"

# 예시 위험 등록부
risks = [
    AIRisk(
        risk_id="RISK-001",
        description="신용 모델의 성별 편향으로 인한 대출 거절 차별",
        severity=Severity.HIGH,
        likelihood=Likelihood.POSSIBLE,
        affected_groups=["여성", "비이진 성별"],
        mitigation="Reweighing + 분기별 disparate impact 모니터링",
        owner="AI 윤리팀",
    ),
    AIRisk(
        risk_id="RISK-002",
        description="모델 결정에 대한 설명 불가로 GDPR Article 22 위반",
        severity=Severity.CRITICAL,
        likelihood=Likelihood.LIKELY,
        affected_groups=["모든 대출 신청자"],
        mitigation="SHAP 기반 결정 설명 시스템 구축",
        owner="컴플라이언스팀",
    ),
]

print("=== AI 위험 등록부 ===")
for risk in risks:
    print(f"\n[{risk.risk_id}] {risk.description}")
    print(f"  위험 수준: {risk.risk_level} (점수: {risk.risk_score})")
    print(f"  완화 조치: {risk.mitigation}")

퀴즈

Q1. EU AI Act에서 생체 인식 시스템이 고위험(High-Risk)으로 분류되는 조건은?

정답: 실시간(real-time) + 공공장소 + 원격 생체 인식의 세 조건이 동시에 충족되면 용납 불가(Unacceptable Risk)로 금지됩니다. 단, 법 집행 기관의 실종 아동 수색 등 특정 예외가 적용됩니다. 비실시간이거나 사후(post-hoc) 분석 목적의 생체 인식, 또는 사법·국경 통제에 사용되는 경우는 고위험(High-Risk)으로 분류되어 적합성 평가 등 엄격한 규제가 적용됩니다.

설명: EU AI Act Annex III는 법 집행·사법·국경 관리에서 사용되는 원격 생체 인식 시스템을 고위험 AI로 명시합니다. 공공장소에서 실시간 원격 생체 인식은 Article 5에 의해 원칙적으로 금지됩니다.

Q2. Demographic parity와 Equal opportunity 공정성 지표의 차이와 트레이드오프는?

정답: Demographic parity는 보호 집단 간 양성 예측률(P(Y_hat=1))이 동일해야 한다는 기준이고, Equal opportunity는 실제 양성인 샘플(Y=1) 중 양성으로 예측되는 비율(TPR)이 동일해야 한다는 기준입니다.

설명: Chouldechova(2017)의 불가능성 정리(Impossibility Theorem)에 따르면, 기저율(base rate)이 집단 간에 다를 경우 demographic parity, equal opportunity, predictive parity를 동시에 만족하는 것은 수학적으로 불가능합니다. 따라서 사용 맥락과 피해 유형에 따라 어떤 공정성 지표를 우선시할지 명시적으로 결정해야 합니다.

Q3. SHAP이 특성 중요도를 계산하는 게임 이론적 원리는?

정답: SHAP는 협력 게임 이론의 Shapley 값을 기반으로 합니다. 각 피처를 "플레이어"로, 모델의 예측값을 "보상"으로 간주하여, 모든 가능한 피처 부분집합(coalition)에 대해 특정 피처를 추가했을 때의 한계 기여도 평균을 계산합니다.

설명: Shapley 값은 효율성(모든 피처의 SHAP 합 = 예측값 - 기대값), 대칭성, 가산성, 더미 특성의 4가지 공리를 만족하는 유일한 분배 방법입니다. SHAP은 LIME과 달리 전역적(global) 일관성을 보장하며, TreeSHAP은 트리 기반 모델에서 O(TLD^2)의 효율적인 계산을 제공합니다.

Q4. 차분 프라이버시에서 엡실론(epsilon) 값이 작을수록 프라이버시가 강화되는 이유는?

정답: epsilon은 "한 데이터 포인트의 포함 여부가 출력 분포를 최대 e^epsilon배 바꿀 수 있다"는 상한을 정의합니다. epsilon이 0에 가까울수록 데이터 포함 여부에 관계없이 출력 분포가 거의 동일해져 개별 정보가 노출되지 않습니다.

설명: epsilon=0이면 완전한 프라이버시(완전한 무작위 출력), epsilon이 크면 실용적이지만 프라이버시 보호가 약해집니다. 실무에서는 epsilon=1 이하를 강한 프라이버시, epsilon=10 이하를 실용적 프라이버시로 봅니다. Opacus(PyTorch)나 TensorFlow Privacy 라이브러리가 자동으로 노이즈 크기와 epsilon을 계산합니다.

Q5. 모델 카드(Model Card)에 포함해야 하는 핵심 정보 항목은?

정답: 모델 상세(이름·버전·유형), 의도된 사용 및 범위 외 사용, 평가 요소(보호 속성), 성능 지표(정확도·AUC 등), 훈련 데이터 설명, 공정성 분석 결과, 한계 및 주의사항, 윤리적 고려사항.

설명: Mitchell et al.(2019)이 제안한 모델 카드는 AI 투명성 표준으로 자리잡았습니다. Google, Hugging Face 등 주요 기업이 모델 출시 시 모델 카드를 공개합니다. EU AI Act 고위험 AI는 기술 문서화 의무(Annex IV)에 모델 카드에 준하는 정보를 포함해야 합니다.

AI Governance & Responsible AI: EU AI Act, XAI, Bias Detection, and AI Safety Techniques

AI Governance Frameworks Overview
EU AI Act: Risk Classification System
NIST AI RMF & ISO 42001
Responsible AI Development Principles
Bias Detection & Mitigation
Explainable AI (XAI)
AI Safety Techniques
Data Privacy Technologies
AI Regulatory Practice
Quiz

AI Governance Frameworks Overview

As AI systems are deployed across society, the need for governance frameworks has grown rapidly. AI Governance refers to the totality of policies, procedures, and technologies that manage risks throughout the development, deployment, and operation of AI systems — ensuring alignment with societal values and legal requirements.

Key global frameworks:

Framework	Authority	Core Characteristics
EU AI Act	European Union	Legally binding, risk-based approach
NIST AI RMF	US NIST	Voluntary guidance, risk management
ISO 42001	ISO/IEC	Certifiable AI management system
G7 AI Principles	G7 Nations	International cooperation, non-binding
UNESCO AI Ethics Recommendation	UNESCO	Human rights-centered, global scope

EU AI Act: Risk Classification System

The EU AI Act, which entered into force in 2024, is the world's first comprehensive AI legislation. It adopts a risk-based approach, classifying AI systems into four risk tiers.

Risk Tier Classification

1. Unacceptable Risk — Prohibited

Real-time remote biometric identification in public spaces (e.g., CCTV facial recognition)
Social scoring systems
Subliminal manipulation techniques targeting vulnerable groups
Predictive policing at the individual level

2. High-Risk — Strict Obligations

Medical diagnosis assistance and medical device software
Autonomous vehicles and critical infrastructure
Recruitment and personnel evaluation systems
Credit scoring and insurance underwriting
Judiciary and law enforcement support tools
Educational assessment systems

3. Limited Risk — Transparency Obligations

Chatbots: must disclose that the user is interacting with AI
Deepfake content: must be labeled as synthetic
Emotion recognition systems: must disclose usage

4. Minimal Risk — Self-Regulation

Spam filters, game AI
AI-based inventory management, etc.

EU AI Act Risk Classifier Implementation

from dataclasses import dataclass
from enum import Enum
from typing import List

class RiskLevel(Enum):
    UNACCEPTABLE = "Unacceptable (Prohibited)"
    HIGH = "High-Risk (Strict Regulation)"
    LIMITED = "Limited Risk (Transparency Obligations)"
    MINIMAL = "Minimal Risk (Self-Regulation)"

@dataclass
class AISystemProfile:
    name: str
    uses_biometric: bool
    is_real_time: bool
    public_space: bool
    domain: str  # healthcare, hiring, credit, education, judiciary, infrastructure
    interacts_with_humans: bool
    generates_synthetic_content: bool

def classify_eu_ai_act_risk(system: AISystemProfile) -> tuple[RiskLevel, List[str]]:
    """
    EU AI Act risk classifier.
    Returns (RiskLevel, list_of_applicable_obligations)
    """
    obligations = []

    # Step 1: Check for unacceptable risk
    if (system.uses_biometric and system.is_real_time and system.public_space):
        return RiskLevel.UNACCEPTABLE, ["Cease operations immediately", "Legally prohibited"]

    # Step 2: Check for high-risk domains
    HIGH_RISK_DOMAINS = {
        "healthcare", "hiring", "credit",
        "education", "judiciary", "critical_infrastructure"
    }
    if system.domain in HIGH_RISK_DOMAINS:
        obligations = [
            "Mandatory Conformity Assessment",
            "Technical documentation obligation",
            "Human oversight mechanisms required",
            "Transparency and logging requirements",
            "Bias testing and data governance",
            "Registration in EU database",
        ]
        return RiskLevel.HIGH, obligations

    # Step 3: Limited risk
    if system.interacts_with_humans or system.generates_synthetic_content:
        obligations = [
            "Disclose AI system status to users",
            "Watermark or label synthetic content",
        ]
        return RiskLevel.LIMITED, obligations

    # Step 4: Minimal risk
    return RiskLevel.MINIMAL, ["Voluntary code of conduct recommended"]


# Usage example
credit_scoring_system = AISystemProfile(
    name="Automated Credit Scoring AI",
    uses_biometric=False,
    is_real_time=False,
    public_space=False,
    domain="credit",
    interacts_with_humans=False,
    generates_synthetic_content=False,
)

risk_level, obligations = classify_eu_ai_act_risk(credit_scoring_system)
print(f"System: {credit_scoring_system.name}")
print(f"Risk Level: {risk_level.value}")
print("Obligations:")
for ob in obligations:
    print(f"  - {ob}")

NIST AI RMF & ISO 42001

NIST AI Risk Management Framework

The NIST AI RMF (2023) is structured around four core functions:

GOVERN: Establish AI risk management culture and policies
MAP: Identify and categorize AI risk context
MEASURE: Analyze, evaluate, and quantify risks
MANAGE: Respond to risks based on priority

ISO/IEC 42001: AI Management System

ISO 42001 is a management system standard for organizations to develop and deploy AI responsibly. Like ISO 9001 (quality) or ISO 27001 (security), it can be certified by third parties.

Core requirements:

Establish AI policies and objectives
Clarify leadership responsibilities
Assess risks and opportunities
Conduct AI impact assessments
Perform internal audits and continuous improvement

Responsible AI Development Principles

The FATE Framework

Fairness: Treat similar people similarly. Do not disadvantage particular groups.

Accountability: Clarify responsibility for decisions. "Who is accountable for this decision?"

Transparency: Disclose how AI systems work, what data they were trained on, and their limitations.

Explainability: Explain the reasoning behind individual predictions in human-understandable terms.

G7 Hiroshima AI Principles (2023)

Rule of law and respect for human rights
Transparency and explainability
Fairness and non-discrimination
Human oversight and control
Privacy protection
Cybersecurity
Information sharing and incident reporting

Bias Detection & Mitigation

AI model bias originates from historical inequalities in training data, feature selection errors, labeling mistakes, and feedback loops.

Key Fairness Metrics

Demographic Parity (Statistical Parity): The positive prediction rate must be equal across protected groups. P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)

Equal Opportunity: The true positive rate (TPR) must be equal across protected groups. P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)

Calibration: Predicted probabilities must match actual positive rates (per group).

Individual Fairness: Similar individuals should be treated similarly.

Bias Detection with AIF360

import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 1. Prepare data (loan approval scenario)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'income': np.random.normal(50000, 20000, n).clip(10000, 150000),
    'credit_score': np.random.normal(680, 100, n).clip(300, 850),
    'age': np.random.randint(20, 70, n),
    'gender': np.random.choice([0, 1], n, p=[0.5, 0.5]),  # 0=female, 1=male
    'loan_approved': np.zeros(n, dtype=int)
})
# Inject artificial bias: males have higher approval probability
prob = 0.3 + 0.2 * data['gender'] + 0.3 * (data['credit_score'] > 700).astype(int)
data['loan_approved'] = (np.random.random(n) < prob).astype(int)

# 2. Create AIF360 dataset
aif_dataset = BinaryLabelDataset(
    df=data,
    label_names=['loan_approved'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# 3. Measure bias
privileged_groups = [{'gender': 1}]    # male
unprivileged_groups = [{'gender': 0}]  # female

dataset_metric = BinaryLabelDatasetMetric(
    aif_dataset,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print("=== Original Data Bias Analysis ===")
print(f"Disparate Impact: {dataset_metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {dataset_metric.statistical_parity_difference():.4f}")
# Disparate Impact < 0.8 → 80% rule violation (bias detected)

# 4. Preprocessing bias mitigation with Reweighing
rw = Reweighing(
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
dataset_reweighed = rw.fit_transform(aif_dataset)

metric_reweighed = BinaryLabelDatasetMetric(
    dataset_reweighed,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print("\n=== After Reweighing ===")
print(f"Disparate Impact: {metric_reweighed.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric_reweighed.statistical_parity_difference():.4f}")

Post-processing Mitigation with Fairlearn

from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from sklearn.ensemble import GradientBoostingClassifier

# Train model
X = data[['income', 'credit_score', 'age']].values
y = data['loan_approved'].values
sensitive = data['gender'].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

base_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
base_model.fit(X_scaled, y)

# ThresholdOptimizer: optimize decision thresholds per group
postprocess_est = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    predict_method="predict_proba",
    objective="balanced_accuracy_score",
)
postprocess_est.fit(X_scaled, y, sensitive_features=sensitive)

y_pred_fair = postprocess_est.predict(X_scaled, sensitive_features=sensitive)

# Measure fairness metrics
mf = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y,
    y_pred=y_pred_fair,
    sensitive_features=sensitive,
)
print("\n=== Fairlearn Post-processing Results ===")
print(f"Selection rate by group:\n{mf.by_group}")
print(f"Demographic Parity Difference: {demographic_parity_difference(y, y_pred_fair, sensitive_features=sensitive):.4f}")

Explainable AI (XAI)

SHAP: SHapley Additive exPlanations

SHAP leverages Shapley values from cooperative game theory to quantify each feature's contribution to a prediction. It computes the average marginal contribution of a feature across all possible feature subsets.

import shap
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Train model
X_train, y_train = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)
feature_names = [
    'income', 'credit_score', 'age', 'debt_ratio',
    'employment_years', 'num_accounts', 'late_payments', 'loan_amount'
]

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# SHAP TreeExplainer (tree-specific, fast)
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_train)

# Individual prediction explanation (Waterfall Plot)
sample_idx = 0
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[1][sample_idx],
        base_values=explainer.expected_value[1],
        data=X_train[sample_idx],
        feature_names=feature_names,
    )
)

# Global importance (Summary Plot)
shap.summary_plot(shap_values[1], X_train, feature_names=feature_names)

# SHAP interaction effects
shap_interaction = explainer.shap_interaction_values(X_train[:100])
print(f"Income-CreditScore interaction SHAP: {shap_interaction[0, 0, 1]:.4f}")

LIME: Local Interpretable Model-agnostic Explanations

import lime
import lime.lime_tabular
import numpy as np

# Create LIME explainer
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['Rejected', 'Approved'],
    mode='classification',
    discretize_continuous=True,
)

# Explain individual sample
explanation = lime_explainer.explain_instance(
    data_row=X_train[0],
    predict_fn=rf_model.predict_proba,
    num_features=6,
    num_samples=1000,
)

print("=== LIME Explanation (Sample #0) ===")
for feature, weight in explanation.as_list():
    direction = "increases" if weight > 0 else "decreases"
    print(f"  {feature}: {weight:+.4f} ({direction} approval probability)")

explanation.show_in_notebook(show_table=True)

Generating a Model Card

import json
from datetime import datetime

def generate_model_card(
    model_name: str,
    version: str,
    intended_use: str,
    out_of_scope_uses: list,
    training_data: dict,
    evaluation_results: dict,
    fairness_analysis: dict,
    limitations: list,
    ethical_considerations: list,
) -> dict:
    """Standard model card generator (based on Mitchell et al. 2019)."""
    model_card = {
        "model_details": {
            "name": model_name,
            "version": version,
            "date": datetime.now().strftime("%Y-%m-%d"),
            "type": "Binary Classifier",
            "paper": "https://arxiv.org/abs/1810.03993",
        },
        "intended_use": {
            "primary_uses": intended_use,
            "primary_users": ["Credit officers", "Financial regulators"],
            "out_of_scope_uses": out_of_scope_uses,
        },
        "factors": {
            "relevant_factors": ["gender", "age_group", "income_bracket"],
            "evaluation_factors": ["demographic_parity", "equal_opportunity"],
        },
        "metrics": {
            "performance_measures": evaluation_results,
            "decision_thresholds": {"default": 0.5, "high_precision": 0.7},
        },
        "training_data": training_data,
        "fairness_analysis": fairness_analysis,
        "limitations": limitations,
        "ethical_considerations": ethical_considerations,
        "caveats_recommendations": [
            "Regular drift monitoring recommended",
            "Quarterly bias re-evaluation required",
            "Human review required for high-stakes decisions",
        ],
    }
    return model_card

card = generate_model_card(
    model_name="Personal Loan Approval Model v2.1",
    version="2.1.0",
    intended_use="Automated initial screening for personal loan applications",
    out_of_scope_uses=["Corporate loan assessment", "Insurance pricing", "Employment decisions"],
    training_data={"size": 50000, "period": "2020-2024", "source": "Internal loan history"},
    evaluation_results={"accuracy": 0.84, "AUC": 0.91, "F1": 0.82},
    fairness_analysis={
        "demographic_parity_diff": 0.03,
        "equal_opportunity_diff": 0.02,
        "disparate_impact": 0.96,
    },
    limitations=["Pre-2020 data not included", "Rural region underrepresentation"],
    ethical_considerations=["Final decisions must be reviewed by human officers", "Mandatory disclosure of rejection reasons"],
)
print(json.dumps(card, indent=2))

AI Safety Techniques

Constitutional AI (Anthropic)

Constitutional AI trains models to critique and revise their own responses according to a set of explicit principles (the "constitution").

How it works:

Model generates a potentially harmful response
Model performs self-critique based on constitutional principles
Model revises the response to comply with principles
Revised responses are used to train via RLHF

RLHF (Reinforcement Learning from Human Feedback)

1. SFT (Supervised Fine-Tuning): Fine-tune base model on high-quality demonstration data
2. Reward Modeling: Train reward model on human preference pairs (preferred vs. rejected)
3. RL Optimization: Maximize reward with PPO algorithm (with KL divergence constraint)

Jailbreak Defense Techniques

Input filtering: Detect and block harmful patterns before processing
Prompt injection defense: Isolate system prompts from user inputs
Output monitoring: Real-time safety checks on generated text
Red teaming: Expert adversarial teams systematically probe for vulnerabilities

AI Watermarking

Text watermarking inserts statistically detectable patterns into LLM-generated text.

import hashlib
import random

def green_red_watermark(text: str, key: str, gamma: float = 0.25) -> dict:
    """
    Green/red list watermarking based on Kirchenbauer et al. (2023).
    Uses the previous token as a seed to classify tokens as green or red,
    preferring green tokens during generation to embed a watermark.
    """
    words = text.split()
    green_count = 0
    total = len(words)

    for i, word in enumerate(words):
        prev_token = words[i - 1] if i > 0 else "<s>"
        seed = int(hashlib.sha256(f"{key}{prev_token}".encode()).hexdigest(), 16) % (2**32)
        random.seed(seed)
        is_green = random.random() > (1 - gamma)
        if is_green:
            green_count += 1

    z_score = (green_count - gamma * total) / ((gamma * (1 - gamma) * total) ** 0.5 + 1e-9)
    return {
        "green_token_ratio": green_count / total,
        "z_score": z_score,
        "is_watermarked": z_score > 4.0,
    }

Data Privacy Technologies

Differential Privacy

Differential privacy adds noise to databases to statistically conceal whether any individual record is included. Smaller epsilon values provide stronger privacy guarantees.

import torch
import torch.nn as nn
from opacus import PrivacyEngine
from torch.utils.data import DataLoader, TensorDataset

# Define model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )
    def forward(self, x):
        return self.fc(x)

# Synthetic data
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = SimpleNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Apply Opacus PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=1.0,   # epsilon: smaller = stronger privacy
    target_delta=1e-5,    # delta: probability of epsilon violation
    max_grad_norm=1.0,    # gradient clipping threshold
    epochs=10,
)

# Training loop
criterion = nn.CrossEntropyLoss()
for epoch in range(3):
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Training complete: epsilon = {epsilon:.2f}, delta = 1e-5")
print(f"Privacy guarantee: individual data contribution bounded by e^{epsilon:.2f}")

Federated Learning

Federated learning avoids sending raw data to a central server — clients share only locally trained model weights (gradients).

import copy
import numpy as np

def federated_averaging(global_model_weights, client_updates, client_data_sizes):
    """
    FedAvg algorithm: weighted average aggregation based on data sizes.
    """
    total_data = sum(client_data_sizes)
    averaged_weights = {}

    for key in global_model_weights.keys():
        weighted_sum = sum(
            client_updates[i][key] * (client_data_sizes[i] / total_data)
            for i in range(len(client_updates))
        )
        averaged_weights[key] = weighted_sum

    return averaged_weights

# GDPR AI compliance checklist
gdpr_ai_checklist = {
    "Data Minimization": "Collect only the minimum data necessary for model training",
    "Purpose Limitation": "Prohibit use of training data beyond its stated purpose",
    "Data Subject Rights": "Guarantee the right to explanation for automated decisions (Article 22)",
    "Profiling Restrictions": "Human review required for significant automated profiling decisions",
    "Data Portability": "Right to receive personal data in a portable format",
    "Right to Erasure": "Remove the influence of personal data from models (Machine Unlearning)",
}

for right, description in gdpr_ai_checklist.items():
    print(f"[GDPR] {right}: {description}")

AI Regulatory Practice

Model Audit Process

Define audit scope: Clarify the model, time period, and use case under review
Document review: Examine training data provenance, model cards, system cards
Technical testing: Bias measurement, robustness testing, adversarial attack simulation
Stakeholder interviews: Operations team, affected group representatives, regulators
Audit report: Document findings, risk ratings, and recommended actions

Composing an AI Ethics Committee

An effective AI ethics committee should include:

Role	Required Competency
AI/ML technical expert	Understand how models work
Legal/compliance officer	Interpret regulatory requirements
Ethicist/philosopher	Mediate value conflicts
Domain expert	Provide application context
Affected group representative	Reflect real-world impacts
Cybersecurity expert	Assess security risks

Risk Register Template

from dataclasses import dataclass, field
from typing import List
from enum import IntEnum

class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

class Likelihood(IntEnum):
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4

@dataclass
class AIRisk:
    risk_id: str
    description: str
    severity: Severity
    likelihood: Likelihood
    affected_groups: List[str]
    mitigation: str
    owner: str
    residual_risk: str = "TBD"

    @property
    def risk_score(self) -> int:
        return self.severity * self.likelihood

    @property
    def risk_level(self) -> str:
        score = self.risk_score
        if score >= 12:
            return "CRITICAL"
        elif score >= 8:
            return "HIGH"
        elif score >= 4:
            return "MEDIUM"
        return "LOW"

# Example risk register
risks = [
    AIRisk(
        risk_id="RISK-001",
        description="Gender bias in credit model leading to discriminatory loan rejections",
        severity=Severity.HIGH,
        likelihood=Likelihood.POSSIBLE,
        affected_groups=["Women", "Non-binary individuals"],
        mitigation="Reweighing + quarterly disparate impact monitoring",
        owner="AI Ethics Team",
    ),
    AIRisk(
        risk_id="RISK-002",
        description="Inability to explain model decisions violating GDPR Article 22",
        severity=Severity.CRITICAL,
        likelihood=Likelihood.LIKELY,
        affected_groups=["All loan applicants"],
        mitigation="Build SHAP-based decision explanation system",
        owner="Compliance Team",
    ),
]

print("=== AI Risk Register ===")
for risk in risks:
    print(f"\n[{risk.risk_id}] {risk.description}")
    print(f"  Risk Level: {risk.risk_level} (Score: {risk.risk_score})")
    print(f"  Mitigation: {risk.mitigation}")

Quiz

Q1. Under the EU AI Act, what conditions classify a biometric system as High-Risk?

Answer: Real-time + public space + remote biometric identification — when all three conditions are met simultaneously, the system falls under Unacceptable Risk and is prohibited. Limited exceptions exist, such as law enforcement searching for missing children. Non-real-time or post-hoc biometric analysis, or biometric systems used in judiciary and border control contexts, are classified as High-Risk and subject to strict obligations including conformity assessments.

Explanation: EU AI Act Annex III explicitly lists remote biometric identification systems used in law enforcement, judiciary, and border management as high-risk AI. Real-time remote biometric identification in public spaces is principally prohibited under Article 5.

Q2. What is the difference between demographic parity and equal opportunity, and what trade-offs arise?

Answer: Demographic parity requires equal positive prediction rates across protected groups: P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1). Equal opportunity requires equal true positive rates (TPR) for positive-outcome individuals: P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1).

Explanation: Chouldechova's (2017) impossibility theorem shows that when base rates differ across groups, it is mathematically impossible to simultaneously satisfy demographic parity, equal opportunity, and predictive parity. Organizations must explicitly choose which fairness criterion to prioritize based on application context and the nature of potential harms.

Q3. What game-theoretic principle does SHAP use to calculate feature importance?

Answer: SHAP is based on Shapley values from cooperative game theory. Each feature is treated as a "player" and the model's prediction as the "payoff." It computes the average marginal contribution of each feature across all possible feature subsets (coalitions).

Explanation: Shapley values are the unique attribution method satisfying four axioms: efficiency (SHAP values sum to prediction minus expected value), symmetry, linearity, and the dummy feature property. Unlike LIME, SHAP guarantees global consistency. TreeSHAP computes values in O(TLD^2) time for tree-based models.

Q4. Why does a smaller epsilon value in differential privacy provide stronger privacy protection?

Answer: Epsilon defines an upper bound: "Including or excluding one data point can change the output distribution by at most e^epsilon." As epsilon approaches 0, the output distribution becomes nearly identical regardless of whether any individual record is included, preventing individual information from being inferred.

Explanation: Epsilon = 0 means perfect privacy (fully random output); large epsilon is practical but offers weaker protection. In practice, epsilon below 1 is considered strong privacy, and below 10 is considered practical privacy. Libraries like Opacus (PyTorch) and TensorFlow Privacy automatically compute the required noise scale and track epsilon.

Q5. What are the essential sections of a Model Card?

Answer: Model details (name, version, type), intended use and out-of-scope uses, evaluation factors (protected attributes), performance metrics (accuracy, AUC, etc.), training data description, fairness analysis results, limitations and caveats, and ethical considerations.

Explanation: Model cards, proposed by Mitchell et al. (2019), have become a transparency standard. Major organizations including Google and Hugging Face publish model cards with model releases. EU AI Act high-risk AI requires technical documentation under Annex IV that is substantially equivalent to a model card.

AI 거버넌스 & 책임감 있는 AI: EU AI Act, XAI, 편향 감지, AI 안전 기술까지

목차

AI 거버넌스 프레임워크 개요

EU AI Act: 위험 등급 분류 체계

위험 등급 분류

EU AI Act 위험 분류기 구현

NIST AI RMF & ISO 42001

NIST AI Risk Management Framework

ISO/IEC 42001: AI 경영 시스템

책임감 있는 AI 개발 원칙

FATE 프레임워크

G7 히로시마 AI 원칙 (2023)

편향 감지 & 완화 기법

주요 공정성 지표

AIF360을 이용한 편향 감지

Fairlearn을 이용한 후처리 완화

설명 가능한 AI (XAI)

SHAP: SHapley Additive exPlanations

LIME: Local Interpretable Model-agnostic Explanations

모델 카드(Model Card) 생성

AI 안전 기술

Constitutional AI (Anthropic)

RLHF (Reinforcement Learning from Human Feedback)

탈옥(Jailbreak) 방어 기법

AI 워터마킹

데이터 프라이버시 기술

차분 프라이버시 (Differential Privacy)

연합 학습 (Federated Learning)

AI 규제 실무

모델 감사 프로세스

AI 윤리 위원회 구성

위험 평가 템플릿

퀴즈

AI Governance & Responsible AI: EU AI Act, XAI, Bias Detection, and AI Safety Techniques

Table of Contents

AI Governance Frameworks Overview

EU AI Act: Risk Classification System

Risk Tier Classification

EU AI Act Risk Classifier Implementation

NIST AI RMF & ISO 42001

NIST AI Risk Management Framework

ISO/IEC 42001: AI Management System

Responsible AI Development Principles

The FATE Framework

G7 Hiroshima AI Principles (2023)

Bias Detection & Mitigation

Key Fairness Metrics

Bias Detection with AIF360

Post-processing Mitigation with Fairlearn

Explainable AI (XAI)

SHAP: SHapley Additive exPlanations

LIME: Local Interpretable Model-agnostic Explanations

Generating a Model Card

AI Safety Techniques

Constitutional AI (Anthropic)

RLHF (Reinforcement Learning from Human Feedback)

Jailbreak Defense Techniques

AI Watermarking

Data Privacy Technologies

Differential Privacy

Federated Learning

AI Regulatory Practice

Model Audit Process

Composing an AI Ethics Committee

Risk Register Template

Quiz