
1. [AI Governance Frameworks Overview](#ai-governance-frameworks-overview)

2. [EU AI Act: Risk Classification System](#eu-ai-act-risk-classification-system)

3. [NIST AI RMF & ISO 42001](#nist-ai-rmf--iso-42001)

4. [Responsible AI Development Principles](#responsible-ai-development-principles)

5. [Bias Detection & Mitigation](#bias-detection--mitigation)

6. [Explainable AI (XAI)](#explainable-ai-xai)

7. [AI Safety Techniques](#ai-safety-techniques)

8. [Data Privacy Technologies](#data-privacy-technologies)

9. [AI Regulatory Practice](#ai-regulatory-practice)

10. [Quiz](#quiz)

AI Governance Frameworks Overview

As AI systems are deployed across society, the need for governance frameworks has grown rapidly. AI Governance refers to the totality of policies, procedures, and technologies that manage risks throughout the development, deployment, and operation of AI systems — ensuring alignment with societal values and legal requirements.

Key global frameworks:

| Framework                       | Authority      | Core Characteristics                   |
| ------------------------------- | -------------- | -------------------------------------- |
| EU AI Act                       | European Union | Legally binding, risk-based approach   |
| NIST AI RMF                     | US NIST        | Voluntary guidance, risk management    |
| ISO 42001                       | ISO/IEC        | Certifiable AI management system       |
| G7 AI Principles                | G7 Nations     | International cooperation, non-binding |
| UNESCO AI Ethics Recommendation | UNESCO         | Human rights-centered, global scope    |

EU AI Act: Risk Classification System

The EU AI Act, which entered into force in 2024, is the world's first comprehensive AI legislation. It adopts a risk-based approach, classifying AI systems into four risk tiers.

Risk Tier Classification

**1. Unacceptable Risk — Prohibited**

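- Real-time remote biometric identification in publicly accessible spaces for law enforcement (e.g., live CCTV facial recognition), subject to narrow exceptions

- Social scoring systems

- Subliminal or manipulative techniques, and exploitation of the vulnerabilities of specific groups

- Predictive policing of individuals based solely on profiling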

**2. High-Risk — Strict Obligations**

- Medical diagnosis assistance and medical device software

- Autonomous vehicles and critical infrastructure

- Recruitment and personnel evaluation systems

- Credit scoring and insurance underwriting

- Judiciary and law enforcement support tools

- Educational assessment systems

**3. Limited Risk — Transparency Obligations**

- Chatbots: must disclose that the user is interacting with AI

- Deepfake content: must be labeled as synthetic

- Emotion recognition systems: must disclose usage

**4. Minimal Risk — Self-Regulation**

- Spam filters, game AI

- AI-based inventory management, etc.

EU AI Act Risk Classifier Implementation

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class RiskLevel(Enum):
    UNACCEPTABLE = "Unacceptable (Prohibited)"
    HIGH = "High-Risk (Strict Regulation)"
    LIMITED = "Limited Risk (Transparency Obligations)"
    MINIMAL = "Minimal Risk (Self-Regulation)"


@dataclass
class AISystemProfile:
    name: str
    uses_biometric: bool
    is_real_time: bool
    public_space: bool
    domain: str  # healthcare, hiring, credit, education, judiciary, critical_infrastructure
    interacts_with_humans: bool
    generates_synthetic_content: bool


def classify_eu_ai_act_risk(system: AISystemProfile) -> Tuple[RiskLevel, List[str]]:
    """
    Simplified EU AI Act risk classifier (illustrative, not legal advice).

    Returns (RiskLevel, list_of_applicable_obligations).
    """
    # Step 1: Check for unacceptable risk
    if system.uses_biometric and system.is_real_time and system.public_space:
        return RiskLevel.UNACCEPTABLE, ["Cease operations immediately", "Legally prohibited"]

    # Step 2: Check for high-risk domains
    HIGH_RISK_DOMAINS = {
        "healthcare", "hiring", "credit",
        "education", "judiciary", "critical_infrastructure",
    }
    if system.domain in HIGH_RISK_DOMAINS:
        obligations = [
            "Mandatory Conformity Assessment",
            "Technical documentation obligation",
            "Human oversight mechanisms required",
            "Transparency and logging requirements",
            "Bias testing and data governance",
            "Registration in EU database",
        ]
        return RiskLevel.HIGH, obligations

    # Step 3: Limited risk
    if system.interacts_with_humans or system.generates_synthetic_content:
        obligations = [
            "Disclose AI system status to users",
            "Watermark or label synthetic content",
        ]
        return RiskLevel.LIMITED, obligations

    # Step 4: Minimal risk
    return RiskLevel.MINIMAL, ["Voluntary code of conduct recommended"]


# Usage example
credit_scoring_system = AISystemProfile(
    name="Automated Credit Scoring AI",
    uses_biometric=False,
    is_real_time=False,
    public_space=False,
    domain="credit",
    interacts_with_humans=False,
    generates_synthetic_content=False,
)

risk_level, obligations = classify_eu_ai_act_risk(credit_scoring_system)
print(f"System: {credit_scoring_system.name}")
print(f"Risk Level: {risk_level.value}")
print("Obligations:")
for ob in obligations:
    print(f" - {ob}")
```

NIST AI RMF & ISO 42001

NIST AI Risk Management Framework

The NIST AI RMF (2023) is structured around four core functions:

- **GOVERN**: Establish AI risk management culture and policies

- **MAP**: Identify and categorize AI risk context

- **MEASURE**: Analyze, evaluate, and quantify risks

- **MANAGE**: Respond to risks based on priority

ISO/IEC 42001: AI Management System

ISO 42001 is a management system standard for organizations to develop and deploy AI responsibly. Like ISO 9001 (quality) or ISO 27001 (security), it can be certified by third parties.

Core requirements:

- Establish AI policies and objectives

- Clarify leadership responsibilities

- Assess risks and opportunities

- Conduct AI impact assessments

- Perform internal audits and continuous improvement

Responsible AI Development Principles

The FATE Framework

**Fairness**: Treat similar people similarly. Do not disadvantage particular groups.

**Accountability**: Clarify responsibility for decisions. "Who is accountable for this decision?"

**Transparency**: Disclose how AI systems work, what data they were trained on, and their limitations.

**Explainability**: Explain the reasoning behind individual predictions in human-understandable terms.

G7 Hiroshima AI Principles (2023)

1. Rule of law and respect for human rights

2. Transparency and explainability

3. Fairness and non-discrimination

4. Human oversight and control

5. Privacy protection

6. Cybersecurity

7. Information sharing and incident reporting

Bias Detection & Mitigation

AI model bias originates from historical inequalities in training data, feature selection errors, labeling mistakes, and feedback loops.

Key Fairness Metrics

**Demographic Parity (Statistical Parity)**:

The positive prediction rate must be equal across protected groups.

P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)

**Equal Opportunity**:

The true positive rate (TPR) must be equal across protected groups.

P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)

**Calibration**:

Predicted probabilities must match actual positive rates (per group).

**Individual Fairness**:

Similar individuals should be treated similarly.
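
The first two definitions above can be checked by hand on a small example. The following is a minimal sketch, assuming binary labels and a binary protected attribute; the function name `fairness_report` and the toy arrays are illustrative and not part of any library.

```python
import numpy as np


def fairness_report(y_true, y_pred, group):
    """Return demographic parity and equal opportunity gaps (group 0 minus group 1)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in (0, 1):
        mask = group == g
        selection_rate = y_pred[mask].mean()        # P(Y_hat=1 | A=g)
        tpr = y_pred[mask & (y_true == 1)].mean()   # P(Y_hat=1 | Y=1, A=g)
        rates[g] = (selection_rate, tpr)
    return {
        "demographic_parity_diff": rates[0][0] - rates[1][0],
        "equal_opportunity_diff": rates[0][1] - rates[1][1],
    }


# Toy data: 8 applicants, 4 per group (made up for illustration)
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

report = fairness_report(y_true, y_pred, group)
print(report)
```

A value of zero for either gap means the corresponding criterion holds exactly; in this toy data both gaps are negative, so group 0 is selected less often and has a lower true positive rate.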

Bias Detection with AIF360

```python
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# 1. Prepare data (loan approval scenario)
np.random.seed(42)
n = 1000
data = pd.DataFrame({
    'income': np.random.normal(50000, 20000, n).clip(10000, 150000),
    'credit_score': np.random.normal(680, 100, n).clip(300, 850),
    'age': np.random.randint(20, 70, n),
    'gender': np.random.choice([0, 1], n, p=[0.5, 0.5]),  # 0=female, 1=male
    'loan_approved': np.zeros(n, dtype=int)
})

# Inject artificial bias: males have higher approval probability
prob = 0.3 + 0.2 * data['gender'] + 0.3 * (data['credit_score'] > 700).astype(int)
data['loan_approved'] = (np.random.random(n) < prob).astype(int)

# 2. Create AIF360 dataset
aif_dataset = BinaryLabelDataset(
    df=data,
    label_names=['loan_approved'],
    protected_attribute_names=['gender'],
    favorable_label=1,
    unfavorable_label=0,
)

# 3. Measure bias
privileged_groups = [{'gender': 1}]    # male
unprivileged_groups = [{'gender': 0}]  # female

dataset_metric = BinaryLabelDatasetMetric(
    aif_dataset,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print("=== Original Data Bias Analysis ===")
print(f"Disparate Impact: {dataset_metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {dataset_metric.statistical_parity_difference():.4f}")
# Disparate Impact < 0.8 → 80% rule violation (bias detected)

# 4. Preprocessing bias mitigation with Reweighing
rw = Reweighing(
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
dataset_reweighed = rw.fit_transform(aif_dataset)

# The metric uses the instance weights assigned by Reweighing
metric_reweighed = BinaryLabelDatasetMetric(
    dataset_reweighed,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print("\n=== After Reweighing ===")
print(f"Disparate Impact: {metric_reweighed.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric_reweighed.statistical_parity_difference():.4f}")
```
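
To show what reweighing does under the hood, here is a NumPy-only sketch of the weighting scheme usually attributed to Kamiran and Calders, in which each (group, label) cell receives the weight P(A=a)·P(Y=y) / P(A=a, Y=y). The helper names `reweighing_weights` and `disparate_impact` and the toy arrays are assumptions for illustration; this is not AIF360's internal code.

```python
import numpy as np


def reweighing_weights(group, label):
    """Weight each sample by expected / observed frequency of its (group, label) cell."""
    group, label = np.asarray(group), np.asarray(label)
    weights = np.empty(len(label), dtype=float)
    for a in np.unique(group):
        for y in np.unique(label):
            cell = (group == a) & (label == y)
            if cell.any():
                expected = (group == a).mean() * (label == y).mean()
                weights[cell] = expected / cell.mean()
    return weights


def disparate_impact(group, label, weights=None):
    """Weighted favorable-outcome rate of group 0 divided by that of group 1."""
    group, label = np.asarray(group), np.asarray(label)
    w = np.ones(len(label)) if weights is None else np.asarray(weights)

    def rate(a):
        return w[(group == a) & (label == 1)].sum() / w[group == a].sum()

    return rate(0) / rate(1)


# Toy data: group 0 is approved 1 time in 5, group 1 is approved 3 times in 5
group = np.array([0] * 5 + [1] * 5)
label = np.array([1, 0, 0, 0, 0, 1, 1, 1, 0, 0])

w = reweighing_weights(group, label)
di_before = disparate_impact(group, label)
di_after = disparate_impact(group, label, w)
print(f"Disparate impact before: {di_before:.3f}, after: {di_after:.3f}")
```

Under-represented cells (here, approved members of group 0) receive weights above 1, so the weighted approval rates of the two groups coincide.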

Post-processing Mitigation with Fairlearn

```python
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler

# Train model (reuses the `data` DataFrame from the AIF360 example)
X = data[['income', 'credit_score', 'age']].values
y = data['loan_approved'].values
sensitive = data['gender'].values

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

base_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
base_model.fit(X_scaled, y)

# ThresholdOptimizer: optimize decision thresholds per group
postprocess_est = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    predict_method="predict_proba",
    objective="balanced_accuracy_score",
)
postprocess_est.fit(X_scaled, y, sensitive_features=sensitive)
y_pred_fair = postprocess_est.predict(X_scaled, sensitive_features=sensitive)

# Measure fairness metrics
mf = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y,
    y_pred=y_pred_fair,
    sensitive_features=sensitive,
)

print("\n=== Fairlearn Post-processing Results ===")
print(f"Selection rate by group:\n{mf.by_group}")
print(f"Demographic Parity Difference: {demographic_parity_difference(y, y_pred_fair, sensitive_features=sensitive):.4f}")
```
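
The idea behind per-group thresholds can be sketched without any fairness library. The example below is a simplification under stated assumptions: it picks one deterministic threshold per group from the score quantiles so that every group has roughly the same selection rate, whereas Fairlearn's `ThresholdOptimizer` solves an optimization problem and may use randomized thresholds. The function `group_thresholds` and the synthetic scores are hypothetical.

```python
import numpy as np


def group_thresholds(scores, group, target_rate):
    """Pick one score threshold per group so each group's selection rate is about target_rate."""
    scores, group = np.asarray(scores, dtype=float), np.asarray(group)
    return {
        int(g): float(np.quantile(scores[group == g], 1.0 - target_rate))
        for g in np.unique(group)
    }


# Synthetic scores in which group 1 tends to score higher than group 0
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=2000)
scores = np.clip(rng.normal(0.45 + 0.15 * group, 0.15), 0, 1)

thresholds = group_thresholds(scores, group, target_rate=0.3)
per_sample_threshold = np.array([thresholds[int(g)] for g in group])
y_pred = scores > per_sample_threshold

rates = {g: float(y_pred[group == g].mean()) for g in (0, 1)}
print("Thresholds:", thresholds)
print("Selection rates:", rates)
```

A single shared threshold would select group 1 far more often; the group-specific thresholds trade that disparity for different cut-offs, which is itself a policy decision that should be documented.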

Explainable AI (XAI)

SHAP: SHapley Additive exPlanations

SHAP leverages Shapley values from cooperative game theory to quantify each feature's contribution to a prediction. It computes the average marginal contribution of a feature across all possible feature subsets.
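
To make "average marginal contribution across all feature subsets" concrete, the following brute-force sketch enumerates every coalition for a three-feature toy function. It assumes that absent features are replaced by a fixed baseline of zero, which is a simplification (SHAP libraries typically average over background data), and the names `shapley_values` and `value_fn` are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (feasible only for small n)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Probability that a coalition of this size precedes feature i
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Toy model f(x) = 2*x0 + 3*x1 + x0*x2, explained at x relative to a zero baseline
x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]


def value_fn(coalition):
    z = [x[j] if j in coalition else baseline[j] for j in range(3)]
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(value_fn, 3)
print(phi)  # the x0*x2 interaction term is split evenly between x0 and x2
```

The contributions sum to f(x) minus f(baseline), which is the efficiency property. The cost grows exponentially with the number of features, which is why specialized algorithms such as TreeExplainer matter in practice.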

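```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Train model on synthetic data; the feature names are illustrative labels only
X_train, y_train = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)
feature_names = [
    'income', 'credit_score', 'age', 'debt_ratio',
    'employment_years', 'num_accounts', 'late_payments', 'loan_amount'
]
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# SHAP TreeExplainer (tree-specific, fast)
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_train)

# Older SHAP releases return a list with one array per class; newer releases
# return a single array of shape (n_samples, n_features, n_classes).
# Either way, select the values for the positive class (index 1).
if isinstance(shap_values, list):
    shap_pos = shap_values[1]
else:
    shap_pos = shap_values[..., 1]
base_value = np.ravel(explainer.expected_value)[1]

# Individual prediction explanation (Waterfall Plot)
sample_idx = 0
shap.plots.waterfall(
    shap.Explanation(
        values=shap_pos[sample_idx],
        base_values=base_value,
        data=X_train[sample_idx],
        feature_names=feature_names,
    )
)

# Global importance (Summary Plot)
shap.summary_plot(shap_pos, X_train, feature_names=feature_names)

# SHAP interaction effects (same list-versus-array handling as above)
shap_interaction = explainer.shap_interaction_values(X_train[:100])
if isinstance(shap_interaction, list):
    inter_pos = shap_interaction[1]
elif shap_interaction.ndim == 4:
    inter_pos = shap_interaction[..., 1]
else:
    inter_pos = shap_interaction
print(f"Income-CreditScore interaction SHAP: {inter_pos[0, 0, 1]:.4f}")
```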

LIME: Local Interpretable Model-agnostic Explanations

```python
import lime
import lime.lime_tabular

# Create LIME explainer (reuses X_train, feature_names, and rf_model from the SHAP example)
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['Rejected', 'Approved'],
    mode='classification',
    discretize_continuous=True,
)

# Explain individual sample
explanation = lime_explainer.explain_instance(
    data_row=X_train[0],
    predict_fn=rf_model.predict_proba,
    num_features=6,
    num_samples=1000,
)

print("=== LIME Explanation (Sample #0) ===")
for feature, weight in explanation.as_list():
    direction = "increases" if weight > 0 else "decreases"
    print(f" {feature}: {weight:+.4f} ({direction} approval probability)")

# Renders an interactive view; only meaningful inside a Jupyter notebook
explanation.show_in_notebook(show_table=True)
```

Generating a Model Card

```python
import json
from datetime import datetime


def generate_model_card(
    model_name: str,
    version: str,
    intended_use: str,
    out_of_scope_uses: list,
    training_data: dict,
    evaluation_results: dict,
    fairness_analysis: dict,
    limitations: list,
    ethical_considerations: list,
) -> dict:
    """Standard model card generator (based on Mitchell et al. 2019)."""
    model_card = {
        "model_details": {
            "name": model_name,
            "version": version,
            "date": datetime.now().strftime("%Y-%m-%d"),
            "type": "Binary Classifier",
            "paper": "https://arxiv.org/abs/1810.03993",
        },
        "intended_use": {
            "primary_uses": intended_use,
            "primary_users": ["Credit officers", "Financial regulators"],
            "out_of_scope_uses": out_of_scope_uses,
        },
        "factors": {
            "relevant_factors": ["gender", "age_group", "income_bracket"],
            "evaluation_factors": ["demographic_parity", "equal_opportunity"],
        },
        "metrics": {
            "performance_measures": evaluation_results,
            "decision_thresholds": {"default": 0.5, "high_precision": 0.7},
        },
        "training_data": training_data,
        "fairness_analysis": fairness_analysis,
        "limitations": limitations,
        "ethical_considerations": ethical_considerations,
        "caveats_recommendations": [
            "Regular drift monitoring recommended",
            "Quarterly bias re-evaluation required",
            "Human review required for high-stakes decisions",
        ],
    }
    return model_card


card = generate_model_card(
    model_name="Personal Loan Approval Model v2.1",
    version="2.1.0",
    intended_use="Automated initial screening for personal loan applications",
    out_of_scope_uses=["Corporate loan assessment", "Insurance pricing", "Employment decisions"],
    training_data={"size": 50000, "period": "2020-2024", "source": "Internal loan history"},
    evaluation_results={"accuracy": 0.84, "AUC": 0.91, "F1": 0.82},
    fairness_analysis={
        "demographic_parity_diff": 0.03,
        "equal_opportunity_diff": 0.02,
        "disparate_impact": 0.96,
    },
    limitations=["Pre-2020 data not included", "Rural region underrepresentation"],
    ethical_considerations=["Final decisions must be reviewed by human officers", "Mandatory disclosure of rejection reasons"],
)

print(json.dumps(card, indent=2))
```

AI Safety Techniques

Constitutional AI (Anthropic)

Constitutional AI trains models to critique and revise their own responses according to a set of explicit principles (the "constitution").

How it works:

1. Model generates a potentially harmful response

2. Model performs self-critique based on constitutional principles

3. Model revises the response to comply with principles

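4. Revised responses are used for supervised fine-tuning, followed by a reinforcement learning stage in which preference labels come from an AI model guided by the constitution (RLAIF) rather than from human raters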
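
The critique-and-revise steps above can be sketched as a small loop. Everything here is a toy stand-in: `toy_generate`, `toy_critique`, and `toy_revise` are hypothetical placeholders for language model calls, and the single principle is invented for illustration. It shows only the control flow, not Anthropic's actual training pipeline.

```python
from typing import Callable, List


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str], str],
    principles: List[str],
) -> str:
    """Run one critique/revision pass per principle and return the final response."""
    response = generate(prompt)
    for principle in principles:
        feedback = critique(response, principle)
        if feedback:  # an empty string means no violation was found
            response = revise(response, feedback)
    return response


# Hypothetical stand-ins for calls to a language model
def toy_generate(prompt: str) -> str:
    return "Step 1: insert a tension wrench into the lock."


def toy_critique(response: str, principle: str) -> str:
    return f"Violates: {principle}" if "tension wrench" in response else ""


def toy_revise(response: str, feedback: str) -> str:
    return "I can't help with bypassing locks, but a licensed locksmith can assist."


principles = ["Do not give instructions that facilitate illegal entry."]
prompt = "How do I open a lock that isn't mine?"
final = critique_and_revise(prompt, toy_generate, toy_critique, toy_revise, principles)

# The (prompt, revised response) pair becomes one fine-tuning example
sft_example = {"prompt": prompt, "completion": final}
print(sft_example)
```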

RLHF (Reinforcement Learning from Human Feedback)

1. SFT (Supervised Fine-Tuning): Fine-tune base model on high-quality demonstration data

2. Reward Modeling: Train reward model on human preference pairs (preferred vs. rejected)

3. RL Optimization: Maximize reward with PPO algorithm (with KL divergence constraint)
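
Steps 2 and 3 each rest on a short formula. The sketch below assumes the commonly used Bradley-Terry preference loss, -log σ(r_chosen - r_rejected), for reward modeling, and a per-sample KL penalty of the form r - β·(log π(y|x) - log π_ref(y|x)) for the RL stage. The function names and numbers are illustrative, and real implementations operate on token-level log-probabilities inside a PPO loop.

```python
import numpy as np


def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry loss: mean of -log(sigmoid(r_chosen - r_rejected))."""
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-margin))))


def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward minus beta times the log-ratio between the policy and the reference model."""
    return reward - beta * (np.asarray(logp_policy) - np.asarray(logp_ref))


# The reward model is pushed to score preferred responses higher than rejected ones
print(preference_loss([0.0], [0.0]))   # no separation between the pair
print(preference_loss([2.0], [-1.0]))  # a clear margin gives a lower loss

# A policy that drifts from the reference model pays a penalty on its reward
print(kl_penalized_reward(reward=1.0, logp_policy=-2.0, logp_ref=-3.0, beta=0.1))
```

The penalty coefficient `beta` controls how far the optimized policy may move from the supervised fine-tuned model, which limits reward hacking.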

Jailbreak Defense Techniques

- **Input filtering**: Detect and block harmful patterns before processing

- **Prompt injection defense**: Isolate system prompts from user inputs

- **Output monitoring**: Real-time safety checks on generated text

- **Red teaming**: Expert adversarial teams systematically probe for vulnerabilities

AI Watermarking

Text watermarking inserts statistically detectable patterns into LLM-generated text.

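```python
import hashlib


def green_red_watermark(text: str, key: str, gamma: float = 0.25) -> dict:
    """
    Detector for green/red list watermarking, after Kirchenbauer et al. (2023).

    A secret key and the previous token seed a pseudorandom split of candidate
    tokens into a green list (fraction gamma) and a red list. A watermarking
    generator prefers green tokens, so watermarked text contains more green
    tokens than the fraction gamma expected by chance. This sketch works on
    whitespace-separated words rather than model tokens.
    """
    words = text.split()
    total = len(words)
    if total == 0:
        return {"green_token_ratio": 0.0, "z_score": 0.0, "is_watermarked": False}

    green_count = 0
    for i, word in enumerate(words):
        prev_token = words[i - 1] if i > 0 else "<s>"
        # Hash the key, previous token, and current word so that green-list
        # membership depends on the word being tested.
        digest = hashlib.sha256(f"{key}|{prev_token}|{word}".encode()).hexdigest()
        is_green = (int(digest, 16) % 10**6) / 10**6 < gamma
        if is_green:
            green_count += 1

    # One-proportion z-test against the null hypothesis "no watermark"
    z_score = (green_count - gamma * total) / ((gamma * (1 - gamma) * total) ** 0.5 + 1e-9)
    return {
        "green_token_ratio": green_count / total,
        "z_score": z_score,
        "is_watermarked": z_score > 4.0,
    }
```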
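
The generation side of the scheme can be sketched in the same spirit. This toy example assumes a 50-token vocabulary and uses random logits in place of a language model; `VOCAB`, `green_mask`, `sample_watermarked`, and `green_fraction` are hypothetical names. It adds a bias `delta` to the logits of green-list tokens before sampling, which is the soft watermark idea described by Kirchenbauer et al. (2023).

```python
import hashlib

import numpy as np

VOCAB = [f"tok{i}" for i in range(50)]  # hypothetical toy vocabulary


def green_mask(prev_token, key, gamma=0.25):
    """Boolean mask over VOCAB marking the green list seeded by key and previous token."""
    seed = int(hashlib.sha256(f"{key}{prev_token}".encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(VOCAB), dtype=bool)
    mask[rng.permutation(len(VOCAB))[: int(gamma * len(VOCAB))]] = True
    return mask


def sample_watermarked(n_tokens, key, delta=4.0, gamma=0.25, seed=0):
    """Sample tokens while adding delta to the logits of green-list tokens."""
    rng = np.random.default_rng(seed)
    tokens, prev = [], "<s>"
    for _ in range(n_tokens):
        logits = rng.normal(size=len(VOCAB))  # stand-in for a language model's logits
        logits = logits + delta * green_mask(prev, key, gamma)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        prev = VOCAB[rng.choice(len(VOCAB), p=probs)]
        tokens.append(prev)
    return tokens


def green_fraction(tokens, key, gamma=0.25):
    """Fraction of tokens that fall in the green list implied by their predecessor."""
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += int(green_mask(prev, key, gamma)[VOCAB.index(tok)])
        prev = tok
    return hits / len(tokens)


tokens = sample_watermarked(200, key="secret")
frac_right_key = green_fraction(tokens, "secret")
frac_wrong_key = green_fraction(tokens, "wrong-key")
print(f"Green fraction with the right key: {frac_right_key:.2f}")
print(f"Green fraction with a wrong key:   {frac_wrong_key:.2f}")
```

Without the key, the green fraction stays near `gamma`, so only the key holder can run the statistical test.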

Data Privacy Technologies

Differential Privacy

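Differential privacy adds calibrated random noise to the results of computations over a dataset (query answers, gradients, or model updates) so that the output reveals little about whether any individual record was included. Smaller epsilon values provide stronger privacy guarantees.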
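
The role of epsilon is easiest to see on a counting query. The sketch below uses the Laplace mechanism, which adds noise with scale sensitivity/epsilon; the function name `laplace_count` and the toy `ages` list are illustrative assumptions.

```python
import numpy as np


def laplace_count(values, predicate, epsilon, rng):
    """Release a count with Laplace noise calibrated to sensitivity 1."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise


rng = np.random.default_rng(0)
ages = [23, 35, 41, 29, 52, 61, 38, 47]  # toy records; the true count of age >= 40 is 4

for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(ages, lambda a: a >= 40, eps, rng)
    print(f"epsilon={eps:>4}: noisy count = {noisy:.2f}")
```

With epsilon = 0.1 the noise has scale 10 and swamps the true count, while epsilon = 10 leaves the count almost unchanged. That is the privacy-utility trade-off in miniature.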

```python
import torch
import torch.nn as nn
from opacus import PrivacyEngine
from torch.utils.data import DataLoader, TensorDataset


# Define model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        return self.fc(x)


# Synthetic data
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = SimpleNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# The noise level is calibrated for this many epochs, so train for the same number
EPOCHS = 10

# Apply Opacus PrivacyEngine
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    target_epsilon=1.0,  # epsilon: smaller = stronger privacy
    target_delta=1e-5,   # delta: probability of epsilon violation
    max_grad_norm=1.0,   # per-sample gradient clipping threshold
    epochs=EPOCHS,
)

# Training loop
criterion = nn.CrossEntropyLoss()
for epoch in range(EPOCHS):
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"Training complete: epsilon = {epsilon:.2f}, delta = 1e-5")
print(f"Privacy guarantee: one record changes output probabilities by at most a factor of e^{epsilon:.2f}")
```
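
The privacy guarantee in DP training comes from two operations applied to every batch: clipping each sample's gradient and adding Gaussian noise to the sum. The NumPy sketch below illustrates that single step under the standard DP-SGD formulation; `dp_sgd_step` and its arguments are hypothetical names, and this is not Opacus's internal implementation.

```python
import numpy as np


def dp_sgd_step(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, and average."""
    grads = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down any gradient whose L2 norm exceeds max_grad_norm
    clipped = grads * np.minimum(1.0, max_grad_norm / (norms + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)


rng = np.random.default_rng(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5

no_noise = dp_sgd_step(grads, max_grad_norm=1.0, noise_multiplier=0.0, rng=rng)
with_noise = dp_sgd_step(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng)
print("Clipped average without noise:", no_noise)
print("Clipped average with noise:   ", with_noise)
```

Clipping bounds how much any one record can move the update, and the noise scale is tied to that bound, which is what makes the epsilon accounting possible.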

Federated Learning

Federated learning avoids sending raw data to a central server: clients train locally and share only model updates (weights or gradients), which the server aggregates.

```python
def federated_averaging(global_model_weights, client_updates, client_data_sizes):
    """
    FedAvg algorithm: weighted average aggregation based on data sizes.

    global_model_weights supplies the parameter names; each entry of
    client_updates is a dict with the same keys.
    """
    total_data = sum(client_data_sizes)
    averaged_weights = {}
    for key in global_model_weights.keys():
        weighted_sum = sum(
            client_updates[i][key] * (client_data_sizes[i] / total_data)
            for i in range(len(client_updates))
        )
        averaged_weights[key] = weighted_sum
    return averaged_weights
```
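
To see the aggregation inside a full training round, here is a self-contained simulation, assuming three clients that each fit a linear model on private synthetic data. The helpers `local_update` and `fedavg` and all data are illustrative; `fedavg` is a compact restatement of the size-weighted average.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, steps=20):
    """Run a few gradient steps of least-squares regression on one client's data."""
    w = weights["w"].copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return {"w": w}


def fedavg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its dataset size."""
    total = sum(client_sizes)
    return {
        key: sum(cw[key] * (n / total) for cw, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each client holds its own data; only the fitted weights leave the client
clients = []
for n in (50, 200, 100):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

global_weights = {"w": np.zeros(2)}
for round_idx in range(5):
    updates = [local_update(global_weights, X, y) for X, y in clients]
    global_weights = fedavg(updates, [len(y) for _, y in clients])

print("Learned weights:", global_weights["w"])
```

Sharing weights rather than raw data reduces exposure but does not by itself guarantee privacy, since model updates can still leak information; this is one reason federated learning is often combined with differential privacy.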

```python
# GDPR AI compliance checklist
gdpr_ai_checklist = {
    "Data Minimization": "Collect only the minimum data necessary for model training",
    "Purpose Limitation": "Prohibit use of training data beyond its stated purpose",
    "Data Subject Rights": "Provide meaningful information about automated decisions and honor the right not to be subject to solely automated decisions (Article 22)",
    "Profiling Restrictions": "Human review required for significant automated profiling decisions",
    "Data Portability": "Right to receive personal data in a portable format",
    "Right to Erasure": "Remove the influence of personal data from models (Machine Unlearning)",
}

for right, description in gdpr_ai_checklist.items():
    print(f"[GDPR] {right}: {description}")
```

AI Regulatory Practice

Model Audit Process

1. **Define audit scope**: Clarify the model, time period, and use case under review

2. **Document review**: Examine training data provenance, model cards, system cards

3. **Technical testing**: Bias measurement, robustness testing, adversarial attack simulation

4. **Stakeholder interviews**: Operations team, affected group representatives, regulators

5. **Audit report**: Document findings, risk ratings, and recommended actions

Composing an AI Ethics Committee

An effective AI ethics committee should include:

| Role                          | Required Competency               |
| ----------------------------- | --------------------------------- |
| AI/ML technical expert        | Understand how models work        |
| Legal/compliance officer      | Interpret regulatory requirements |
| Ethicist/philosopher          | Mediate value conflicts           |
| Domain expert                 | Provide application context       |
| Affected group representative | Reflect real-world impacts        |
| Cybersecurity expert          | Assess security risks             |

Risk Register Template

```python
from dataclasses import dataclass
from typing import List
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Likelihood(IntEnum):
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4


@dataclass
class AIRisk:
    risk_id: str
    description: str
    severity: Severity
    likelihood: Likelihood
    affected_groups: List[str]
    mitigation: str
    owner: str
    residual_risk: str = "TBD"

    @property
    def risk_score(self) -> int:
        # Severity times likelihood, ranging from 1 to 16
        return self.severity * self.likelihood

    @property
    def risk_level(self) -> str:
        score = self.risk_score
        if score >= 12:
            return "CRITICAL"
        elif score >= 8:
            return "HIGH"
        elif score >= 4:
            return "MEDIUM"
        return "LOW"


# Example risk register
risks = [
    AIRisk(
        risk_id="RISK-001",
        description="Gender bias in credit model leading to discriminatory loan rejections",
        severity=Severity.HIGH,
        likelihood=Likelihood.POSSIBLE,
        affected_groups=["Women", "Non-binary individuals"],
        mitigation="Reweighing + quarterly disparate impact monitoring",
        owner="AI Ethics Team",
    ),
    AIRisk(
        risk_id="RISK-002",
        description="Inability to explain model decisions violating GDPR Article 22",
        severity=Severity.CRITICAL,
        likelihood=Likelihood.LIKELY,
        affected_groups=["All loan applicants"],
        mitigation="Build SHAP-based decision explanation system",
        owner="Compliance Team",
    ),
]

print("=== AI Risk Register ===")
for risk in risks:
    print(f"\n[{risk.risk_id}] {risk.description}")
    print(f" Risk Level: {risk.risk_level} (Score: {risk.risk_score})")
    print(f" Mitigation: {risk.mitigation}")
```

Quiz

**Answer**: Real-time + public space + remote biometric identification — when all three conditions are met simultaneously, the system falls under Unacceptable Risk and is prohibited. Limited exceptions exist, such as law enforcement searching for missing children. Non-real-time or post-hoc biometric analysis, or biometric systems used in judiciary and border control contexts, are classified as High-Risk and subject to strict obligations including conformity assessments.

**Explanation**: EU AI Act Annex III lists remote biometric identification systems, along with specified uses in law enforcement, migration and border control, and the administration of justice, as high-risk AI. Real-time remote biometric identification in publicly accessible spaces for law enforcement is prohibited in principle under Article 5.

**Answer**: Demographic parity requires equal positive prediction rates across protected groups: P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1). Equal opportunity requires equal true positive rates (TPR) for positive-outcome individuals: P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1).

**Explanation**: Fairness criteria can conflict with one another. Chouldechova's (2017) impossibility result shows that when base rates differ across groups, an imperfect classifier cannot simultaneously achieve predictive parity and equal false positive and false negative rates. Kleinberg et al. (2016) prove a related result for calibration. Organizations must explicitly choose which fairness criterion to prioritize based on application context and the nature of potential harms.
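
The trade-off can be made concrete with the identity at the core of Chouldechova (2017), which ties the false positive rate to the base rate p, the positive predictive value (PPV), and the false negative rate (FNR): FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR). The short sketch below uses made-up base rates of 0.3 and 0.5; the function name `implied_fpr` is illustrative.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate implied by the base rate, PPV, and FNR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Two groups with the same PPV and the same FNR but different base rates
fpr_a = implied_fpr(base_rate=0.3, ppv=0.8, fnr=0.2)
fpr_b = implied_fpr(base_rate=0.5, ppv=0.8, fnr=0.2)

print(f"Group A false positive rate: {fpr_a:.3f}")
print(f"Group B false positive rate: {fpr_b:.3f}")
```

Holding PPV and FNR equal forces the false positive rates apart whenever the base rates differ, so at least one of the three criteria has to give.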

**Answer**: SHAP is based on Shapley values from cooperative game theory. Each feature is treated as a "player" and the model's prediction as the "payoff." It computes the average marginal contribution of each feature across all possible feature subsets (coalitions).

**Explanation**: Shapley values are the unique attribution method satisfying four axioms: efficiency (SHAP values sum to prediction minus expected value), symmetry, linearity, and the dummy feature property. Unlike LIME, SHAP guarantees global consistency. TreeSHAP computes values in O(TLD^2) time for tree-based models.

**Answer**: Epsilon defines an upper bound: including or excluding one data point can change the probability of any output by at most a factor of e^epsilon. As epsilon approaches 0, the output distribution becomes nearly identical regardless of whether any individual record is included, which prevents individual information from being inferred.

**Explanation**: Epsilon = 0 means perfect privacy, because the output no longer depends on the data at all; a large epsilon preserves more utility but offers weaker protection. As a rule of thumb, epsilon below 1 is often considered strong privacy and values up to about 10 are common in practical deployments. Libraries like Opacus (PyTorch) and TensorFlow Privacy automatically compute the required noise scale and track epsilon.

**Answer**: Model details (name, version, type), intended use and out-of-scope uses, evaluation factors (protected attributes), performance metrics (accuracy, AUC, etc.), training data description, fairness analysis results, limitations and caveats, and ethical considerations.

**Explanation**: Model cards, proposed by Mitchell et al. (2019), have become a widely used transparency practice. Major organizations including Google and Hugging Face publish model cards with model releases. For high-risk AI, the EU AI Act requires technical documentation under Annex IV, which overlaps substantially with the contents of a model card but is broader in scope.
