
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Chaos and Order</title>
      <link>https://www.youngju.dev/blog</link>
      <description>Slowly and correctly. AI Researcher &amp; DevOps Engineer Youngju&#39;s tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.</description>
      <language>ko</language>
      <managingEditor>fjvbn2003@gmail.com (Youngju Kim)</managingEditor>
      <webMaster>fjvbn2003@gmail.com (Youngju Kim)</webMaster>
      <lastBuildDate>Sat, 16 May 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://www.youngju.dev/tags/owasp-llm-top-10/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive.en</guid>
    <title>AI Safety, Evals and Red-Teaming in 2026 — Deep Dive into Inspect AI, Garak, PyRIT, Promptfoo, OpenAI Evals, lm-eval-harness</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive.en</link>
    <description>A single-page map of the 2026 AI safety, evaluation, and red-teaming ecosystem. Tools: Inspect AI (developed by UK AISI), Garak (independent, then NVIDIA), PyRIT (Microsoft), Promptfoo (YC), OpenAI Evals, and lm-evaluation-harness (EleutherAI), plus MLflow Evals, Arize Phoenix, DeepEval (Confident AI), Giskard, and Atla. Also covered: benchmark batteries (HumanEval, MMLU, GPQA, SWE-Bench, BigCodeBench); on the policy side, the OpenAI Preparedness Framework and Anthropic RSP; on the standards side, MITRE ATLAS and the OWASP LLM Top 10; the AI Safety Institutes (UK, US, Japan, Korea, Singapore, France); and Korea's KAIST and KISTI alongside Japan's AISI and RIKEN AIP. It closes with who should pick what, broken down into four personas: model release, app integration, governance, and academic.</description>
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-safety</category><category>red-teaming</category><category>evals</category><category>inspect-ai</category><category>garak</category><category>pyrit</category><category>promptfoo</category><category>openai-evals</category><category>lm-evaluation-harness</category><category>deepeval</category><category>phoenix</category><category>giskard</category><category>ai-safety-institute</category><category>aisi</category><category>rsp</category><category>mitre-atlas</category><category>owasp-llm-top-10</category><category>2026</category><category>deep-dive</category><category>english</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive.ja</guid>
    <title>AI Safety, Evals and Red-Teaming 2026: Deep-Dive Guide to Inspect AI, Garak, PyRIT, Promptfoo, OpenAI Evals, lm-eval-harness</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive.ja</link>
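    <description>A single-page summary of the 2026 AI safety, evaluation, and red-teaming ecosystem. Tools: Inspect AI (developed by UK AISI), Garak (independent, then NVIDIA), PyRIT (Microsoft), Promptfoo (YC), OpenAI Evals, and lm-evaluation-harness (EleutherAI), plus MLflow Evals, Arize Phoenix, DeepEval (Confident AI), Giskard, and Atla. Also covered: benchmark suites (HumanEval, MMLU, GPQA, SWE-Bench, BigCodeBench); on the policy side, the OpenAI Preparedness Framework and Anthropic RSP; on the standards side, MITRE ATLAS and the OWASP LLM Top 10; the AI Safety Institutes (UK, US, Japan, Korea, Singapore, France); and Korea's KAIST and KISTI alongside Japan's AISI and RIKEN AIP. Who should choose what is organized along four axes: model release, app integration, governance, and academia.</description>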
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-safety</category><category>red-teaming</category><category>evals</category><category>inspect-ai</category><category>garak</category><category>pyrit</category><category>promptfoo</category><category>openai-evals</category><category>lm-evaluation-harness</category><category>deepeval</category><category>phoenix</category><category>giskard</category><category>ai-safety-institute</category><category>aisi</category><category>rsp</category><category>mitre-atlas</category><category>owasp-llm-top-10</category><category>2026</category><category>deep-dive</category><category>日本語</category>
  </item>

  <item>
    <guid>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive</guid>
    <title>AI Safety, Evals and Red-Teaming 2026: In-Depth Guide to Inspect AI, Garak, PyRIT, Promptfoo, OpenAI Evals, lm-eval-harness</title>
    <link>https://www.youngju.dev/blog/culture/2026-05-16-ai-safety-evals-red-teaming-2026-inspect-ai-garak-pyrit-promptfoo-openai-evals-deep-dive</link>
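    <description>The 2026 AI safety, evaluation, and red-teaming ecosystem gathered on a single page. Tools: Inspect AI (developed by UK AISI), Garak (independent, then NVIDIA), PyRIT (Microsoft), Promptfoo (YC), OpenAI Evals, and lm-evaluation-harness (EleutherAI), plus MLflow Evals, Arize Phoenix, DeepEval (Confident AI), Giskard, and Atla. Also covered: benchmark batteries (HumanEval, MMLU, GPQA, SWE-Bench, BigCodeBench); on the policy side, the OpenAI Preparedness Framework and Anthropic RSP; on the standards side, MITRE ATLAS and the OWASP LLM Top 10; the AI Safety Institutes (UK, US, Japan, Korea, Singapore, France); and Korea's KAIST and KISTI alongside Japan's AISI and RIKEN AIP. Who should pick what is sorted into four tracks: model release, app integration, governance, and academia.</description>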
    <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
    <author>fjvbn2003@gmail.com (Youngju Kim)</author>
    <category>ai-safety</category><category>red-teaming</category><category>evals</category><category>inspect-ai</category><category>garak</category><category>pyrit</category><category>promptfoo</category><category>openai-evals</category><category>lm-evaluation-harness</category><category>deepeval</category><category>phoenix</category><category>giskard</category><category>ai-safety-institute</category><category>aisi</category><category>rsp</category><category>mitre-atlas</category><category>owasp-llm-top-10</category><category>2026</category><category>deep-dive</category>
  </item>

    </channel>
  </rss>
