Split View: LLM 프롬프트 엔지니어링 고급 기법: Chain-of-Thought·Tree-of-Thought·ReAct·Few-Shot 패턴 실전 가이드

LLM 프롬프트 엔지니어링 고급 기법: Chain-of-Thought·Tree-of-Thought·ReAct·Few-Shot 패턴 실전 가이드

들어가며
프롬프팅 기법 분류 체계
Zero-shot과 Few-shot 프롬프팅
- Zero-shot 프롬프팅
- Few-shot 프롬프팅
Chain-of-Thought (CoT) 프롬프팅
- 핵심 원리
- Zero-shot CoT
Self-Consistency 디코딩
Tree-of-Thought (ToT) 프레임워크
- 핵심 아이디어
ReAct: 추론과 행동의 결합
- 핵심 원리
구조화된 출력 프롬프팅
프롬프트 체이닝
프롬프팅 기법 성능 비교
- 벤치마크 결과
- 기법 선택 가이드
일반적인 안티패턴
프로덕션 최적화
운영 시 주의사항
마치며
참고자료

LLM Prompt Engineering Advanced Techniques

들어가며

프롬프트 엔지니어링은 LLM의 잠재 능력을 최대한 끌어내는 핵심 기술이다. 2022년 Wei 등이 발표한 Chain-of-Thought 논문은 "프롬프트에 추론 과정을 포함하면 모델의 추론 능력이 비약적으로 향상된다"는 것을 증명하며, 프롬프트 엔지니어링을 하나의 독립된 연구 분야로 확립시켰다.

이후 Self-Consistency, Tree-of-Thought, ReAct 등의 고급 기법이 연이어 등장하며, 단순한 질문-답변 패턴을 넘어 복잡한 추론, 계획, 외부 도구 활용까지 LLM의 활용 범위를 크게 넓혔다. 특히 ReAct 패턴은 현재 대부분의 AI 에이전트 프레임워크(LangChain, AutoGen 등)의 핵심 아키텍처로 자리 잡았다.

이 글에서는 각 프롬프팅 기법의 이론적 배경, 논문 핵심 발견, Python 구현 코드, 성능 비교, 안티패턴, 프로덕션 최적화 전략을 체계적으로 다룬다.

프롬프팅 기법 분류 체계

프롬프팅 기법은 다음과 같이 분류할 수 있다.

분류	기법	핵심 아이디어	논문
기본	Zero-shot	예시 없이 지시만으로 수행	-
기본	Few-shot	소수 예시 제공	Brown et al. 2020
추론 강화	Chain-of-Thought	중간 추론 단계 생성	Wei et al. 2022
추론 강화	Zero-shot CoT	"단계별로 생각하자" 한 문장 추가	Kojima et al. 2022
앙상블	Self-Consistency	다수 경로 샘플링 + 다수결	Wang et al. 2022
탐색	Tree-of-Thought	트리 구조 추론 경로 탐색	Yao et al. 2023
에이전트	ReAct	추론 + 행동 + 관찰 루프	Yao et al. 2022
구조화	Structured Output	JSON/XML 형식 강제 출력	-
조합	Prompt Chaining	작업 분해 + 순차 실행	-

Zero-shot과 Few-shot 프롬프팅

Zero-shot 프롬프팅

예시 없이 지시문만으로 모델에게 작업을 수행시키는 가장 기본적인 방식이다. 최근 대규모 모델(GPT-4, Claude 3.5 등)의 성능 향상으로 많은 작업에서 Zero-shot만으로도 충분한 성능을 달성할 수 있게 되었다.

from openai import OpenAI

client = OpenAI()

def zero_shot_classification(text: str) -> str:
    """Zero-shot 텍스트 분류"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a text classifier. "
                    "Classify the given text into one of the following categories: "
                    "Technology, Business, Science, Sports, Entertainment. "
                    "Respond with only the category name."
                )
            },
            {"role": "user", "content": text}
        ],
        temperature=0,
        max_tokens=20,
    )
    return response.choices[0].message.content.strip()

Few-shot 프롬프팅

Few-shot 프롬프팅은 프롬프트에 소수의 입출력 예시를 포함하여 모델이 패턴을 학습하도록 유도한다. Brown 등(2020)의 GPT-3 논문에서 체계적으로 제시되었으며, 특히 일관된 형식의 출력이 필요한 작업에서 효과적이다.

def few_shot_entity_extraction(text: str) -> str:
    """Few-shot 개체명 추출"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Extract named entities from the given text in the specified format."
            },
            {
                "role": "user",
                "content": "Samsung Electronics announced the Galaxy S25 series at CES 2025 in Las Vegas."
            },
            {
                "role": "assistant",
                "content": (
                    "- Organization: Samsung Electronics\n"
                    "- Product: Galaxy S25\n"
                    "- Event: CES 2025\n"
                    "- Location: Las Vegas"
                )
            },
            {
                "role": "user",
                "content": "Elon Musk revealed that Tesla will open a new Gigafactory in Austin, Texas in March 2026."
            },
            {
                "role": "assistant",
                "content": (
                    "- Person: Elon Musk\n"
                    "- Organization: Tesla\n"
                    "- Facility: Gigafactory\n"
                    "- Location: Austin, Texas\n"
                    "- Date: March 2026"
                )
            },
            {"role": "user", "content": text}
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Few-shot 예시 선택 전략
class FewShotSelector:
    """동적 Few-shot 예시 선택기"""
    def __init__(self, examples, embedding_model="text-embedding-3-small"):
        self.examples = examples
        self.client = OpenAI()
        self.embedding_model = embedding_model
        self._precompute_embeddings()

    def _precompute_embeddings(self):
        """모든 예시의 임베딩 사전 계산"""
        texts = [ex["input"] for ex in self.examples]
        response = self.client.embeddings.create(
            model=self.embedding_model,
            input=texts
        )
        self.embeddings = [r.embedding for r in response.data]

    def select(self, query: str, k: int = 3) -> list:
        """쿼리와 가장 유사한 k개 예시 선택"""
        query_emb = self.client.embeddings.create(
            model=self.embedding_model,
            input=[query]
        ).data[0].embedding

        # 코사인 유사도 계산
        import numpy as np
        similarities = []
        for emb in self.embeddings:
            sim = np.dot(query_emb, emb) / (
                np.linalg.norm(query_emb) * np.linalg.norm(emb)
            )
            similarities.append(sim)

        # 상위 k개 선택
        top_indices = np.argsort(similarities)[-k:][::-1]
        return [self.examples[i] for i in top_indices]

Chain-of-Thought (CoT) 프롬프팅

핵심 원리

Wei 등(2022)의 Chain-of-Thought 논문은 프롬프트에 중간 추론 단계를 포함시키면 LLM의 복잡한 추론 능력이 비약적으로 향상된다는 것을 증명했다. 핵심 발견은 다음과 같다.

540B 파라미터 PaLM 모델에서 GSM8K(수학 문제) 정확도가 17.9%에서 **58.1%**로 향상
CoT는 모델 크기가 충분히 클 때(100B+) 효과가 나타나는 창발적 능력(emergent ability)
8개의 CoT 예시만으로도 GPT-3(파인튜닝) + 검증기 성능을 초과

def chain_of_thought_math(problem: str) -> dict:
    """Chain-of-Thought 수학 문제 풀이"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a math tutor. Solve problems step by step.\n"
                    "Show your reasoning clearly before giving the final answer.\n"
                    "Format: Step 1, Step 2, ..., Final Answer: [number]"
                )
            },
            {
                "role": "user",
                "content": (
                    "A store has 45 apples. They sell 1/3 of them in the morning "
                    "and 2/5 of the remaining in the afternoon. "
                    "How many apples are left?"
                )
            },
            {
                "role": "assistant",
                "content": (
                    "Step 1: Calculate apples sold in the morning.\n"
                    "Morning sales = 45 * (1/3) = 15 apples\n\n"
                    "Step 2: Calculate remaining apples after morning.\n"
                    "Remaining = 45 - 15 = 30 apples\n\n"
                    "Step 3: Calculate apples sold in the afternoon.\n"
                    "Afternoon sales = 30 * (2/5) = 12 apples\n\n"
                    "Step 4: Calculate final remaining apples.\n"
                    "Final remaining = 30 - 12 = 18 apples\n\n"
                    "Final Answer: 18"
                )
            },
            {"role": "user", "content": problem}
        ],
        temperature=0,
    )

    answer_text = response.choices[0].message.content
    # 최종 답변 추출
    import re
    match = re.search(r"Final Answer:\s*(\d+)", answer_text)
    final_answer = int(match.group(1)) if match else None

    return {
        "reasoning": answer_text,
        "answer": final_answer,
        "tokens_used": response.usage.total_tokens,
    }

Zero-shot CoT

Kojima 등(2022)은 단순히 "Let's think step by step" 이라는 한 문장을 추가하는 것만으로도 CoT 효과를 얻을 수 있음을 발견했다. 이는 별도의 예시 작성이 필요 없어 실용적으로 매우 유용하다.

def zero_shot_cot(problem: str) -> str:
    """Zero-shot Chain-of-Thought"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": f"{problem}\n\nLet's think step by step."
            }
        ],
        temperature=0,
    )
    return response.choices[0].message.content

Self-Consistency 디코딩

Wang 등(2022)의 Self-Consistency는 CoT의 단일 그리디 디코딩 대신 여러 추론 경로를 샘플링하고 다수결로 최종 답변을 결정한다. GSM8K에서 CoT 대비 +17.9% 정확도 향상을 달성했다.

import collections
import re

def self_consistency(problem: str, num_samples: int = 5) -> dict:
    """Self-Consistency 디코딩"""
    answers = []
    reasoning_paths = []

    for i in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Solve the math problem step by step. "
                        "End with 'Final Answer: [number]'"
                    )
                },
                {"role": "user", "content": problem}
            ],
            temperature=0.7,  # 다양한 추론 경로를 위해 temperature 상향
            max_tokens=500,
        )

        text = response.choices[0].message.content
        reasoning_paths.append(text)

        # 답변 추출
        match = re.search(r"Final Answer:\s*(\d+)", text)
        if match:
            answers.append(int(match.group(1)))

    # 다수결 투표
    if answers:
        counter = collections.Counter(answers)
        majority_answer = counter.most_common(1)[0][0]
        confidence = counter.most_common(1)[0][1] / len(answers)
    else:
        majority_answer = None
        confidence = 0.0

    return {
        "answer": majority_answer,
        "confidence": confidence,
        "all_answers": answers,
        "num_samples": num_samples,
        "answer_distribution": dict(counter) if answers else {},
    }

Tree-of-Thought (ToT) 프레임워크

핵심 아이디어

Yao 등(2023)의 **Tree-of-Thought(ToT)**는 CoT를 트리 구조로 확장하여 여러 추론 경로를 동시에 탐색한다. 핵심 발견은 다음과 같다.

Game of 24 과제: GPT-4 + CoT가 4% 성공률 -> ToT로 74% 달성
BFS/DFS 탐색 전략으로 추론 경로를 체계적으로 탐색
각 경로를 LLM 자체가 평가하여 유망한 경로만 확장

from dataclasses import dataclass
from typing import Optional

@dataclass
class ThoughtNode:
    """ToT의 사고 노드"""
    content: str
    score: float = 0.0
    children: list = None
    parent: Optional['ThoughtNode'] = None
    depth: int = 0

    def __post_init__(self):
        if self.children is None:
            self.children = []

class TreeOfThought:
    """Tree-of-Thought 프레임워크"""
    def __init__(self, model="gpt-4o", max_depth=3, branching_factor=3):
        self.client = OpenAI()
        self.model = model
        self.max_depth = max_depth
        self.branching_factor = branching_factor

    def generate_thoughts(self, problem: str, current_thought: str) -> list:
        """현재 상태에서 가능한 다음 사고 생성"""
        prompt = (
            f"Problem: {problem}\n\n"
            f"Current reasoning so far:\n{current_thought}\n\n"
            f"Generate {self.branching_factor} different possible next steps "
            f"for solving this problem. "
            f"Format each as 'Step N: [reasoning]' separated by '---'"
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,
        )
        text = response.choices[0].message.content
        thoughts = [t.strip() for t in text.split("---") if t.strip()]
        return thoughts[:self.branching_factor]

    def evaluate_thought(self, problem: str, thought_path: str) -> float:
        """사고 경로의 유망성을 0-1 사이로 평가"""
        prompt = (
            f"Problem: {problem}\n\n"
            f"Reasoning path:\n{thought_path}\n\n"
            f"Evaluate this reasoning path on a scale of 0.0 to 1.0:\n"
            f"- 1.0: Correct and complete solution\n"
            f"- 0.7-0.9: On the right track, promising\n"
            f"- 0.4-0.6: Partially correct but uncertain\n"
            f"- 0.0-0.3: Wrong approach or contains errors\n\n"
            f"Respond with only the score (e.g., 0.8)"
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=10,
        )
        try:
            score = float(response.choices[0].message.content.strip())
            return min(max(score, 0.0), 1.0)
        except ValueError:
            return 0.5

    def solve_bfs(self, problem: str) -> dict:
        """BFS 기반 ToT 탐색"""
        root = ThoughtNode(content="", depth=0)
        current_level = [root]
        best_solution = None
        best_score = 0.0

        for depth in range(self.max_depth):
            next_level = []
            for node in current_level:
                # 자식 사고 생성
                thought_path = self._get_path(node)
                children_thoughts = self.generate_thoughts(problem, thought_path)

                for thought in children_thoughts:
                    full_path = f"{thought_path}\n{thought}" if thought_path else thought
                    score = self.evaluate_thought(problem, full_path)

                    child = ThoughtNode(
                        content=thought,
                        score=score,
                        parent=node,
                        depth=depth + 1
                    )
                    node.children.append(child)
                    next_level.append(child)

                    if score > best_score:
                        best_score = score
                        best_solution = full_path

            # 상위 branching_factor개만 유지 (빔 서치)
            next_level.sort(key=lambda n: n.score, reverse=True)
            current_level = next_level[:self.branching_factor]

        return {
            "solution": best_solution,
            "score": best_score,
            "depth_explored": self.max_depth,
        }

    def _get_path(self, node: ThoughtNode) -> str:
        """노드까지의 전체 사고 경로 반환"""
        path = []
        current = node
        while current and current.content:
            path.append(current.content)
            current = current.parent
        return "\n".join(reversed(path))

ReAct: 추론과 행동의 결합

핵심 원리

Yao 등(2022)의 ReAct는 LLM이 추론(Reasoning)과 행동(Acting)을 교차로 수행하며 외부 도구를 활용하는 프레임워크이다. Thought-Action-Observation 루프를 통해 환각(hallucination)을 줄이고 검증 가능한 결과를 생성한다.

구성 요소	역할	예시
Thought	현재 상태 분석과 다음 행동 계획	"사용자가 2024년 매출을 물어봤으니 DB를 조회해야겠다"
Action	외부 도구 호출	search("2024 revenue report"), calculate("150 * 1.1")
Observation	도구 실행 결과 관찰	"2024년 매출은 150억원으로 확인됨"

import json
from typing import Callable

class ReActAgent:
    """ReAct 패턴 기반 에이전트"""

    def __init__(self, model="gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.tools = {}
        self.max_iterations = 10

    def register_tool(self, name: str, func: Callable, description: str):
        """외부 도구 등록"""
        self.tools[name] = {
            "function": func,
            "description": description,
        }

    def _build_system_prompt(self) -> str:
        """시스템 프롬프트 구성"""
        tool_descriptions = "\n".join([
            f"- {name}: {info['description']}"
            for name, info in self.tools.items()
        ])

        return (
            "You are a helpful assistant that solves problems step by step.\n"
            "You have access to the following tools:\n"
            f"{tool_descriptions}\n\n"
            "For each step, respond in the following format:\n"
            "Thought: [your reasoning about what to do next]\n"
            "Action: [tool_name(argument)]\n\n"
            "After receiving an observation, continue with another Thought.\n"
            "When you have the final answer, respond with:\n"
            "Thought: [final reasoning]\n"
            "Final Answer: [your answer]\n\n"
            "IMPORTANT: Use exactly one Action per step. "
            "Wait for the Observation before proceeding."
        )

    def run(self, query: str) -> dict:
        """ReAct 루프 실행"""
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": query},
        ]

        steps = []

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0,
                max_tokens=500,
            )
            assistant_msg = response.choices[0].message.content

            # Final Answer 체크
            if "Final Answer:" in assistant_msg:
                final_answer = assistant_msg.split("Final Answer:")[-1].strip()
                steps.append({
                    "type": "final",
                    "content": assistant_msg,
                })
                return {
                    "answer": final_answer,
                    "steps": steps,
                    "iterations": iteration + 1,
                }

            # Action 파싱 및 실행
            import re
            action_match = re.search(r"Action:\s*(\w+)\((.+?)\)", assistant_msg)
            if action_match:
                tool_name = action_match.group(1)
                tool_arg = action_match.group(2).strip("'\"")

                steps.append({
                    "type": "thought_action",
                    "content": assistant_msg,
                    "tool": tool_name,
                    "argument": tool_arg,
                })

                # 도구 실행
                if tool_name in self.tools:
                    try:
                        observation = self.tools[tool_name]["function"](tool_arg)
                    except Exception as e:
                        observation = f"Error: {str(e)}"
                else:
                    observation = f"Error: Tool '{tool_name}' not found"

                steps.append({
                    "type": "observation",
                    "content": str(observation),
                })

                # 메시지 이력에 추가
                messages.append({"role": "assistant", "content": assistant_msg})
                messages.append({
                    "role": "user",
                    "content": f"Observation: {observation}"
                })
            else:
                # Action이 없으면 응답을 이력에 추가하고 계속
                messages.append({"role": "assistant", "content": assistant_msg})
                messages.append({
                    "role": "user",
                    "content": "Please continue with an Action or provide the Final Answer."
                })

        return {
            "answer": "Max iterations reached",
            "steps": steps,
            "iterations": self.max_iterations,
        }

# 사용 예시
def create_research_agent():
    """리서치 에이전트 생성"""
    agent = ReActAgent()

    # 도구 등록
    def search(query):
        # 실제로는 검색 API 호출
        return f"Search results for '{query}': [simulated results]"

    def calculate(expression):
        return str(eval(expression))

    def get_current_date():
        from datetime import datetime
        return datetime.now().strftime("%Y-%m-%d")

    agent.register_tool("search", search, "Search the web for information")
    agent.register_tool("calculate", calculate, "Evaluate a math expression")
    agent.register_tool("get_date", lambda _: get_current_date(), "Get current date")

    return agent

구조화된 출력 프롬프팅

프로덕션 환경에서는 LLM의 출력을 프로그래밍적으로 처리할 수 있는 구조화된 형식(JSON, XML 등)으로 받아야 한다.

from pydantic import BaseModel, Field
from typing import Literal

# Pydantic 모델을 활용한 구조화된 출력
class SentimentResult(BaseModel):
    """감성 분석 결과 스키마"""
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: list[str]
    reasoning: str

def structured_sentiment_analysis(text: str) -> SentimentResult:
    """구조화된 출력으로 감성 분석 수행"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Analyze the sentiment of the given text. "
                    "Respond in JSON format with the following fields:\n"
                    "- sentiment: 'positive', 'negative', or 'neutral'\n"
                    "- confidence: float between 0.0 and 1.0\n"
                    "- key_phrases: list of key phrases that influenced the sentiment\n"
                    "- reasoning: brief explanation of the analysis"
                )
            },
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )

    result = json.loads(response.choices[0].message.content)
    return SentimentResult(**result)

# 함수 호출(Function Calling) 기반 구조화
def function_calling_extraction(text: str) -> dict:
    """Function Calling을 활용한 정보 추출"""
    tools = [
        {
            "type": "function",
            "function": {
                "name": "extract_meeting_info",
                "description": "Extract meeting information from text",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "date": {
                            "type": "string",
                            "description": "Meeting date in YYYY-MM-DD format"
                        },
                        "time": {
                            "type": "string",
                            "description": "Meeting time in HH:MM format"
                        },
                        "participants": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of participants"
                        },
                        "agenda": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "Meeting agenda items"
                        },
                        "location": {
                            "type": "string",
                            "description": "Meeting location or meeting link"
                        }
                    },
                    "required": ["date", "time", "participants"]
                }
            }
        }
    ]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract meeting info: {text}"}],
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "extract_meeting_info"}},
    )

    tool_call = response.choices[0].message.tool_calls[0]
    return json.loads(tool_call.function.arguments)

프롬프트 체이닝

복잡한 작업을 여러 단계의 프롬프트로 분해하여 순차적으로 실행하는 기법이다. 각 단계의 출력이 다음 단계의 입력이 된다.

class PromptChain:
    """프롬프트 체이닝 프레임워크"""

    def __init__(self, model="gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.steps = []
        self.results = {}

    def add_step(self, name: str, prompt_template: str, depends_on: list = None):
        """체인에 단계 추가"""
        self.steps.append({
            "name": name,
            "prompt_template": prompt_template,
            "depends_on": depends_on or [],
        })

    def run(self, initial_input: str) -> dict:
        """체인 전체 실행"""
        self.results["input"] = initial_input

        for step in self.steps:
            # 의존 단계 결과로 프롬프트 구성
            prompt = step["prompt_template"]
            prompt = prompt.replace("INPUT", self.results.get("input", ""))
            for dep in step["depends_on"]:
                prompt = prompt.replace(
                    f"RESULT_{dep.upper()}",
                    self.results.get(dep, "")
                )

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )

            self.results[step["name"]] = response.choices[0].message.content

        return self.results

# 사용 예시: 기술 문서 요약 + 번역 + 키워드 추출
def create_document_pipeline():
    """문서 처리 파이프라인"""
    chain = PromptChain()

    chain.add_step(
        name="summary",
        prompt_template=(
            "Summarize the following technical document in 3-5 bullet points:\n\n"
            "INPUT"
        )
    )

    chain.add_step(
        name="translation",
        prompt_template=(
            "Translate the following summary to Korean:\n\n"
            "RESULT_SUMMARY"
        ),
        depends_on=["summary"]
    )

    chain.add_step(
        name="keywords",
        prompt_template=(
            "Extract 5-10 technical keywords from the following summary. "
            "Format as a comma-separated list:\n\n"
            "RESULT_SUMMARY"
        ),
        depends_on=["summary"]
    )

    return chain

프롬프팅 기법 성능 비교

벤치마크 결과

기법	GSM8K (수학)	HotpotQA (QA)	Game of 24	토큰 비용
Zero-shot	17.9%	28.7%	-	1x
Few-shot	33.0%	35.2%	-	1.5x
Zero-shot CoT	40.7%	33.8%	-	1.5x
Few-shot CoT	58.1%	42.1%	4%	2x
Self-Consistency (k=40)	76.0%	47.3%	-	40x
Tree-of-Thought	-	-	74%	10-50x
ReAct	-	40.2%	-	3-5x

기법 선택 가이드

# 프롬프팅 기법 선택 의사결정 트리
decision_tree:
  simple_classification:
    recommended: 'Zero-shot 또는 Few-shot'
    reason: '단순 분류는 고급 기법 불필요'

  math_reasoning:
    recommended: 'CoT + Self-Consistency'
    reason: '수학 추론에서 가장 안정적인 성능'

  multi_step_search:
    recommended: 'ReAct'
    reason: '외부 정보 필요 시 도구 활용 가능'

  creative_problem_solving:
    recommended: 'Tree-of-Thought'
    reason: '탐색 공간이 넓은 창의적 문제에 적합'

  production_api:
    recommended: 'Few-shot + Structured Output'
    reason: '일관성과 파싱 가능성이 중요'

일반적인 안티패턴

안티패턴 1: 과도한 지시문

# BAD: 너무 많은 지시문은 모델을 혼란시킴
bad_prompt = """
You are an expert data scientist with 20 years of experience.
You must always be accurate and never hallucinate.
You should think carefully before answering.
Make sure your answer is complete and comprehensive.
Consider all edge cases and potential issues.
Be concise but thorough.
Use technical language but also be accessible.
Format your response nicely.
Include examples when appropriate.
Double-check your work before responding.

Question: What is the capital of France?
"""

# GOOD: 간결하고 구체적인 지시
good_prompt = """
Answer the following geography question with just the city name.
Question: What is the capital of France?
"""

안티패턴 2: 모호한 출력 형식

# BAD: 출력 형식이 불명확
bad_format = "Analyze this data and give me insights."

# GOOD: 명확한 출력 형식 지정
good_format = """
Analyze the following sales data and provide:
1. Top 3 insights (one sentence each)
2. Trend direction: "increasing", "decreasing", or "stable"
3. Recommended actions (bulleted list, max 3 items)

Respond in JSON format with keys: insights, trend, actions.
"""

안티패턴 3: 컨텍스트 윈도우 낭비

# BAD: 불필요한 반복 컨텍스트
def bad_batch_processing(items):
    """각 요청마다 동일한 긴 시스템 프롬프트 반복"""
    results = []
    for item in items:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": VERY_LONG_SYSTEM_PROMPT},
                {"role": "user", "content": item},
            ]
        )
        results.append(response.choices[0].message.content)
    return results

# GOOD: 배치 처리로 효율화
def good_batch_processing(items):
    """여러 항목을 한 번에 처리"""
    combined = "\n---\n".join([f"Item {i+1}: {item}" for i, item in enumerate(items)])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Process each item below and return results "
                    "in JSON array format."
                )
            },
            {"role": "user", "content": combined},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

프로덕션 최적화

프롬프트 버전 관리

import hashlib
from datetime import datetime

class PromptRegistry:
    """프롬프트 버전 관리 시스템"""

    def __init__(self):
        self.prompts = {}
        self.history = []

    def register(self, name: str, template: str, version: str = None) -> str:
        """프롬프트 등록 및 버전 관리"""
        content_hash = hashlib.md5(template.encode()).hexdigest()[:8]
        version = version or f"v{len(self.history) + 1}_{content_hash}"

        entry = {
            "name": name,
            "version": version,
            "template": template,
            "hash": content_hash,
            "created_at": datetime.now().isoformat(),
        }

        self.prompts[name] = entry
        self.history.append(entry)
        return version

    def get(self, name: str) -> str:
        """현재 활성 프롬프트 반환"""
        if name not in self.prompts:
            raise KeyError(f"Prompt '{name}' not registered")
        return self.prompts[name]["template"]

    def get_version(self, name: str) -> str:
        """현재 프롬프트 버전 반환"""
        return self.prompts[name]["version"]

비용 최적화 전략

class CostOptimizer:
    """LLM API 비용 최적화"""

    # 모델별 가격 (1M 토큰당, 2026년 3월 기준 근사값)
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
    }

    @staticmethod
    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """비용 추정"""
        pricing = CostOptimizer.PRICING.get(model, {})
        input_cost = (input_tokens / 1_000_000) * pricing.get("input", 0)
        output_cost = (output_tokens / 1_000_000) * pricing.get("output", 0)
        return input_cost + output_cost

    @staticmethod
    def select_model(task_complexity: str) -> str:
        """작업 복잡도에 따른 모델 선택"""
        model_map = {
            "simple": "gpt-4o-mini",       # 분류, 추출 등 단순 작업
            "moderate": "gpt-4o-mini",      # CoT 가 필요한 보통 작업
            "complex": "gpt-4o",            # 복잡한 추론, 코드 생성
            "critical": "gpt-4o",           # 정확도가 최우선인 작업
        }
        return model_map.get(task_complexity, "gpt-4o-mini")

캐싱 전략

import hashlib
import json
from functools import lru_cache

class PromptCache:
    """프롬프트 응답 캐싱"""

    def __init__(self, cache_backend="memory"):
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def _make_key(self, model: str, messages: list, temperature: float) -> str:
        """캐시 키 생성"""
        content = json.dumps({
            "model": model,
            "messages": messages,
            "temperature": temperature,
        }, sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, model: str, messages: list, temperature: float):
        """캐시에서 응답 조회"""
        if temperature > 0:
            # 비결정적 응답은 캐싱하지 않음
            return None

        key = self._make_key(model, messages, temperature)
        result = self.cache.get(key)
        if result:
            self.hits += 1
        else:
            self.misses += 1
        return result

    def set(self, model: str, messages: list, temperature: float, response: str):
        """캐시에 응답 저장"""
        if temperature > 0:
            return
        key = self._make_key(model, messages, temperature)
        self.cache[key] = response

    def stats(self) -> dict:
        """캐시 통계"""
        total = self.hits + self.misses
        return {
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": self.hits / total if total > 0 else 0,
            "cache_size": len(self.cache),
        }

운영 시 주의사항

프롬프트 인젝션 방어

프로덕션 환경에서 가장 중요한 보안 이슈는 프롬프트 인젝션이다. 사용자 입력이 시스템 프롬프트를 우회하여 의도하지 않은 동작을 유도할 수 있다.

def sanitize_user_input(user_input: str) -> str:
    """사용자 입력 정화"""
    # 1. 시스템 프롬프트 우회 시도 탐지
    injection_patterns = [
        "ignore previous instructions",
        "ignore all instructions",
        "disregard the above",
        "forget your instructions",
        "you are now",
        "new instruction:",
        "system prompt:",
    ]

    lower_input = user_input.lower()
    for pattern in injection_patterns:
        if pattern in lower_input:
            return "[BLOCKED: Potential prompt injection detected]"

    # 2. 입력 길이 제한
    max_length = 4000
    if len(user_input) > max_length:
        user_input = user_input[:max_length] + "... [truncated]"

    return user_input

장애 사례와 복구

# 일반적인 장애 시나리오
failure_scenarios:
  rate_limiting:
    symptom: '429 Too Many Requests'
    cause: 'API 호출 한도 초과'
    recovery:
      - '지수 백오프(exponential backoff) 적용'
      - '요청 큐 구현으로 트래픽 평활화'
      - '복수 API 키 로테이션'

  hallucination:
    symptom: '모델이 존재하지 않는 정보 생성'
    cause: '충분하지 않은 컨텍스트 또는 과도한 temperature'
    recovery:
      - 'temperature를 0으로 낮춤'
      - 'RAG 파이프라인으로 근거 자료 제공'
      - '출력 검증 레이어 추가'

  format_failure:
    symptom: 'JSON 파싱 실패'
    cause: '모델이 요청된 형식을 따르지 않음'
    recovery:
      - 'response_format 파라미터 사용'
      - 'Few-shot 예시로 형식 강제'
      - '실패 시 재시도 + 더 명확한 지시 추가'

  context_overflow:
    symptom: '컨텍스트 윈도우 초과 에러'
    cause: '입력 토큰이 모델 한도 초과'
    recovery:
      - '입력 텍스트 요약 또는 청킹'
      - '불필요한 Few-shot 예시 제거'
      - '더 긴 컨텍스트 모델로 전환'

평가 파이프라인

class PromptEvaluator:
    """프롬프트 A/B 테스트 평가기"""

    def __init__(self):
        self.results = []

    def evaluate(self, test_cases: list, prompt_a: str, prompt_b: str) -> dict:
        """두 프롬프트 비교 평가"""
        scores_a = []
        scores_b = []

        for case in test_cases:
            # 프롬프트 A 실행
            result_a = self._run_prompt(prompt_a, case["input"])
            score_a = self._score(result_a, case["expected"])
            scores_a.append(score_a)

            # 프롬프트 B 실행
            result_b = self._run_prompt(prompt_b, case["input"])
            score_b = self._score(result_b, case["expected"])
            scores_b.append(score_b)

        import numpy as np
        return {
            "prompt_a_avg": np.mean(scores_a),
            "prompt_b_avg": np.mean(scores_b),
            "prompt_a_std": np.std(scores_a),
            "prompt_b_std": np.std(scores_b),
            "winner": "A" if np.mean(scores_a) > np.mean(scores_b) else "B",
            "improvement": abs(np.mean(scores_a) - np.mean(scores_b)),
            "num_cases": len(test_cases),
        }

    def _run_prompt(self, prompt: str, input_text: str) -> str:
        """프롬프트 실행"""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": input_text},
            ],
            temperature=0,
        )
        return response.choices[0].message.content

    def _score(self, result: str, expected: str) -> float:
        """결과 평가 (0-1)"""
        # 간단한 문자열 유사도 기반 점수
        result_lower = result.lower().strip()
        expected_lower = expected.lower().strip()

        if result_lower == expected_lower:
            return 1.0
        elif expected_lower in result_lower:
            return 0.8
        else:
            return 0.0

마치며

프롬프트 엔지니어링은 단순한 텍스트 작성을 넘어 LLM의 추론 메커니즘을 이해하고 활용하는 엔지니어링 분야로 발전했다. Chain-of-Thought가 "추론 단계를 보여달라"는 간단한 아이디어에서 시작하여, Self-Consistency의 앙상블 전략, Tree-of-Thought의 체계적 탐색, ReAct의 도구 활용 패턴으로 확장되었다.

프로덕션 환경에서는 기법의 성능뿐 아니라 비용, 지연 시간, 일관성, 보안(프롬프트 인젝션 방어)을 종합적으로 고려해야 한다. 가장 중요한 것은 작업 특성에 맞는 기법을 선택하고, 체계적인 평가 파이프라인으로 지속적으로 개선하는 것이다.

앞으로 LLM의 기본 추론 능력이 향상됨에 따라 개별 프롬프팅 기법의 상대적 이점은 변할 수 있지만, "모델이 어떻게 추론하는지 이해하고 이를 가이드하는" 프롬프트 엔지니어링의 근본 원리는 변하지 않을 것이다.

참고자료

Advanced LLM Prompt Engineering: Chain-of-Thought, Tree-of-Thought, ReAct, and Few-Shot Pattern Practical Guide

Introduction
Prompting Technique Taxonomy
Zero-shot and Few-shot Prompting
- Zero-shot Prompting
- Few-shot Prompting
Chain-of-Thought (CoT) Prompting
- Core Principle
- Zero-shot CoT
Self-Consistency Decoding
Tree-of-Thought (ToT) Framework
- Core Idea
ReAct: Synergizing Reasoning and Acting
- Core Principle
Structured Output Prompting
Prompt Chaining
Prompting Technique Performance Comparison
- Benchmark Results
- Technique Selection Guide
Common Anti-patterns
Production Optimization
Operational Considerations
Conclusion
References

Introduction

Prompt engineering is a core technology for maximizing the latent capabilities of LLMs. The Chain-of-Thought paper published by Wei et al. in 2022 proved that "including reasoning processes in prompts dramatically improves the model's reasoning ability," establishing prompt engineering as an independent research field.

Subsequently, advanced techniques such as Self-Consistency, Tree-of-Thought, and ReAct emerged in succession, expanding the scope of LLM applications far beyond simple question-answer patterns to complex reasoning, planning, and external tool utilization. In particular, the ReAct pattern has become the core architecture of most AI agent frameworks (LangChain, AutoGen, etc.).

This article systematically covers the theoretical background, key paper findings, Python implementation code, performance comparisons, anti-patterns, and production optimization strategies for each prompting technique.

Prompting Technique Taxonomy

Prompting techniques can be classified as follows:

Category	Technique	Core Idea	Paper
Basic	Zero-shot	Perform with instructions only, no examples	-
Basic	Few-shot	Provide a few examples	Brown et al. 2020
Reasoning Enhancement	Chain-of-Thought	Generate intermediate reasoning steps	Wei et al. 2022
Reasoning Enhancement	Zero-shot CoT	Add a single phrase: "Let's think step by step"	Kojima et al. 2022
Ensemble	Self-Consistency	Multi-path sampling + majority voting	Wang et al. 2022
Search	Tree-of-Thought	Tree-structured reasoning path exploration	Yao et al. 2023
Agent	ReAct	Reasoning + Acting + Observation loop	Yao et al. 2022
Structured	Structured Output	Enforce JSON/XML format output	-
Composition	Prompt Chaining	Task decomposition + sequential execution	-

Zero-shot and Few-shot Prompting

Zero-shot Prompting

The most basic approach where the model performs a task using only instructions without examples. With recent performance improvements in large models (GPT-4, Claude 3.5, etc.), many tasks can achieve sufficient performance with Zero-shot alone.

from openai import OpenAI

client = OpenAI()

def zero_shot_classification(text: str) -> str:
    """Zero-shot text classification"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a text classifier. "
                    "Classify the given text into one of the following categories: "
                    "Technology, Business, Science, Sports, Entertainment. "
                    "Respond with only the category name."
                )
            },
            {"role": "user", "content": text}
        ],
        temperature=0,
        max_tokens=20,
    )
    return response.choices[0].message.content.strip()

Few-shot Prompting

Few-shot prompting includes a small number of input-output examples in the prompt to help the model learn patterns. It was systematically presented in the GPT-3 paper by Brown et al. (2020) and is particularly effective for tasks requiring consistent output formats.

def few_shot_entity_extraction(text: str) -> str:
    """Few-shot named entity extraction"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Extract named entities from the given text in the specified format."
            },
            {
                "role": "user",
                "content": "Samsung Electronics announced the Galaxy S25 series at CES 2025 in Las Vegas."
            },
            {
                "role": "assistant",
                "content": (
                    "- Organization: Samsung Electronics\n"
                    "- Product: Galaxy S25\n"
                    "- Event: CES 2025\n"
                    "- Location: Las Vegas"
                )
            },
            {
                "role": "user",
                "content": "Elon Musk revealed that Tesla will open a new Gigafactory in Austin, Texas in March 2026."
            },
            {
                "role": "assistant",
                "content": (
                    "- Person: Elon Musk\n"
                    "- Organization: Tesla\n"
                    "- Facility: Gigafactory\n"
                    "- Location: Austin, Texas\n"
                    "- Date: March 2026"
                )
            },
            {"role": "user", "content": text}
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Few-shot example selection strategy
class FewShotSelector:
    """Dynamic few-shot example selector"""
    def __init__(self, examples, embedding_model="text-embedding-3-small"):
        self.examples = examples
        self.client = OpenAI()
        self.embedding_model = embedding_model
        self._precompute_embeddings()

    def _precompute_embeddings(self):
        """Precompute embeddings for all examples"""
        texts = [ex["input"] for ex in self.examples]
        response = self.client.embeddings.create(
            model=self.embedding_model,
            input=texts
        )
        self.embeddings = [r.embedding for r in response.data]

    def select(self, query: str, k: int = 3) -> list:
        """Select k most similar examples to the query"""
        query_emb = self.client.embeddings.create(
            model=self.embedding_model,
            input=[query]
        ).data[0].embedding

        # Compute cosine similarity
        import numpy as np
        similarities = []
        for emb in self.embeddings:
            sim = np.dot(query_emb, emb) / (
                np.linalg.norm(query_emb) * np.linalg.norm(emb)
            )
            similarities.append(sim)

        # Select top k
        top_indices = np.argsort(similarities)[-k:][::-1]
        return [self.examples[i] for i in top_indices]

Chain-of-Thought (CoT) Prompting

Core Principle

The Chain-of-Thought paper by Wei et al. (2022) demonstrated that including intermediate reasoning steps in prompts dramatically improves the complex reasoning ability of LLMs. Key findings include:

540B parameter PaLM model improved GSM8K (math problems) accuracy from 17.9% to 58.1%
CoT is an emergent ability that manifests only when the model is sufficiently large (100B+)
Just 8 CoT examples surpassed GPT-3 (fine-tuned) + verifier performance

def chain_of_thought_math(problem: str) -> dict:
    """Chain-of-Thought math problem solving"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a math tutor. Solve problems step by step.\n"
                    "Show your reasoning clearly before giving the final answer.\n"
                    "Format: Step 1, Step 2, ..., Final Answer: [number]"
                )
            },
            {
                "role": "user",
                "content": (
                    "A store has 45 apples. They sell 1/3 of them in the morning "
                    "and 2/5 of the remaining in the afternoon. "
                    "How many apples are left?"
                )
            },
            {
                "role": "assistant",
                "content": (
                    "Step 1: Calculate apples sold in the morning.\n"
                    "Morning sales = 45 * (1/3) = 15 apples\n\n"
                    "Step 2: Calculate remaining apples after morning.\n"
                    "Remaining = 45 - 15 = 30 apples\n\n"
                    "Step 3: Calculate apples sold in the afternoon.\n"
                    "Afternoon sales = 30 * (2/5) = 12 apples\n\n"
                    "Step 4: Calculate final remaining apples.\n"
                    "Final remaining = 30 - 12 = 18 apples\n\n"
                    "Final Answer: 18"
                )
            },
            {"role": "user", "content": problem}
        ],
        temperature=0,
    )

    answer_text = response.choices[0].message.content
    # Extract final answer
    import re
    match = re.search(r"Final Answer:\s*(\d+)", answer_text)
    final_answer = int(match.group(1)) if match else None

    return {
        "reasoning": answer_text,
        "answer": final_answer,
        "tokens_used": response.usage.total_tokens,
    }

Zero-shot CoT

Kojima et al. (2022) discovered that simply adding the phrase "Let's think step by step" achieves CoT effects without requiring separate examples. This is extremely practical as it eliminates the need for crafting examples.

def zero_shot_cot(problem: str) -> str:
    """Zero-shot Chain-of-Thought"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": f"{problem}\n\nLet's think step by step."
            }
        ],
        temperature=0,
    )
    return response.choices[0].message.content

Self-Consistency Decoding

Self-Consistency by Wang et al. (2022) replaces CoT's single greedy decoding with sampling multiple reasoning paths and determining the final answer through majority voting. It achieved +17.9% accuracy improvement over CoT on GSM8K.

import collections
import re

def self_consistency(problem: str, num_samples: int = 5) -> dict:
    """Self-Consistency decoding"""
    answers = []
    reasoning_paths = []

    for i in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Solve the math problem step by step. "
                        "End with 'Final Answer: [number]'"
                    )
                },
                {"role": "user", "content": problem}
            ],
            temperature=0.7,  # Higher temperature for diverse reasoning paths
            max_tokens=500,
        )

        text = response.choices[0].message.content
        reasoning_paths.append(text)

        # Extract answer
        match = re.search(r"Final Answer:\s*(\d+)", text)
        if match:
            answers.append(int(match.group(1)))

    # Majority voting
    if answers:
        counter = collections.Counter(answers)
        majority_answer = counter.most_common(1)[0][0]
        confidence = counter.most_common(1)[0][1] / len(answers)
    else:
        majority_answer = None
        confidence = 0.0

    return {
        "answer": majority_answer,
        "confidence": confidence,
        "all_answers": answers,
        "num_samples": num_samples,
        "answer_distribution": dict(counter) if answers else {},
    }

Tree-of-Thought (ToT) Framework

Core Idea

Tree-of-Thought (ToT) by Yao et al. (2023) extends CoT into a tree structure that simultaneously explores multiple reasoning paths. Key findings include:

Game of 24 task: GPT-4 + CoT achieved 4% success rate -> ToT achieved 74%
Systematic exploration of reasoning paths using BFS/DFS strategies
The LLM itself evaluates each path, expanding only promising ones

from dataclasses import dataclass
from typing import Optional

@dataclass
class ThoughtNode:
    """ToT thought node"""
    content: str
    score: float = 0.0
    children: list = None
    parent: Optional['ThoughtNode'] = None
    depth: int = 0

    def __post_init__(self):
        if self.children is None:
            self.children = []

class TreeOfThought:
    """Tree-of-Thought Framework"""
    def __init__(self, model="gpt-4o", max_depth=3, branching_factor=3):
        self.client = OpenAI()
        self.model = model
        self.max_depth = max_depth
        self.branching_factor = branching_factor

    def generate_thoughts(self, problem: str, current_thought: str) -> list:
        """Generate possible next thoughts from current state"""
        prompt = (
            f"Problem: {problem}\n\n"
            f"Current reasoning so far:\n{current_thought}\n\n"
            f"Generate {self.branching_factor} different possible next steps "
            f"for solving this problem. "
            f"Format each as 'Step N: [reasoning]' separated by '---'"
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,
        )
        text = response.choices[0].message.content
        thoughts = [t.strip() for t in text.split("---") if t.strip()]
        return thoughts[:self.branching_factor]

    def evaluate_thought(self, problem: str, thought_path: str) -> float:
        """Evaluate the promise of a thought path on a 0-1 scale"""
        prompt = (
            f"Problem: {problem}\n\n"
            f"Reasoning path:\n{thought_path}\n\n"
            f"Evaluate this reasoning path on a scale of 0.0 to 1.0:\n"
            f"- 1.0: Correct and complete solution\n"
            f"- 0.7-0.9: On the right track, promising\n"
            f"- 0.4-0.6: Partially correct but uncertain\n"
            f"- 0.0-0.3: Wrong approach or contains errors\n\n"
            f"Respond with only the score (e.g., 0.8)"
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=10,
        )
        try:
            score = float(response.choices[0].message.content.strip())
            return min(max(score, 0.0), 1.0)
        except ValueError:
            return 0.5

    def solve_bfs(self, problem: str) -> dict:
        """BFS-based ToT search"""
        root = ThoughtNode(content="", depth=0)
        current_level = [root]
        best_solution = None
        best_score = 0.0

        for depth in range(self.max_depth):
            next_level = []
            for node in current_level:
                # Generate child thoughts
                thought_path = self._get_path(node)
                children_thoughts = self.generate_thoughts(problem, thought_path)

                for thought in children_thoughts:
                    full_path = f"{thought_path}\n{thought}" if thought_path else thought
                    score = self.evaluate_thought(problem, full_path)

                    child = ThoughtNode(
                        content=thought,
                        score=score,
                        parent=node,
                        depth=depth + 1
                    )
                    node.children.append(child)
                    next_level.append(child)

                    if score > best_score:
                        best_score = score
                        best_solution = full_path

            # Keep only top branching_factor nodes (beam search)
            next_level.sort(key=lambda n: n.score, reverse=True)
            current_level = next_level[:self.branching_factor]

        return {
            "solution": best_solution,
            "score": best_score,
            "depth_explored": self.max_depth,
        }

    def _get_path(self, node: ThoughtNode) -> str:
        """Return the full thought path up to the node"""
        path = []
        current = node
        while current and current.content:
            path.append(current.content)
            current = current.parent
        return "\n".join(reversed(path))

ReAct: Synergizing Reasoning and Acting

Core Principle

ReAct by Yao et al. (2022) is a framework where LLMs alternate between reasoning and acting to leverage external tools. Through the Thought-Action-Observation loop, it reduces hallucination and generates verifiable results.

Component	Role	Example
Thought	Analyze current state and plan next action	"The user asked for 2024 revenue, so I need to query the DB"
Action	Call external tool	search("2024 revenue report"), calculate("150 * 1.1")
Observation	Observe tool execution result	"2024 revenue confirmed at 15 billion"

import json
from typing import Callable

class ReActAgent:
    """ReAct pattern-based agent"""

    def __init__(self, model="gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.tools = {}
        self.max_iterations = 10

    def register_tool(self, name: str, func: Callable, description: str):
        """Register external tool"""
        self.tools[name] = {
            "function": func,
            "description": description,
        }

    def _build_system_prompt(self) -> str:
        """Build system prompt"""
        tool_descriptions = "\n".join([
            f"- {name}: {info['description']}"
            for name, info in self.tools.items()
        ])

        return (
            "You are a helpful assistant that solves problems step by step.\n"
            "You have access to the following tools:\n"
            f"{tool_descriptions}\n\n"
            "For each step, respond in the following format:\n"
            "Thought: [your reasoning about what to do next]\n"
            "Action: [tool_name(argument)]\n\n"
            "After receiving an observation, continue with another Thought.\n"
            "When you have the final answer, respond with:\n"
            "Thought: [final reasoning]\n"
            "Final Answer: [your answer]\n\n"
            "IMPORTANT: Use exactly one Action per step. "
            "Wait for the Observation before proceeding."
        )

    def run(self, query: str) -> dict:
        """Execute ReAct loop"""
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": query},
        ]

        steps = []

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0,
                max_tokens=500,
            )
            assistant_msg = response.choices[0].message.content

            # Check for Final Answer
            if "Final Answer:" in assistant_msg:
                final_answer = assistant_msg.split("Final Answer:")[-1].strip()
                steps.append({
                    "type": "final",
                    "content": assistant_msg,
                })
                return {
                    "answer": final_answer,
                    "steps": steps,
                    "iterations": iteration + 1,
                }

            # Parse and execute Action
            import re
            action_match = re.search(r"Action:\s*(\w+)\((.+?)\)", assistant_msg)
            if action_match:
                tool_name = action_match.group(1)
                tool_arg = action_match.group(2).strip("'\"")

                steps.append({
                    "type": "thought_action",
                    "content": assistant_msg,
                    "tool": tool_name,
                    "argument": tool_arg,
                })

                # Execute tool
                if tool_name in self.tools:
                    try:
                        observation = self.tools[tool_name]["function"](tool_arg)
                    except Exception as e:
                        observation = f"Error: {str(e)}"
                else:
                    observation = f"Error: Tool '{tool_name}' not found"

                steps.append({
                    "type": "observation",
                    "content": str(observation),
                })

                # Add to message history
                messages.append({"role": "assistant", "content": assistant_msg})
                messages.append({
                    "role": "user",
                    "content": f"Observation: {observation}"
                })
            else:
                # If no Action, add to history and continue
                messages.append({"role": "assistant", "content": assistant_msg})
                messages.append({
                    "role": "user",
                    "content": "Please continue with an Action or provide the Final Answer."
                })

        return {
            "answer": "Max iterations reached",
            "steps": steps,
            "iterations": self.max_iterations,
        }

# Usage example
def create_research_agent():
    """Create a research agent"""
    agent = ReActAgent()

    # Register tools
    def search(query):
        # In practice, this would call a search API
        return f"Search results for '{query}': [simulated results]"

    def calculate(expression):
        return str(eval(expression))

    def get_current_date():
        from datetime import datetime
        return datetime.now().strftime("%Y-%m-%d")

    agent.register_tool("search", search, "Search the web for information")
    agent.register_tool("calculate", calculate, "Evaluate a math expression")
    agent.register_tool("get_date", lambda _: get_current_date(), "Get current date")

    return agent

Structured Output Prompting

In production environments, LLM outputs must be received in structured formats (JSON, XML, etc.) that can be programmatically processed.

from pydantic import BaseModel, Field
from typing import Literal

# Structured output using Pydantic models
class SentimentResult(BaseModel):
    """Sentiment analysis result schema"""
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: list[str]
    reasoning: str

def structured_sentiment_analysis(text: str) -> SentimentResult:
    """Perform sentiment analysis with structured output"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Analyze the sentiment of the given text. "
                    "Respond in JSON format with the following fields:\n"
                    "- sentiment: 'positive', 'negative', or 'neutral'\n"
                    "- confidence: float between 0.0 and 1.0\n"
                    "- key_phrases: list of key phrases that influenced the sentiment\n"
                    "- reasoning: brief explanation of the analysis"
                )
            },
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )

    result = json.loads(response.choices[0].message.content)
    return SentimentResult(**result)

# Function Calling-based structuring
def function_calling_extraction(text: str) -> dict:
    """Information extraction using Function Calling"""
    tools = [
        {
            "type": "function",
            "function": {
                "name": "extract_meeting_info",
                "description": "Extract meeting information from text",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "date": {
                            "type": "string",
                            "description": "Meeting date in YYYY-MM-DD format"
                        },
                        "time": {
                            "type": "string",
                            "description": "Meeting time in HH:MM format"
                        },
                        "participants": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of participants"
                        },
                        "agenda": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "Meeting agenda items"
                        },
                        "location": {
                            "type": "string",
                            "description": "Meeting location or meeting link"
                        }
                    },
                    "required": ["date", "time", "participants"]
                }
            }
        }
    ]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract meeting info: {text}"}],
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "extract_meeting_info"}},
    )

    tool_call = response.choices[0].message.tool_calls[0]
    return json.loads(tool_call.function.arguments)

Prompt Chaining

A technique that decomposes complex tasks into multiple prompt stages and executes them sequentially. Each stage's output becomes the next stage's input.

class PromptChain:
    """Prompt Chaining Framework"""

    def __init__(self, model="gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.steps = []
        self.results = {}

    def add_step(self, name: str, prompt_template: str, depends_on: list = None):
        """Add a step to the chain"""
        self.steps.append({
            "name": name,
            "prompt_template": prompt_template,
            "depends_on": depends_on or [],
        })

    def run(self, initial_input: str) -> dict:
        """Execute the entire chain"""
        self.results["input"] = initial_input

        for step in self.steps:
            # Construct prompt with dependent step results
            prompt = step["prompt_template"]
            prompt = prompt.replace("INPUT", self.results.get("input", ""))
            for dep in step["depends_on"]:
                prompt = prompt.replace(
                    f"RESULT_{dep.upper()}",
                    self.results.get(dep, "")
                )

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )

            self.results[step["name"]] = response.choices[0].message.content

        return self.results

# Usage example: Technical document summarize + translate + keyword extraction
def create_document_pipeline():
    """Document processing pipeline"""
    chain = PromptChain()

    chain.add_step(
        name="summary",
        prompt_template=(
            "Summarize the following technical document in 3-5 bullet points:\n\n"
            "INPUT"
        )
    )

    chain.add_step(
        name="translation",
        prompt_template=(
            "Translate the following summary to Korean:\n\n"
            "RESULT_SUMMARY"
        ),
        depends_on=["summary"]
    )

    chain.add_step(
        name="keywords",
        prompt_template=(
            "Extract 5-10 technical keywords from the following summary. "
            "Format as a comma-separated list:\n\n"
            "RESULT_SUMMARY"
        ),
        depends_on=["summary"]
    )

    return chain

Prompting Technique Performance Comparison

Benchmark Results

Technique	GSM8K (Math)	HotpotQA (QA)	Game of 24	Token Cost
Zero-shot	17.9%	28.7%	-	1x
Few-shot	33.0%	35.2%	-	1.5x
Zero-shot CoT	40.7%	33.8%	-	1.5x
Few-shot CoT	58.1%	42.1%	4%	2x
Self-Consistency (k=40)	76.0%	47.3%	-	40x
Tree-of-Thought	-	-	74%	10-50x
ReAct	-	40.2%	-	3-5x

Technique Selection Guide

# Prompting technique selection decision tree
decision_tree:
  simple_classification:
    recommended: 'Zero-shot or Few-shot'
    reason: 'Simple classification does not require advanced techniques'

  math_reasoning:
    recommended: 'CoT + Self-Consistency'
    reason: 'Most stable performance for mathematical reasoning'

  multi_step_search:
    recommended: 'ReAct'
    reason: 'Tool utilization possible when external information is needed'

  creative_problem_solving:
    recommended: 'Tree-of-Thought'
    reason: 'Suitable for creative problems with large search spaces'

  production_api:
    recommended: 'Few-shot + Structured Output'
    reason: 'Consistency and parsability are paramount'

Common Anti-patterns

Anti-pattern 1: Excessive Instructions

# BAD: Too many instructions confuse the model
bad_prompt = """
You are an expert data scientist with 20 years of experience.
You must always be accurate and never hallucinate.
You should think carefully before answering.
Make sure your answer is complete and comprehensive.
Consider all edge cases and potential issues.
Be concise but thorough.
Use technical language but also be accessible.
Format your response nicely.
Include examples when appropriate.
Double-check your work before responding.

Question: What is the capital of France?
"""

# GOOD: Concise and specific instructions
good_prompt = """
Answer the following geography question with just the city name.
Question: What is the capital of France?
"""

Anti-pattern 2: Ambiguous Output Format

# BAD: Output format is unclear
bad_format = "Analyze this data and give me insights."

# GOOD: Clear output format specification
good_format = """
Analyze the following sales data and provide:
1. Top 3 insights (one sentence each)
2. Trend direction: "increasing", "decreasing", or "stable"
3. Recommended actions (bulleted list, max 3 items)

Respond in JSON format with keys: insights, trend, actions.
"""

Anti-pattern 3: Context Window Waste

# BAD: Repeating the same long system prompt for each request
def bad_batch_processing(items):
    """Repeats identical long system prompt for every request"""
    results = []
    for item in items:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": VERY_LONG_SYSTEM_PROMPT},
                {"role": "user", "content": item},
            ]
        )
        results.append(response.choices[0].message.content)
    return results

# GOOD: Optimize with batch processing
def good_batch_processing(items):
    """Process multiple items at once"""
    combined = "\n---\n".join([f"Item {i+1}: {item}" for i, item in enumerate(items)])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Process each item below and return results "
                    "in JSON array format."
                )
            },
            {"role": "user", "content": combined},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

Production Optimization

Prompt Version Management

import hashlib
from datetime import datetime

class PromptRegistry:
    """Prompt version management system"""

    def __init__(self):
        self.prompts = {}
        self.history = []

    def register(self, name: str, template: str, version: str = None) -> str:
        """Register and version manage prompts"""
        content_hash = hashlib.md5(template.encode()).hexdigest()[:8]
        version = version or f"v{len(self.history) + 1}_{content_hash}"

        entry = {
            "name": name,
            "version": version,
            "template": template,
            "hash": content_hash,
            "created_at": datetime.now().isoformat(),
        }

        self.prompts[name] = entry
        self.history.append(entry)
        return version

    def get(self, name: str) -> str:
        """Return the currently active prompt"""
        if name not in self.prompts:
            raise KeyError(f"Prompt '{name}' not registered")
        return self.prompts[name]["template"]

    def get_version(self, name: str) -> str:
        """Return current prompt version"""
        return self.prompts[name]["version"]

Cost Optimization Strategy

class CostOptimizer:
    """LLM API cost optimization"""

    # Per-model pricing (per 1M tokens, approximate as of March 2026)
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
    }

    @staticmethod
    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimate cost"""
        pricing = CostOptimizer.PRICING.get(model, {})
        input_cost = (input_tokens / 1_000_000) * pricing.get("input", 0)
        output_cost = (output_tokens / 1_000_000) * pricing.get("output", 0)
        return input_cost + output_cost

    @staticmethod
    def select_model(task_complexity: str) -> str:
        """Select model based on task complexity"""
        model_map = {
            "simple": "gpt-4o-mini",       # Classification, extraction, etc.
            "moderate": "gpt-4o-mini",      # Tasks requiring CoT
            "complex": "gpt-4o",            # Complex reasoning, code generation
            "critical": "gpt-4o",           # Tasks where accuracy is top priority
        }
        return model_map.get(task_complexity, "gpt-4o-mini")

Caching Strategy

import hashlib
import json
from functools import lru_cache

class PromptCache:
    """Prompt response caching"""

    def __init__(self, cache_backend="memory"):
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def _make_key(self, model: str, messages: list, temperature: float) -> str:
        """Generate cache key"""
        content = json.dumps({
            "model": model,
            "messages": messages,
            "temperature": temperature,
        }, sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, model: str, messages: list, temperature: float):
        """Look up response in cache"""
        if temperature > 0:
            # Do not cache non-deterministic responses
            return None

        key = self._make_key(model, messages, temperature)
        result = self.cache.get(key)
        if result:
            self.hits += 1
        else:
            self.misses += 1
        return result

    def set(self, model: str, messages: list, temperature: float, response: str):
        """Store response in cache"""
        if temperature > 0:
            return
        key = self._make_key(model, messages, temperature)
        self.cache[key] = response

    def stats(self) -> dict:
        """Cache statistics"""
        total = self.hits + self.misses
        return {
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": self.hits / total if total > 0 else 0,
            "cache_size": len(self.cache),
        }

Operational Considerations

Prompt Injection Defense

The most important security issue in production environments is prompt injection. User input can bypass system prompts to induce unintended behavior.

def sanitize_user_input(user_input: str) -> str:
    """Sanitize user input"""
    # 1. Detect system prompt bypass attempts
    injection_patterns = [
        "ignore previous instructions",
        "ignore all instructions",
        "disregard the above",
        "forget your instructions",
        "you are now",
        "new instruction:",
        "system prompt:",
    ]

    lower_input = user_input.lower()
    for pattern in injection_patterns:
        if pattern in lower_input:
            return "[BLOCKED: Potential prompt injection detected]"

    # 2. Limit input length
    max_length = 4000
    if len(user_input) > max_length:
        user_input = user_input[:max_length] + "... [truncated]"

    return user_input

Failure Cases and Recovery

# Common failure scenarios
failure_scenarios:
  rate_limiting:
    symptom: '429 Too Many Requests'
    cause: 'API call limit exceeded'
    recovery:
      - 'Apply exponential backoff'
      - 'Implement request queue for traffic smoothing'
      - 'Rotate multiple API keys'

  hallucination:
    symptom: 'Model generates non-existent information'
    cause: 'Insufficient context or excessive temperature'
    recovery:
      - 'Lower temperature to 0'
      - 'Provide grounding material via RAG pipeline'
      - 'Add output verification layer'

  format_failure:
    symptom: 'JSON parsing failure'
    cause: 'Model does not follow requested format'
    recovery:
      - 'Use response_format parameter'
      - 'Enforce format with Few-shot examples'
      - 'Retry on failure with clearer instructions'

  context_overflow:
    symptom: 'Context window exceeded error'
    cause: 'Input tokens exceed model limit'
    recovery:
      - 'Summarize or chunk input text'
      - 'Remove unnecessary Few-shot examples'
      - 'Switch to a model with longer context'

Evaluation Pipeline

class PromptEvaluator:
    """Prompt A/B test evaluator"""

    def __init__(self):
        self.results = []

    def evaluate(self, test_cases: list, prompt_a: str, prompt_b: str) -> dict:
        """Comparative evaluation of two prompts"""
        scores_a = []
        scores_b = []

        for case in test_cases:
            # Execute Prompt A
            result_a = self._run_prompt(prompt_a, case["input"])
            score_a = self._score(result_a, case["expected"])
            scores_a.append(score_a)

            # Execute Prompt B
            result_b = self._run_prompt(prompt_b, case["input"])
            score_b = self._score(result_b, case["expected"])
            scores_b.append(score_b)

        import numpy as np
        return {
            "prompt_a_avg": np.mean(scores_a),
            "prompt_b_avg": np.mean(scores_b),
            "prompt_a_std": np.std(scores_a),
            "prompt_b_std": np.std(scores_b),
            "winner": "A" if np.mean(scores_a) > np.mean(scores_b) else "B",
            "improvement": abs(np.mean(scores_a) - np.mean(scores_b)),
            "num_cases": len(test_cases),
        }

    def _run_prompt(self, prompt: str, input_text: str) -> str:
        """Execute prompt"""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": input_text},
            ],
            temperature=0,
        )
        return response.choices[0].message.content

    def _score(self, result: str, expected: str) -> float:
        """Score result (0-1)"""
        # Simple string similarity-based scoring
        result_lower = result.lower().strip()
        expected_lower = expected.lower().strip()

        if result_lower == expected_lower:
            return 1.0
        elif expected_lower in result_lower:
            return 0.8
        else:
            return 0.0

Conclusion

Prompt engineering has evolved from simple text crafting into an engineering discipline that understands and leverages the reasoning mechanisms of LLMs. Starting from Chain-of-Thought's simple idea of "show me the reasoning steps," it has expanded to Self-Consistency's ensemble strategy, Tree-of-Thought's systematic search, and ReAct's tool utilization pattern.

In production environments, not only technique performance but also cost, latency, consistency, and security (prompt injection defense) must be holistically considered. The most important thing is selecting the right technique for the task characteristics and continuously improving through systematic evaluation pipelines.

As LLMs' baseline reasoning capabilities improve in the future, the relative advantages of individual prompting techniques may change, but the fundamental principle of prompt engineering -- "understanding how the model reasons and guiding it" -- will remain unchanged.

LLM 프롬프트 엔지니어링 고급 기법: Chain-of-Thought·Tree-of-Thought·ReAct·Few-Shot 패턴 실전 가이드

들어가며

프롬프팅 기법 분류 체계

Zero-shot과 Few-shot 프롬프팅

Zero-shot 프롬프팅

Few-shot 프롬프팅

Chain-of-Thought (CoT) 프롬프팅

핵심 원리

Zero-shot CoT

Self-Consistency 디코딩

Tree-of-Thought (ToT) 프레임워크

핵심 아이디어

ReAct: 추론과 행동의 결합

핵심 원리

구조화된 출력 프롬프팅

프롬프트 체이닝

프롬프팅 기법 성능 비교

벤치마크 결과

기법 선택 가이드

일반적인 안티패턴

안티패턴 1: 과도한 지시문

안티패턴 2: 모호한 출력 형식

안티패턴 3: 컨텍스트 윈도우 낭비

프로덕션 최적화

프롬프트 버전 관리

비용 최적화 전략

캐싱 전략

운영 시 주의사항

프롬프트 인젝션 방어

장애 사례와 복구

평가 파이프라인

마치며

참고자료

Advanced LLM Prompt Engineering: Chain-of-Thought, Tree-of-Thought, ReAct, and Few-Shot Pattern Practical Guide

Introduction

Prompting Technique Taxonomy

Zero-shot and Few-shot Prompting

Zero-shot Prompting

Few-shot Prompting

Chain-of-Thought (CoT) Prompting

Core Principle

Zero-shot CoT

Self-Consistency Decoding

Tree-of-Thought (ToT) Framework

Core Idea

ReAct: Synergizing Reasoning and Acting

Core Principle

Structured Output Prompting

Prompt Chaining

Prompting Technique Performance Comparison

Benchmark Results

Technique Selection Guide

Common Anti-patterns

Anti-pattern 1: Excessive Instructions

Anti-pattern 2: Ambiguous Output Format

Anti-pattern 3: Context Window Waste

Production Optimization

Prompt Version Management

Cost Optimization Strategy

Caching Strategy

Operational Considerations

Prompt Injection Defense

Failure Cases and Recovery

Evaluation Pipeline

Conclusion

References