Author: Youngju Kim (@fjvbn20031)

Contents

- Why Tool Calling Is the Core of Agents
- OpenAI Function Calling: Full Implementation
- Anthropic Claude Tool Use
- Five Common Mistakes and How to Fix Them
- Parallel Tool Calls
- Tool Design Principles
- Production Middleware
- Wrapping Up
An LLM on its own transforms text. Add tools and it can actually do things. Tool calling is the mechanism that turns a passive text generator into an active agent.
This post covers how tool calling actually works under the hood, how to implement it correctly, and the specific mistakes that trip up every team building agent systems in production.
Why Tool Calling Is the Core of Agents
An agent's capabilities are exactly as wide as its available tools:
- Search tool → access to real-time information
- Calculator tool → guaranteed mathematical accuracy
- Code execution tool → write and run actual code
- API tool → integrate with external services
- Database tool → read and write persistent data
Without tools, an LLM can only answer from its training data. With tools, it can look up today's stock prices, send emails, and execute code.
OpenAI Function Calling: Full Implementation
OpenAI's function calling has become the de facto standard API format.
Define Your Tools
```python
import openai
import json

client = openai.OpenAI(api_key="your-api-key")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": (
                "Get current weather for a specific city. "
                "Use this when the user asks about current weather conditions. "
                "Do NOT use for weather forecasts or historical data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name in English, e.g. 'Seoul', 'Tokyo', 'New York'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Default: celsius"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": (
                "Search the web for current information. "
                "Use for recent news, events, or anything that may have changed "
                "since the model's training cutoff."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (1-10)",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]
```
Implement the Actual Functions
```python
# weather_api and search_api stand in for your actual API clients
def get_weather(city: str, unit: str = "celsius") -> dict:
    response = weather_api.get(city=city, unit=unit)
    return {
        "city": city,
        "temperature": response.temp,
        "unit": unit,
        "condition": response.condition,
        "humidity": response.humidity
    }

def search_web(query: str, num_results: int = 3) -> list:
    results = search_api.search(query, count=num_results)
    return [
        {"title": r.title, "snippet": r.snippet, "url": r.url}
        for r in results
    ]

available_tools = {
    "get_weather": get_weather,
    "search_web": search_web
}
```
The Complete Tool Call Loop
```python
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",  # use a model that supports parallel tool calls
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        assistant_message = response.choices[0].message
        messages.append(assistant_message)

        # No tool calls → final answer
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute each tool call
        for tool_call in assistant_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            print(f"Calling: {function_name}({function_args})")

            if function_name in available_tools:
                try:
                    result = available_tools[function_name](**function_args)
                    tool_result = json.dumps(result, ensure_ascii=False)
                except Exception as e:
                    tool_result = f"Error: {str(e)}. Try a different approach."
            else:
                tool_result = f"Unknown tool: {function_name}"

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": tool_result
            })

    return "Max iterations reached"

result = run_agent("What's the weather in Seoul? Also find me today's AI news.")
print(result)
```
Anthropic Claude Tool Use
Claude uses a different format but the same concept:
```python
import anthropic
import json

client = anthropic.Anthropic(api_key="your-api-key")

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

def run_claude_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        messages.append({"role": "assistant", "content": response.content})

        # Anything other than tool_use (end_turn, max_tokens, ...) ends the loop
        if response.stop_reason != "tool_use":
            return " ".join(
                block.text for block in response.content
                if hasattr(block, "text")
            )

        # Handle tool_use blocks
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                try:
                    result = available_tools[block.name](**block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })
                except Exception as e:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Error: {str(e)}",
                        "is_error": True
                    })

        # Tool results go back to Claude as a user message
        messages.append({"role": "user", "content": tool_results})
```
Five Common Mistakes and How to Fix Them
This section is the most valuable part of this post. These are real problems you will hit in production.
Mistake 1: Vague Tool Descriptions
The LLM uses your description to decide when and how to call your tool. Vague descriptions cause incorrect or unnecessary tool calls.
```python
# Bad: ambiguous and unhelpful
{
    "name": "query",
    "description": "Query data"
}

# Good: when to use it, what it returns, what NOT to use it for
{
    "name": "query_customer_orders",
    "description": (
        "Fetch order history for a specific customer. Requires customer_id. "
        "Returns: list of orders with order ID, date, amount, and status. "
        "For customer profile info (name, email), use get_customer_info instead."
    )
}
```
Mistake 2: No Error Handling
Unhandled exceptions kill the entire agent loop.
```python
# Bad: unhandled exception crashes the agent
result = database.query(sql)

# Good: errors become tool results — the LLM can adapt its approach
try:
    result = database.query(sql)
    return json.dumps(result)
except DatabaseError as e:
    return f"Database error: {str(e)}. The query may have a syntax error."
except TimeoutError:
    return "Query timed out. Try adding more specific filters or simplifying the query."
except Exception as e:
    return f"Unexpected error: {str(e)}. Please try a different approach."
```
Mistake 3: No Loop Detection
An agent can get stuck calling the same tool with the same arguments repeatedly.
```python
MAX_ITERATIONS = 15
call_count = {}

for iteration in range(MAX_ITERATIONS):
    # ... model call omitted; tool_calls comes from the response ...
    for tool_call in tool_calls:
        name = tool_call.function.name
        args_str = tool_call.function.arguments
        call_key = f"{name}:{args_str}"
        call_count[call_key] = call_count.get(call_key, 0) + 1
        if call_count[call_key] > 3:
            return "Error: same tool call repeated too many times. Breaking loop."
```
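The same guard works well as a standalone helper, which is also easier to unit test. A minimal sketch (the `RepeatGuard` name and API are illustrative, not from any library):

```python
from collections import Counter

class RepeatGuard:
    """Flags a tool call once the same (name, args) pair repeats too often."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def allow(self, name: str, args_str: str) -> bool:
        key = f"{name}:{args_str}"
        self.counts[key] += 1
        return self.counts[key] <= self.max_repeats

guard = RepeatGuard(max_repeats=3)
calls = [guard.allow("get_weather", '{"city": "Seoul"}') for _ in range(4)]
print(calls)  # [True, True, True, False]
```

The fourth identical call is rejected; the agent loop can then return an error message or force a different strategy.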
Mistake 4: Too Many Tools Defined at Once
LLM accuracy degrades when you provide too many tools. Beyond about 10, consider dynamic tool selection.
```python
def select_relevant_tools(user_query: str, all_tools: list, max_tools: int = 8) -> list:
    """Select only tools relevant to the user's query."""
    if len(all_tools) <= max_tools:
        return all_tools

    query_words = set(user_query.lower().split())
    scored_tools = []
    for tool in all_tools:
        desc_words = set(tool["function"]["description"].lower().split())
        overlap = len(query_words & desc_words)
        scored_tools.append((overlap, tool))

    scored_tools.sort(key=lambda x: x[0], reverse=True)
    return [t for _, t in scored_tools[:max_tools]]
```
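A quick check of the selector's behavior, repeated here with three toy tool definitions so the snippet runs on its own:

```python
def select_relevant_tools(user_query: str, all_tools: list, max_tools: int = 8) -> list:
    """Keyword-overlap selector, same logic as above."""
    if len(all_tools) <= max_tools:
        return all_tools
    query_words = set(user_query.lower().split())
    scored = []
    for tool in all_tools:
        desc_words = set(tool["function"]["description"].lower().split())
        scored.append((len(query_words & desc_words), tool))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [t for _, t in scored[:max_tools]]

tools = [
    {"function": {"name": "get_weather", "description": "get current weather for a city"}},
    {"function": {"name": "send_email", "description": "send an email to a recipient"}},
    {"function": {"name": "search_web", "description": "search the web for current news"}},
]

selected = select_relevant_tools("what is the weather in Seoul", tools, max_tools=2)
print([t["function"]["name"] for t in selected])  # ['get_weather', 'search_web']
```

Keyword overlap is crude (note that "the" earns search_web a point here); embedding similarity between the query and each tool description usually ranks better in practice.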
Mistake 5: No Human Confirmation for Irreversible Actions
Deleting records, processing payments, sending emails: actions like these need human approval before execution.
```python
HIGH_RISK_TOOLS = {"delete_record", "send_email", "process_payment", "deploy_code"}

def execute_with_approval(tool_name: str, args: dict) -> str:
    if tool_name in HIGH_RISK_TOOLS:
        # In a real app: show UI modal, send Slack notification, etc.
        print(f"HIGH RISK: {tool_name}({args})")
        approval = input("Approve? (yes/no): ")
        if approval.lower() != "yes":
            return "Action cancelled by user."
    return available_tools[tool_name](**args)
```
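`input()` blocks the whole process, so in a real service you'd inject an approval callback (UI modal, Slack approve button) instead. A sketch of that shape; the callback signature is an illustrative choice:

```python
from typing import Callable

HIGH_RISK_TOOLS = {"delete_record", "send_email", "process_payment", "deploy_code"}

def execute_with_approval(tool_name: str, args: dict, tools: dict,
                          approve: Callable[[str, dict], bool]):
    """Run high-risk tools only when the approval callback says yes."""
    if tool_name in HIGH_RISK_TOOLS and not approve(tool_name, args):
        return "Action cancelled by user."
    return tools[tool_name](**args)

# Toy tool registry for the demo
tools = {"send_email": lambda to, body: f"sent to {to}"}

deny_all = lambda name, args: False
print(execute_with_approval("send_email", {"to": "a@b.com", "body": "hi"}, tools, deny_all))
# Action cancelled by user.
```

The callback can be synchronous for a CLI tool or backed by a ticketing/notification flow in production; the agent loop stays unchanged either way.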
Parallel Tool Calls
Modern LLMs can request multiple tool calls in a single response. Use this — it makes agents dramatically faster.
```python
import asyncio
import json

async def execute_tool_calls_parallel(tool_calls: list) -> list:
    """Execute multiple tool calls concurrently."""

    async def execute_single(tool_call):
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        try:
            # Run the (blocking) tool function in a worker thread
            result = await asyncio.to_thread(available_tools[name], **args)
            return {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            }
        except Exception as e:
            return {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": f"Error: {str(e)}"
            }

    tasks = [execute_single(tc) for tc in tool_calls]
    return await asyncio.gather(*tasks)
```
For example, asked "What's the weather in Seoul, Tokyo, and New York?", the LLM returns three get_weather calls in a single response. Run sequentially, three 1-second API calls take about 3 seconds; run in parallel they finish in roughly 1 second.
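You can see the speedup without any real API. In this toy benchmark, `fake_weather` is a hypothetical stand-in for a 1-second tool call:

```python
import asyncio
import time

async def fake_weather(city: str) -> dict:
    await asyncio.sleep(1)  # simulates a 1-second API call
    return {"city": city, "temperature": 20}

async def main() -> float:
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_weather("Seoul"), fake_weather("Tokyo"), fake_weather("New York")
    )
    assert len(results) == 3
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"3 one-second calls finished in {elapsed:.2f}s")  # ~1s, not ~3s
```

The three sleeps overlap, so total wall time is close to the slowest single call rather than the sum.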
Tool Design Principles
The difference between well-designed and poorly designed tools shows up directly in agent performance.
Principle 1: Single Responsibility
Each tool does one thing.
```python
# Bad: one function doing everything
def manage_customer(action: str, customer_id: int, **kwargs): ...

# Good: separate functions
def get_customer(customer_id: int) -> dict: ...
def update_customer(customer_id: int, name: str = None, email: str = None) -> dict: ...
def delete_customer(customer_id: int) -> dict: ...
```
Principle 2: Separate Reads from Writes
Read tools have no side effects. Write tools change state. Keep them clearly separate.
```python
# Read tools: safe, idempotent, can be called multiple times
def get_user_balance(user_id: int) -> float: ...
def search_products(query: str) -> list: ...

# Write tools: irreversible, handle with care
def transfer_money(from_id: int, to_id: int, amount: float) -> dict: ...
def delete_order(order_id: int) -> dict: ...
```
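One way to make the separation operational is to tag each tool with its side-effect class and have the dispatcher gate writes. A minimal sketch; the registry, `dispatch` function, and `approved` flag are illustrative, not from a specific framework:

```python
# Illustrative registry: each tool is tagged "read" or "write"
TOOL_KIND = {
    "get_user_balance": "read",
    "transfer_money": "write",
}

def dispatch(tool_name: str, args: dict, tools: dict, approved: bool = False):
    """Run read tools freely; run write tools only when explicitly approved."""
    if TOOL_KIND.get(tool_name) == "write" and not approved:
        return {"error": f"'{tool_name}' changes state and needs approval"}
    return tools[tool_name](**args)

# Toy implementations for the demo
tools = {
    "get_user_balance": lambda user_id: 42.0,
    "transfer_money": lambda from_id, to_id, amount: {"status": "ok"},
}

print(dispatch("get_user_balance", {"user_id": 1}, tools))  # 42.0
print(dispatch("transfer_money", {"from_id": 1, "to_id": 2, "amount": 5.0}, tools))  # blocked without approval
```

Reads can also be retried freely on failure, while a failed write needs careful handling since a retry might double-apply the action.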
Principle 3: Return Structured Data
Return JSON, not prose. The LLM reasons over structured data far better than natural language sentences.
```python
# Bad: the LLM has to parse this
def get_user_info(user_id: int) -> str:
    return "John Doe, 30 years old, email: john@example.com, joined 2022"

# Good: structured data the LLM can reason about precisely
def get_user_info(user_id: int) -> dict:
    return {
        "id": user_id,
        "name": "John Doe",
        "age": 30,
        "email": "john@example.com",
        "joined_at": "2022-01-15"
    }
```
Principle 4: Include Error Details in Results
```python
def safe_tool_wrapper(func):
    """Wrap tool functions with consistent error handling."""
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            return {"success": True, "data": result}
        except ValueError as e:
            return {"success": False, "error": "invalid_input", "message": str(e)}
        except PermissionError as e:
            return {"success": False, "error": "permission_denied", "message": str(e)}
        except Exception as e:
            return {"success": False, "error": "unexpected", "message": str(e)}
    return wrapper

@safe_tool_wrapper
def create_order(product_id: int, quantity: int, user_id: int) -> dict:
    ...
```
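To see the wrapper in action, here is a trimmed copy with a toy `create_order` body (the validation rule and return values are made up for the demo):

```python
def safe_tool_wrapper(func):
    """Turn exceptions into structured results (trimmed copy of the above)."""
    def wrapper(*args, **kwargs):
        try:
            return {"success": True, "data": func(*args, **kwargs)}
        except ValueError as e:
            return {"success": False, "error": "invalid_input", "message": str(e)}
        except Exception as e:
            return {"success": False, "error": "unexpected", "message": str(e)}
    return wrapper

@safe_tool_wrapper
def create_order(product_id: int, quantity: int) -> dict:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return {"order_id": 1001, "product_id": product_id, "quantity": quantity}

print(create_order(42, 0))  # success=False, error='invalid_input'
print(create_order(42, 3))  # success=True, with the order data
```

Either way the LLM receives a well-formed JSON result it can reason about, instead of the loop dying on an exception.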
Production Middleware
Putting it all together — a middleware class that wraps tool execution with the safeguards you need in production:
```python
import time
import asyncio

class ToolCallMiddleware:
    def __init__(self, tools: dict, max_iterations: int = 10):
        self.tools = tools
        self.max_iterations = max_iterations
        self.call_log = []

    def _validate_args(self, tool_name: str, args: dict) -> dict:
        # Minimal check; plug in JSON Schema validation here if you
        # have the tool's parameter schema available.
        if not isinstance(args, dict):
            raise ValueError("arguments must be an object")
        return args

    async def execute(self, tool_name: str, args: dict, timeout: int = 30) -> dict:
        start_time = time.time()

        # 1. Tool existence check
        if tool_name not in self.tools:
            return {"error": f"Unknown tool: {tool_name}"}

        # 2. Input validation
        try:
            validated_args = self._validate_args(tool_name, args)
        except ValueError as e:
            return {"error": f"Invalid arguments: {str(e)}"}

        # 3. Execute with timeout
        try:
            result = await asyncio.wait_for(
                asyncio.to_thread(self.tools[tool_name], **validated_args),
                timeout=timeout
            )
        except asyncio.TimeoutError:
            result = {"error": f"Timed out after {timeout}s"}
        except Exception as e:
            result = {"error": str(e)}

        # 4. Log every call (tools may return non-dict results)
        failed = isinstance(result, dict) and "error" in result
        elapsed = time.time() - start_time
        self.call_log.append({
            "tool": tool_name,
            "args": args,
            "elapsed_ms": round(elapsed * 1000),
            "success": not failed,
            "timestamp": time.time()
        })
        return result
```
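The timeout step is worth seeing in isolation. A self-contained sketch of the same `asyncio.wait_for` + `asyncio.to_thread` pattern, where `slow_tool` is a hypothetical stand-in for a hung API call:

```python
import asyncio
import time

def slow_tool() -> dict:
    time.sleep(2)  # simulates a hung API call
    return {"status": "done"}

async def call_with_timeout(func, timeout: float):
    """Run a blocking tool in a thread, converting a timeout into an error result."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func), timeout=timeout)
    except asyncio.TimeoutError:
        return {"error": f"Timed out after {timeout}s"}

result = asyncio.run(call_with_timeout(slow_tool, timeout=0.5))
print(result)  # {'error': 'Timed out after 0.5s'}
```

One caveat: the cancelled thread keeps running to completion in the background, so a timeout protects the agent loop but does not abort the underlying work; tools with real side effects need their own cancellation or idempotency story.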
Wrapping Up
Tool calling is what makes LLMs genuinely useful as agents. The basic implementation isn't complicated — but building tool calling that holds up in production is a game of details.
Three things matter most: clear tool descriptions, graceful error handling, and loop prevention. Get these right and you'll avoid the majority of the problems that sink tool-enabled agents.
This series covered agent design patterns (Post 5), MCP (Post 6), multi-agent systems (Post 7), and tool calling (Post 8). Now go build something — implementation teaches faster than theory.