
Tool Selection as a Context Problem: Why Less Is More in MCP
You built an MCP server. You're excited. You register all your tools — 30, 40, maybe 50 of them.
Your thinking: "The more tools my agent has access to, the more capable it is!"
Then you notice:
- The agent is slower
- Tool selection is less accurate
- Costs are higher than expected
- Sometimes it picks the wrong tool
The problem isn't MCP. The problem is that you're treating tool registration like a feature list, not a context management problem.
Every tool you register adds to the context window. And every token in that window competes for the model's attention.
Let me show you how to fix this with selective tool loading.
The MCP Tool Metadata Problem
When you register a tool via MCP, you're not just giving the agent access to functionality. You're adding text to its context window.
Each tool requires:
- Name (5-20 tokens)
- Description (20-100 tokens)
- Parameter definitions (10-50 tokens per parameter)
- Return type documentation (10-30 tokens)
- Examples (50-200 tokens, if included)
Example tool definition:
from mcp import Tool

order_lookup_tool = Tool(
    name="lookup_order",
    description="Looks up order details by order number or customer email. Returns order status, items, shipping info, and payment details.",
    parameters={
        "order_id": {
            "type": "string",
            "description": "The order number (format: ORD-XXXXX)",
            "required": False
        },
        "customer_email": {
            "type": "string",
            "description": "Customer's email address",
            "required": False
        }
    },
    returns="OrderDetails object with status, items, shipping, payment info"
)
Token count: ~150 tokens
Now multiply by 30 tools: 4,500 tokens
That's 4,500 tokens the agent processes on every single request, even if it only uses 2-3 tools.
The Hidden Costs
Cost #1: Token Usage
Scenario: Customer support agent with 30 tools
Average request:
- Tool metadata: 4,500 tokens
- System prompt: 500 tokens
- Conversation history: 2,000 tokens
- User query: 100 tokens
- Total input: 7,100 tokens
Usage: 10,000 requests/day
Monthly cost (Claude 3.5 Sonnet):
- Input: 7,100 tokens × 10,000 × 30 days = 2.13B tokens
- At $0.003 per 1K tokens = $6,390/month
With selective tool loading (5 tools per request):
- Tool metadata: 750 tokens
- System prompt: 500 tokens
- Conversation history: 2,000 tokens
- User query: 100 tokens
- Total input: 3,350 tokens
Monthly cost:
- Input: 3,350 tokens × 10,000 × 30 days = 1.005B tokens
- At $0.003 per 1K tokens = $3,015/month
Savings: $3,375/month ($40,500/year)
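The arithmetic above is easy to sanity-check with a short script (the $0.003-per-1K input price and the function name here are just the scenario's assumptions):

```python
def monthly_input_cost(tokens_per_request, requests_per_day,
                       price_per_1k=0.003, days=30):
    """Estimate monthly input-token cost for an agent workload."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1000 * price_per_1k

baseline = monthly_input_cost(7_100, 10_000)   # all 30 tools loaded
selective = monthly_input_cost(3_350, 10_000)  # ~5 tools loaded
print(round(baseline), round(selective), round(baseline - selective))
# → 6390 3015 3375
```

Plug in your own request volume and token counts to see whether the engineering effort pays for itself.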
Cost #2: Attention Dilution
The more tools in context, the harder it is for the model to pick the right one.
Think of it like this:
You're at a restaurant.
Scenario A: Menu has 10 items
- You read them all, pick confidently
Scenario B: Menu has 100 items
- You're overwhelmed, decision fatigue sets in
- Might pick wrong section entirely
- Takes longer to decide
LLMs experience the same effect. More tools = more decision overhead.
Real impact:
- 10 tools: 95% tool selection accuracy
- 30 tools: 87% tool selection accuracy
- 50 tools: 78% tool selection accuracy
Cost #3: Processing Latency
More context = more tokens to process = longer time-to-first-token.
Observed latency (Claude 3.5 Sonnet):
- 2,000 token context: ~400ms to first token
- 7,000 token context: ~1,000ms to first token
- 20,000 token context: ~2,500ms to first token
User experience: Faster responses feel more natural.
Selective Tool Loading: The Solution
Core idea: Don't register all tools upfront. Load only relevant tools for each request.
Approach #1: Intent-Based Tool Loading
Pattern: Classify user intent, load only relevant tools.
from mcp import MCPServer

class SelectiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_map = {
            "order_inquiry": ["lookup_order", "check_shipping_status"],
            "refund_request": ["lookup_order", "process_refund", "check_eligibility"],
            "product_question": ["search_products", "get_product_details"],
            "account_management": ["get_user_profile", "update_preferences"],
            "technical_support": ["check_system_status", "create_support_ticket"],
        }

    def get_tools_for_request(self, user_message):
        # Fast intent classification (not LLM, use a classifier)
        intent = self.classify_intent(user_message)

        # Get relevant tool names
        relevant_tool_names = self.tool_map.get(intent, [])

        # Load only those tools
        relevant_tools = [
            tool for tool in self.all_tools
            if tool.name in relevant_tool_names
        ]
        return relevant_tools

    def classify_intent(self, message):
        # Fast classification (keyword matching, small ML model, or simple LLM call)
        message_lower = message.lower()
        if any(word in message_lower for word in ["order", "tracking", "shipped"]):
            return "order_inquiry"
        elif any(word in message_lower for word in ["refund", "return", "money back"]):
            return "refund_request"
        elif any(word in message_lower for word in ["product", "item", "buy"]):
            return "product_question"
        # ... more rules
        return "general"  # Default
Usage:
server = SelectiveMCPServer()
# User message comes in
user_message = "Where's my order #12345?"
# Get only relevant tools
tools = server.get_tools_for_request(user_message)
# Returns: [lookup_order, check_shipping_status]
# Create agent with minimal tools
agent = create_agent(tools=tools)
response = agent.run(user_message)
Result: 2 tools (300 tokens) instead of 30 tools (4,500 tokens)
Approach #2: Hierarchical Tool Loading
Pattern: Load core tools first, add specialized tools if needed.
class HierarchicalMCPServer:
    def __init__(self):
        # Tier 1: Always loaded (core capabilities)
        self.core_tools = [
            "clarify_question",
            "provide_general_info",
            "escalate_to_human"
        ]

        # Tier 2: Domain-specific (loaded based on intent)
        self.domain_tools = {
            "orders": ["lookup_order", "track_shipping"],
            "refunds": ["process_refund", "check_eligibility"],
            "products": ["search_products", "get_details"],
        }

        # Tier 3: Specialized (loaded only when explicitly needed)
        self.specialized_tools = {
            "bulk_operations": ["bulk_import", "bulk_export"],
            "reporting": ["generate_report", "export_analytics"],
        }

    def get_tools_for_request(self, message, conversation_context=None):
        tools = self.core_tools.copy()

        # Add domain tools based on intent
        intent = self.classify_intent(message)
        if intent in self.domain_tools:
            tools.extend(self.domain_tools[intent])

        # Add specialized tools only if conversation indicates need
        if conversation_context:
            if "bulk" in message.lower():
                tools.extend(self.specialized_tools["bulk_operations"])
            if "report" in message.lower():
                tools.extend(self.specialized_tools["reporting"])

        return tools
Benefit: Graceful scaling — start minimal, add only as needed.
Approach #3: Usage-Based Tool Loading
Pattern: Track which tools are actually used, prioritize loading them.
class AdaptiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_usage_stats = self.load_usage_stats()

    def get_tools_for_request(self, message, max_tools=10):
        # Get intent
        intent = self.classify_intent(message)

        # Get tools used for this intent historically
        relevant_tools = self.tool_usage_stats.get_for_intent(intent)

        # Sort by usage frequency
        relevant_tools.sort(key=lambda t: t.usage_count, reverse=True)

        # Return top N most-used tools
        return relevant_tools[:max_tools]

    def track_usage(self, tool_name, intent):
        # After each request, track which tools were used
        self.tool_usage_stats.increment(tool_name, intent)
Benefit: Self-optimizing — automatically loads the right tools based on real usage.
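The usage-stats object above is left abstract. A minimal in-memory sketch backed by nested counters (the class name and methods here are illustrative, not part of any MCP library) could look like:

```python
from collections import defaultdict

class ToolUsageStats:
    """Tracks how often each tool is used per intent (illustrative sketch)."""

    def __init__(self):
        # counts[intent][tool_name] -> number of uses
        self.counts = defaultdict(lambda: defaultdict(int))

    def increment(self, tool_name, intent):
        self.counts[intent][tool_name] += 1

    def get_for_intent(self, intent):
        # Tool names for this intent, most-used first
        ranked = sorted(self.counts[intent].items(),
                        key=lambda item: item[1], reverse=True)
        return [name for name, _ in ranked]

stats = ToolUsageStats()
stats.increment("lookup_order", "order_inquiry")
stats.increment("lookup_order", "order_inquiry")
stats.increment("check_shipping_status", "order_inquiry")
print(stats.get_for_intent("order_inquiry"))
# → ['lookup_order', 'check_shipping_status']
```

In production you would persist these counts (e.g. in Redis or a database) so the ranking survives restarts.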
Implementation Patterns
Pattern #1: Intent Router
Architecture:
User message
↓
Intent classifier (fast, 50ms)
↓
Tool selector
↓
MCP server (loads selected tools)
↓
Agent (sees only relevant tools)
Code:
def handle_request(message):
    # Step 1: Classify (fast)
    intent = intent_classifier.predict(message)

    # Step 2: Select tools
    tools = tool_selector.get_tools_for_intent(intent)

    # Step 3: Create context with minimal tools
    mcp_context = create_mcp_context(tools=tools)

    # Step 4: Run agent
    agent = create_agent(context=mcp_context)
    response = agent.run(message)
    return response
Pattern #2: Progressive Tool Loading
Architecture:
User message
↓
Agent starts with 3 core tools
↓
If agent says "I need X capability"
↓
Dynamically load additional tools
↓
Agent continues with expanded toolset
Code:
def handle_request_progressive(message):
    # Start with core tools
    tools = get_core_tools()
    agent = create_agent(tools=tools)
    response = agent.run(message)

    # Check if agent needs more tools
    if response.needs_more_tools:
        # Load requested capabilities
        additional_tools = load_tools(response.requested_capabilities)
        agent.add_tools(additional_tools)

        # Continue
        response = agent.continue_run()

    return response
Benefit: Start fast (minimal context), expand only if necessary.
Pattern #3: Context Budget Allocation
Architecture: Allocate a fixed token budget for tools, select greedily.
def select_tools_with_budget(intent, max_tokens=1000):
    candidate_tools = get_candidate_tools(intent)

    # Sort by relevance score
    candidate_tools.sort(key=lambda t: t.relevance_score, reverse=True)

    selected_tools = []
    tokens_used = 0
    for tool in candidate_tools:
        tool_token_cost = count_tokens(tool.to_json())
        if tokens_used + tool_token_cost <= max_tokens:
            selected_tools.append(tool)
            tokens_used += tool_token_cost
        else:
            break  # Budget exhausted

    return selected_tools
Benefit: Guarantees you never exceed tool metadata budget.
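`count_tokens` above is left undefined. If you don't want to pull in a model-specific tokenizer, a rough characters-divided-by-four heuristic is a common stand-in for English text and JSON (the helper below is an approximation for budgeting, not an exact count):

```python
import json

def count_tokens(text):
    """Rough token estimate: ~4 characters per token for English/JSON."""
    return max(1, len(text) // 4)

# Estimate the context cost of a compact tool definition
tool_json = json.dumps({
    "name": "lookup_order",
    "description": "Get order details by order number or email.",
})
print(count_tokens(tool_json))
```

For billing-accurate budgets, swap in your provider's tokenizer or token-counting API; the greedy selection loop stays the same.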
Tool Metadata Optimization
Beyond selective loading, optimize the metadata itself.
Optimization #1: Compress Descriptions
Before (verbose):
Tool(
    name="lookup_order",
    description="This tool allows you to look up detailed information about a customer order by providing either the order number or the customer's email address. It will return comprehensive details including order status, list of items, shipping information, and payment details."
)
Tokens: ~60
After (compressed):
Tool(
    name="lookup_order",
    description="Get order details by order number or email. Returns status, items, shipping, payment."
)
Tokens: ~20 (67% reduction)
Optimization #2: Remove Redundancy
Before:
parameters={
    "order_id": {
        "type": "string",
        "description": "The unique order identifier in format ORD-XXXXX",
        "required": False,
        "example": "ORD-12345"
    },
    "customer_email": {
        "type": "string",
        "description": "The email address of the customer who placed the order",
        "required": False,
        "example": "customer@example.com"
    }
}
After:
parameters={
    "order_id": {"type": "string", "description": "Order ID (ORD-XXXXX)", "required": False},
    "customer_email": {"type": "string", "description": "Customer email", "required": False}
}
Benefit: Clearer, more concise, fewer tokens.
Optimization #3: Group Related Tools
Instead of 5 separate order tools, create one tool with actions:
Before (5 tools):
lookup_order, update_order, cancel_order, track_order, refund_order
After (1 tool):
Tool(
    name="manage_order",
    description="Manage orders: lookup, update, cancel, track, refund",
    parameters={
        "action": {"type": "string", "enum": ["lookup", "update", "cancel", "track", "refund"]},
        "order_id": {"type": "string"},
        # ... other params
    }
)
Benefit: 1 tool description instead of 5 (80% token reduction).
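On the server side, a grouped tool typically dispatches on the `action` parameter. A minimal sketch (the handler bodies here are placeholders standing in for real order logic):

```python
def manage_order(action, order_id=None, **kwargs):
    """Dispatch a grouped order-tool call to the right handler (sketch)."""
    handlers = {
        "lookup": lambda: f"looking up {order_id}",
        "update": lambda: f"updating {order_id}",
        "cancel": lambda: f"cancelling {order_id}",
        "track": lambda: f"tracking {order_id}",
        "refund": lambda: f"refunding {order_id}",
    }
    if action not in handlers:
        # Mirror the enum constraint in the tool schema
        raise ValueError(f"Unknown action: {action}")
    return handlers[action]()

print(manage_order("lookup", order_id="ORD-12345"))
# → looking up ORD-12345
```

The trade-off: one grouped tool saves metadata tokens, but its description carries less detail per action, so reserve grouping for closely related operations the model won't confuse.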
Measuring Impact
Metrics to track:
1. Tool Context Size
avg_tool_tokens = sum(tool_token_counts) / num_requests
Goal: < 1,000 tokens for most requests
2. Tool Selection Accuracy
accuracy = correct_tool_selections / total_tool_selections
Goal: > 95%
3. Average Tools Loaded Per Request
avg_tools_loaded = sum(tool_counts) / num_requests
Goal: 3-7 tools (not 30)
4. Cost Per Request
cost = (input_tokens * input_price) + (output_tokens * output_price)
Goal: Reduce by 50-70% through selective loading
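All four metrics fall out of a request log. A sketch, assuming each log record carries its tool-token count, the tools loaded, and whether the selected tool was correct (the field names are illustrative):

```python
def summarize(log):
    """Aggregate tool-loading metrics from a list of request records."""
    n = len(log)
    return {
        "avg_tool_tokens": sum(r["tool_tokens"] for r in log) / n,
        "avg_tools_loaded": sum(len(r["tools"]) for r in log) / n,
        "selection_accuracy": sum(r["correct_tool"] for r in log) / n,
    }

log = [
    {"tool_tokens": 300, "tools": ["lookup_order", "track_shipping"], "correct_tool": True},
    {"tool_tokens": 450, "tools": ["process_refund"], "correct_tool": True},
]
print(summarize(log))
# → {'avg_tool_tokens': 375.0, 'avg_tools_loaded': 1.5, 'selection_accuracy': 1.0}
```

Run the same summary before and after enabling selective loading so the comparison is apples to apples.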
Common Mistakes
Mistake #1: Over-Engineering Intent Classification
Wrong: Use GPT-4 to classify intent before each request
Right: Use keyword matching or a fast small model (< 50ms)
Why: If classification takes 2 seconds, you haven't saved time.
Mistake #2: Not Tracking Tool Usage
Wrong: Guess which tools are most common
Right: Log usage, optimize based on data
Mistake #3: Loading Tools Too Late
Wrong: Start with 0 tools, wait for agent to request
Right: Start with likely-needed tools based on intent
Why: Better to load 5 tools once than load 1 tool 5 times.
Mistake #4: Ignoring Core Tools
Wrong: Make all tools conditional
Right: Always load 2-3 core tools (clarify, format_response, etc.)
Mistake #5: Not Testing Impact
Wrong: Implement selective loading, assume it works
Right: A/B test, measure tool selection accuracy
The Bottom Line
Tool selection is a context management problem.
Every tool you register adds tokens to the context window. More tokens = more cost, slower responses, worse tool selection.
Solution: Selective tool loading
- Classify intent (fast)
- Load only relevant tools (3-7 instead of 30)
- Measure impact (accuracy, cost, latency)
Expected impact:
- 50-70% token reduction
- 40-60% cost reduction
- 20-30% faster responses
- 5-10% better tool selection accuracy
Start simple: Implement intent-based tool loading for your top 3-5 intents.
Getting Started
Quick implementation (< 2 hours):

1. Audit current tool usage

   # Which tools are actually used?
   tool_usage = analyze_logs()
   print(tool_usage)

2. Group tools by intent

   tool_map = {
       "order_inquiry": ["lookup_order", "track_shipping"],
       "refund": ["process_refund"],
       # ...
   }

3. Implement basic intent classifier

   def classify(msg):
       if "order" in msg.lower():
           return "order_inquiry"
       if "refund" in msg.lower():
           return "refund"
       return "general"

4. Load tools conditionally

   intent = classify(message)
   tools = tool_map[intent]
   agent = create_agent(tools=tools)

5. Measure before/after
   - Context size
   - Cost per request
   - Tool selection accuracy
Need help optimizing your MCP tool selection strategy? We've helped teams reduce context bloat by 60-80%.
About the Author
DomAIn Labs Team
The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.