Tool Selection as a Context Problem: Why Less Is More in MCP

DomAIn Labs Team
June 28, 2025
9 min read

You built an MCP server. You're excited. You register all your tools — 30, 40, maybe 50 of them.

Your thinking: "The more tools my agent has access to, the more capable it is!"

Then you notice:

  • The agent is slower
  • Tool selection is less accurate
  • Costs are higher than expected
  • Sometimes it picks the wrong tool

The problem isn't MCP. The problem is that you're treating tool registration like a feature list, not a context management problem.

Every tool you register adds to the context window. And every token in that window competes for the model's attention.

Let me show you how to fix this with selective tool loading.

The MCP Tool Metadata Problem

When you register a tool via MCP, you're not just giving the agent access to functionality. You're adding text to its context window.

Each tool requires:

  • Name (5-20 tokens)
  • Description (20-100 tokens)
  • Parameter definitions (10-50 tokens per parameter)
  • Return type documentation (10-30 tokens)
  • Examples (50-200 tokens, if included)

Example tool definition:

from mcp import Tool

order_lookup_tool = Tool(
    name="lookup_order",
    description="Looks up order details by order number or customer email. Returns order status, items, shipping info, and payment details.",
    parameters={
        "order_id": {
            "type": "string",
            "description": "The order number (format: ORD-XXXXX)",
            "required": False
        },
        "customer_email": {
            "type": "string",
            "description": "Customer's email address",
            "required": False
        }
    },
    returns="OrderDetails object with status, items, shipping, payment info"
)

Token count: ~150 tokens

Now multiply by 30 tools: 4,500 tokens

That's 4,500 tokens the agent processes on every single request, even if it only uses 2-3 tools.
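A quick way to see this overhead in your own server is to count the tokens in the serialized tool definitions. A minimal sketch, assuming your tool definitions serialize to JSON and using tiktoken as a stand-in tokenizer (Claude's tokenizer differs, so treat the counts as estimates):

import json

import tiktoken  # OpenAI tokenizer; close enough for rough estimates

enc = tiktoken.get_encoding("cl100k_base")

def tool_metadata_tokens(tool_definitions):
    # Sum token counts across every registered tool definition
    return sum(len(enc.encode(json.dumps(t))) for t in tool_definitions)

# 30 tools at ~150 tokens each lands around 4,500 tokens of overhead per request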

The Hidden Costs

Cost #1: Token Usage

Scenario: Customer support agent with 30 tools

Average request:

  • Tool metadata: 4,500 tokens
  • System prompt: 500 tokens
  • Conversation history: 2,000 tokens
  • User query: 100 tokens
  • Total input: 7,100 tokens

Usage: 10,000 requests/day

Monthly cost (Claude 3.5 Sonnet):

  • Input: 7,100 tokens × 10,000 × 30 days = 2.13B tokens
  • At $0.003 per 1K tokens = $6,390/month

With selective tool loading (5 tools per request):

  • Tool metadata: 750 tokens
  • System prompt: 500 tokens
  • Conversation history: 2,000 tokens
  • User query: 100 tokens
  • Total input: 3,350 tokens

Monthly cost:

  • Input: 3,350 tokens × 10,000 × 30 days = 1.005B tokens
  • At $0.003 per 1K tokens = $3,015/month

Savings: $3,375/month ($40,500/year)
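The arithmetic above is easy to reproduce. A minimal sketch using the assumed price and traffic numbers from this example (not current API pricing):

PRICE_PER_1K_INPUT = 0.003   # assumed $ per 1K input tokens
REQUESTS_PER_DAY = 10_000
DAYS = 30

def monthly_input_cost(tokens_per_request):
    total_tokens = tokens_per_request * REQUESTS_PER_DAY * DAYS
    return total_tokens / 1_000 * PRICE_PER_1K_INPUT

print(monthly_input_cost(7_100))                              # ~$6,390 with 30 tools
print(monthly_input_cost(3_350))                              # ~$3,015 with 5 tools
print(monthly_input_cost(7_100) - monthly_input_cost(3_350))  # ~$3,375 saved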

Cost #2: Attention Dilution

The more tools in context, the harder it is for the model to pick the right one.

Think of it like this:

You're at a restaurant.

Scenario A: Menu has 10 items

  • You read them all, pick confidently

Scenario B: Menu has 100 items

  • You're overwhelmed, decision fatigue sets in
  • Might pick wrong section entirely
  • Takes longer to decide

LLMs experience the same effect. More tools = more decision overhead.

Real impact:

  • 10 tools: 95% tool selection accuracy
  • 30 tools: 87% tool selection accuracy
  • 50 tools: 78% tool selection accuracy

Cost #3: Processing Latency

More context = more tokens to process = longer time-to-first-token.

Observed latency (Claude 3.5 Sonnet):

  • 2,000 token context: ~400ms to first token
  • 7,000 token context: ~1,000ms to first token
  • 20,000 token context: ~2,500ms to first token

User experience: going from a 2,000-token to a 7,000-token context adds roughly 600ms of waiting before the first token appears, purely from extra metadata.

Selective Tool Loading: The Solution

Core idea: Don't register all tools upfront. Load only relevant tools for each request.

Approach #1: Intent-Based Tool Loading

Pattern: Classify user intent, load only relevant tools.

from mcp import MCPServer

class SelectiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_map = {
            "order_inquiry": ["lookup_order", "check_shipping_status"],
            "refund_request": ["lookup_order", "process_refund", "check_eligibility"],
            "product_question": ["search_products", "get_product_details"],
            "account_management": ["get_user_profile", "update_preferences"],
            "technical_support": ["check_system_status", "create_support_ticket"],
        }

    def get_tools_for_request(self, user_message):
        # Fast intent classification (not LLM, use a classifier)
        intent = self.classify_intent(user_message)

        # Get relevant tool names
        relevant_tool_names = self.tool_map.get(intent, [])

        # Load only those tools
        relevant_tools = [
            tool for tool in self.all_tools
            if tool.name in relevant_tool_names
        ]

        return relevant_tools

    def classify_intent(self, message):
        # Fast classification (keyword matching, small ML model, or simple LLM call)
        message_lower = message.lower()

        if any(word in message_lower for word in ["order", "tracking", "shipped"]):
            return "order_inquiry"
        elif any(word in message_lower for word in ["refund", "return", "money back"]):
            return "refund_request"
        elif any(word in message_lower for word in ["product", "item", "buy"]):
            return "product_question"
        # ... more rules

        return "general"  # Default

Usage:

server = SelectiveMCPServer()

# User message comes in
user_message = "Where's my order #12345?"

# Get only relevant tools
tools = server.get_tools_for_request(user_message)
# Returns: [lookup_order, check_shipping_status]

# Create agent with minimal tools
agent = create_agent(tools=tools)
response = agent.run(user_message)

Result: 2 tools (300 tokens) instead of 30 tools (4,500 tokens)

Approach #2: Hierarchical Tool Loading

Pattern: Load core tools first, add specialized tools if needed.

class HierarchicalMCPServer:
    def __init__(self):
        # Tier 1: Always loaded (core capabilities)
        self.core_tools = [
            "clarify_question",
            "provide_general_info",
            "escalate_to_human"
        ]

        # Tier 2: Domain-specific (loaded based on intent)
        self.domain_tools = {
            "orders": ["lookup_order", "track_shipping"],
            "refunds": ["process_refund", "check_eligibility"],
            "products": ["search_products", "get_details"],
        }

        # Tier 3: Specialized (loaded only when explicitly needed)
        self.specialized_tools = {
            "bulk_operations": ["bulk_import", "bulk_export"],
            "reporting": ["generate_report", "export_analytics"],
        }

    def get_tools_for_request(self, message, conversation_context=None):
        tools = self.core_tools.copy()

        # Add domain tools based on intent
        intent = self.classify_intent(message)
        if intent in self.domain_tools:
            tools.extend(self.domain_tools[intent])

        # Add specialized tools only if conversation indicates need
        if conversation_context:
            if "bulk" in message.lower():
                tools.extend(self.specialized_tools["bulk_operations"])
            if "report" in message.lower():
                tools.extend(self.specialized_tools["reporting"])

        return tools

Benefit: Graceful scaling — start minimal, add only as needed.

Approach #3: Usage-Based Tool Loading

Pattern: Track which tools are actually used, prioritize loading them.

class AdaptiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_usage_stats = self.load_usage_stats()

    def get_tools_for_request(self, message, max_tools=10):
        # Get intent
        intent = self.classify_intent(message)

        # Get tools used for this intent historically
        relevant_tools = self.tool_usage_stats.get_for_intent(intent)

        # Sort by usage frequency
        relevant_tools.sort(key=lambda t: t.usage_count, reverse=True)

        # Return top N most-used tools
        return relevant_tools[:max_tools]

    def track_usage(self, tool_name, intent):
        # After each request, track which tools were used
        self.tool_usage_stats.increment(tool_name, intent)

Benefit: Self-optimizing — automatically loads the right tools based on real usage.
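The example above assumes a tool_usage_stats object that can record and rank usage. Here is a minimal in-memory sketch of that store; the method names mirror the code above and are illustrative, not part of any MCP SDK (get_for_intent returns tool names already sorted by frequency):

from collections import defaultdict

class ToolUsageStats:
    def __init__(self):
        # counts[intent][tool_name] -> number of times that tool was used
        self.counts = defaultdict(lambda: defaultdict(int))

    def increment(self, tool_name, intent):
        self.counts[intent][tool_name] += 1

    def get_for_intent(self, intent):
        # Tool names used for this intent, most-used first
        by_tool = self.counts[intent]
        return sorted(by_tool, key=by_tool.get, reverse=True)

In production you would back this with a database or your analytics pipeline rather than process memory.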

Implementation Patterns

Pattern #1: Intent Router

Architecture:

User message
    ↓
Intent classifier (fast, 50ms)
    ↓
Tool selector
    ↓
MCP server (loads selected tools)
    ↓
Agent (sees only relevant tools)

Code:

def handle_request(message):
    # Step 1: Classify (fast)
    intent = intent_classifier.predict(message)

    # Step 2: Select tools
    tools = tool_selector.get_tools_for_intent(intent)

    # Step 3: Create context with minimal tools
    mcp_context = create_mcp_context(tools=tools)

    # Step 4: Run agent
    agent = create_agent(context=mcp_context)
    response = agent.run(message)

    return response

Pattern #2: Progressive Tool Loading

Architecture:

User message
    ↓
Agent starts with 3 core tools
    ↓
If agent says "I need X capability"
    ↓
Dynamically load additional tools
    ↓
Agent continues with expanded toolset

Code:

def handle_request_progressive(message):
    # Start with core tools
    tools = get_core_tools()
    agent = create_agent(tools=tools)

    response = agent.run(message)

    # Check if agent needs more tools
    if response.needs_more_tools:
        # Load requested capabilities
        additional_tools = load_tools(response.requested_capabilities)
        agent.add_tools(additional_tools)

        # Continue
        response = agent.continue_run()

    return response

Benefit: Start fast (minimal context), expand only if necessary.

Pattern #3: Context Budget Allocation

Architecture: Allocate a fixed token budget for tools, select greedily.

def select_tools_with_budget(intent, max_tokens=1000):
    candidate_tools = get_candidate_tools(intent)

    # Sort by relevance score
    candidate_tools.sort(key=lambda t: t.relevance_score, reverse=True)

    selected_tools = []
    tokens_used = 0

    for tool in candidate_tools:
        tool_token_cost = count_tokens(tool.to_json())

        if tokens_used + tool_token_cost <= max_tokens:
            selected_tools.append(tool)
            tokens_used += tool_token_cost
        else:
            break  # Budget exhausted

    return selected_tools

Benefit: Guarantees you never exceed tool metadata budget.

Tool Metadata Optimization

Beyond selective loading, optimize the metadata itself.

Optimization #1: Compress Descriptions

Before (verbose):

Tool(
    name="lookup_order",
    description="This tool allows you to look up detailed information about a customer order by providing either the order number or the customer's email address. It will return comprehensive details including order status, list of items, shipping information, and payment details."
)

Tokens: ~60

After (compressed):

Tool(
    name="lookup_order",
    description="Get order details by order number or email. Returns status, items, shipping, payment."
)

Tokens: ~20 (67% reduction)

Optimization #2: Remove Redundancy

Before:

parameters={
    "order_id": {
        "type": "string",
        "description": "The unique order identifier in format ORD-XXXXX",
        "required": False,
        "example": "ORD-12345"
    },
    "customer_email": {
        "type": "string",
        "description": "The email address of the customer who placed the order",
        "required": False,
        "example": "customer@example.com"
    }
}

After:

parameters={
    "order_id": {"type": "string", "description": "Order ID (ORD-XXXXX)", "required": False},
    "customer_email": {"type": "string", "description": "Customer email", "required": False}
}

Benefit: Clearer, more concise, fewer tokens.

Optimization #3: Group Related Tools

Instead of 5 separate order tools, create one tool with actions:

Before (5 tools):

  • lookup_order
  • update_order
  • cancel_order
  • track_order
  • refund_order

After (1 tool):

Tool(
    name="manage_order",
    description="Manage orders: lookup, update, cancel, track, refund",
    parameters={
        "action": {"type": "string", "enum": ["lookup", "update", "cancel", "track", "refund"]},
        "order_id": {"type": "string"},
        # ... other params
    }
)

Benefit: 1 tool description instead of 5 (80% token reduction).

Measuring Impact

Metrics to track:

1. Tool Context Size

avg_tool_tokens = sum(tool_token_counts) / num_requests

Goal: < 1,000 tokens for most requests

2. Tool Selection Accuracy

accuracy = correct_tool_selections / total_tool_selections

Goal: > 95%

3. Average Tools Loaded Per Request

avg_tools_loaded = sum(tool_counts) / num_requests

Goal: 3-7 tools (not 30)

4. Cost Per Request

cost = (input_tokens * input_price) + (output_tokens * output_price)

Goal: Reduce by 50-70% through selective loading
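A minimal sketch for tracking all four metrics per request (the field and method names are illustrative; call record() from your request handler and review summary() periodically):

from dataclasses import dataclass, field

@dataclass
class ToolMetrics:
    tool_tokens: list = field(default_factory=list)
    tools_loaded: list = field(default_factory=list)
    correct_selections: int = 0
    total_selections: int = 0
    costs: list = field(default_factory=list)

    def record(self, tool_tokens, tools_loaded, selection_correct, cost):
        self.tool_tokens.append(tool_tokens)
        self.tools_loaded.append(tools_loaded)
        self.total_selections += 1
        self.correct_selections += int(selection_correct)
        self.costs.append(cost)

    def summary(self):
        n = max(len(self.tool_tokens), 1)
        return {
            "avg_tool_tokens": sum(self.tool_tokens) / n,
            "avg_tools_loaded": sum(self.tools_loaded) / n,
            "selection_accuracy": self.correct_selections / max(self.total_selections, 1),
            "avg_cost_per_request": sum(self.costs) / n,
        }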

Common Mistakes

Mistake #1: Over-Engineering Intent Classification

Wrong: Use GPT-4 to classify intent before each request

Right: Use keyword matching or a fast small model (< 50ms)

Why: If classification takes 2 seconds, you haven't saved time.
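If keyword rules are too brittle, a tiny supervised classifier is still fast. A sketch using scikit-learn, trained on a handful of labeled support messages (the training examples here are illustrative; in practice you would train on logged, labeled traffic):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("where is my order", "order_inquiry"),
    ("has my package shipped yet", "order_inquiry"),
    ("i want my money back", "refund_request"),
    ("how do i return this item", "refund_request"),
    ("do you sell blue widgets", "product_question"),
]
texts, labels = zip(*examples)

# TF-IDF + logistic regression: inference is a single sparse matrix multiply,
# comfortably under a 50ms budget
intent_classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
intent_classifier.fit(texts, labels)

print(intent_classifier.predict(["where's my order #12345?"])[0])  # expected: order_inquiry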

Mistake #2: Not Tracking Tool Usage

Wrong: Guess which tools are most common

Right: Log usage, optimize based on data

Mistake #3: Loading Tools Too Late

Wrong: Start with 0 tools, wait for agent to request

Right: Start with likely-needed tools based on intent

Why: Better to load 5 tools once than load 1 tool 5 times.

Mistake #4: Ignoring Core Tools

Wrong: Make all tools conditional

Right: Always load 2-3 core tools (clarify, format_response, etc.)

Mistake #5: Not Testing Impact

Wrong: Implement selective loading, assume it works

Right: A/B test, measure tool selection accuracy

The Bottom Line

Tool selection is a context management problem.

Every tool you register adds tokens to the context window. More tokens = more cost, slower responses, worse tool selection.

Solution: Selective tool loading

  • Classify intent (fast)
  • Load only relevant tools (3-7 instead of 30)
  • Measure impact (accuracy, cost, latency)

Expected impact:

  • 50-70% token reduction
  • 40-60% cost reduction
  • 20-30% faster responses
  • 5-10% better tool selection accuracy

Start simple: Implement intent-based tool loading for your top 3-5 intents.

Getting Started

Quick implementation (< 2 hours):

  1. Audit current tool usage

    # Which tools are actually used?
    tool_usage = analyze_logs()
    print(tool_usage)
    
  2. Group tools by intent

    tool_map = {
        "order_inquiry": ["lookup_order", "track_shipping"],
        "refund": ["process_refund"],
        # ...
    }
    
  3. Implement basic intent classifier

    def classify(msg):
        if "order" in msg.lower(): return "order_inquiry"
        if "refund" in msg.lower(): return "refund"
        return "general"
    
  4. Load tools conditionally

    intent = classify(message)
    tools = tool_map.get(intent, [])  # avoid KeyError when classify() falls back to "general"
    agent = create_agent(tools=tools)
    
  5. Measure before/after

    • Context size
    • Cost per request
    • Tool selection accuracy

Need help optimizing your MCP tool selection strategy? We've helped teams reduce context bloat by 60-80%.

Get an MCP optimization consultation →

Tags: MCP, Tool Management, Context Engineering, Optimization

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.