
Tool Selection as a Context Problem: Why Less Is More in MCP
You built an MCP server. You're excited. You register all your tools — 30, 40, maybe 50 of them.
Your thinking: "The more tools my agent has access to, the more capable it is!"
Then you notice:
- The agent is slower
- Tool selection is less accurate
- Costs are higher than expected
- Sometimes it picks the wrong tool
The problem isn't MCP. The problem is that you're treating tool registration like a feature list, not a context management problem.
Every tool you register adds to the context window. And every token in that window competes for the model's attention.
Let me show you how to fix this with selective tool loading.
The MCP Tool Metadata Problem
When you register a tool via MCP, you're not just giving the agent access to functionality. You're adding text to its context window.
Each tool requires:
- Name (5-20 tokens)
- Description (20-100 tokens)
- Parameter definitions (10-50 tokens per parameter)
- Return type documentation (10-30 tokens)
- Examples (50-200 tokens, if included)
Example tool definition:
from mcp import Tool

order_lookup_tool = Tool(
    name="lookup_order",
    description="Looks up order details by order number or customer email. Returns order status, items, shipping info, and payment details.",
    parameters={
        "order_id": {
            "type": "string",
            "description": "The order number (format: ORD-XXXXX)",
            "required": False
        },
        "customer_email": {
            "type": "string",
            "description": "Customer's email address",
            "required": False
        }
    },
    returns="OrderDetails object with status, items, shipping, payment info"
)
Token count: ~150 tokens
Now multiply by 30 tools: 4,500 tokens
That's 4,500 tokens the agent processes on every single request, even if it only uses 2-3 tools.
The Hidden Costs
Cost #1: Token Usage
Scenario: Customer support agent with 30 tools
Average request:
- Tool metadata: 4,500 tokens
- System prompt: 500 tokens
- Conversation history: 2,000 tokens
- User query: 100 tokens
- Total input: 7,100 tokens
Usage: 10,000 requests/day
Monthly cost (Claude 3.5 Sonnet):
- Input: 7,100 tokens × 10,000 × 30 days = 2.13B tokens
- At $0.003 per 1K tokens = $6,390/month
With selective tool loading (5 tools per request):
- Tool metadata: 750 tokens
- System prompt: 500 tokens
- Conversation history: 2,000 tokens
- User query: 100 tokens
- Total input: 3,350 tokens
Monthly cost:
- Input: 3,350 tokens × 10,000 × 30 days = 1.005B tokens
- At $0.003 per 1K tokens = $3,015/month
Savings: $3,375/month ($40,500/year)
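The arithmetic above is easy to sanity-check with a short script (the $0.003-per-1K input price and the function name here are just the scenario's assumptions):

```python
def monthly_input_cost(tokens_per_request, requests_per_day,
                       price_per_1k=0.003, days=30):
    """Estimate monthly input-token cost for an agent workload."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1000 * price_per_1k

baseline = monthly_input_cost(7_100, 10_000)   # all 30 tools loaded
selective = monthly_input_cost(3_350, 10_000)  # ~5 tools loaded
print(round(baseline), round(selective), round(baseline - selective))
# → 6390 3015 3375
```

Plug in your own request volume and token counts to see whether the engineering effort pays for itself.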
Cost #2: Attention Dilution
The more tools in context, the harder it is for the model to pick the right one.
Think of it like this:
You're at a restaurant.
Scenario A: Menu has 10 items
- You read them all, pick confidently
Scenario B: Menu has 100 items
- You're overwhelmed, decision fatigue sets in
- Might pick wrong section entirely
- Takes longer to decide
LLMs experience the same effect. More tools = more decision overhead.
Real impact:
- 10 tools: 95% tool selection accuracy
- 30 tools: 87% tool selection accuracy
- 50 tools: 78% tool selection accuracy
Cost #3: Processing Latency
More context = more tokens to process = longer time-to-first-token.
Observed latency (Claude 3.5 Sonnet):
- 2,000 token context: ~400ms to first token
- 7,000 token context: ~1,000ms to first token
- 20,000 token context: ~2,500ms to first token
User experience: Faster responses feel more natural.
Selective Tool Loading: The Solution
Core idea: Don't register all tools upfront. Load only relevant tools for each request.
Approach #1: Intent-Based Tool Loading
Pattern: Classify user intent, load only relevant tools.
from mcp import MCPServer

class SelectiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_map = {
            "order_inquiry": ["lookup_order", "check_shipping_status"],
            "refund_request": ["lookup_order", "process_refund", "check_eligibility"],
            "product_question": ["search_products", "get_product_details"],
            "account_management": ["get_user_profile", "update_preferences"],
            "technical_support": ["check_system_status", "create_support_ticket"],
        }

    def get_tools_for_request(self, user_message):
        # Fast intent classification (not LLM, use a classifier)
        intent = self.classify_intent(user_message)

        # Get relevant tool names
        relevant_tool_names = self.tool_map.get(intent, [])

        # Load only those tools
        relevant_tools = [
            tool for tool in self.all_tools
            if tool.name in relevant_tool_names
        ]
        return relevant_tools

    def classify_intent(self, message):
        # Fast classification (keyword matching, small ML model, or simple LLM call)
        message_lower = message.lower()
        if any(word in message_lower for word in ["order", "tracking", "shipped"]):
            return "order_inquiry"
        elif any(word in message_lower for word in ["refund", "return", "money back"]):
            return "refund_request"
        elif any(word in message_lower for word in ["product", "item", "buy"]):
            return "product_question"
        # ... more rules
        return "general"  # Default
Usage:
server = SelectiveMCPServer()
# User message comes in
user_message = "Where's my order #12345?"
# Get only relevant tools
tools = server.get_tools_for_request(user_message)
# Returns: [lookup_order, check_shipping_status]
# Create agent with minimal tools
agent = create_agent(tools=tools)
response = agent.run(user_message)
Result: 2 tools (300 tokens) instead of 30 tools (4,500 tokens)
Approach #2: Hierarchical Tool Loading
Pattern: Load core tools first, add specialized tools if needed.
class HierarchicalMCPServer:
    def __init__(self):
        # Tier 1: Always loaded (core capabilities)
        self.core_tools = [
            "clarify_question",
            "provide_general_info",
            "escalate_to_human"
        ]

        # Tier 2: Domain-specific (loaded based on intent)
        self.domain_tools = {
            "orders": ["lookup_order", "track_shipping"],
            "refunds": ["process_refund", "check_eligibility"],
            "products": ["search_products", "get_details"],
        }

        # Tier 3: Specialized (loaded only when explicitly needed)
        self.specialized_tools = {
            "bulk_operations": ["bulk_import", "bulk_export"],
            "reporting": ["generate_report", "export_analytics"],
        }

    def get_tools_for_request(self, message, conversation_context=None):
        tools = self.core_tools.copy()

        # Add domain tools based on intent
        intent = self.classify_intent(message)
        if intent in self.domain_tools:
            tools.extend(self.domain_tools[intent])

        # Add specialized tools only if conversation indicates need
        if conversation_context:
            if "bulk" in message.lower():
                tools.extend(self.specialized_tools["bulk_operations"])
            if "report" in message.lower():
                tools.extend(self.specialized_tools["reporting"])

        return tools
Benefit: Graceful scaling — start minimal, add only as needed.
Approach #3: Usage-Based Tool Loading
Pattern: Track which tools are actually used, prioritize loading them.
class AdaptiveMCPServer:
    def __init__(self):
        self.all_tools = self.load_all_tools()
        self.tool_usage_stats = self.load_usage_stats()

    def get_tools_for_request(self, message, max_tools=10):
        # Get intent
        intent = self.classify_intent(message)

        # Get tools used for this intent historically
        relevant_tools = self.tool_usage_stats.get_for_intent(intent)

        # Sort by usage frequency
        relevant_tools.sort(key=lambda t: t.usage_count, reverse=True)

        # Return top N most-used tools
        return relevant_tools[:max_tools]

    def track_usage(self, tool_name, intent):
        # After each request, track which tools were used
        self.tool_usage_stats.increment(tool_name, intent)
Benefit: Self-optimizing — automatically loads the right tools based on real usage.
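The usage-stats object above is left abstract. A minimal in-memory sketch backed by nested counters (the class name and methods here are illustrative, not part of any MCP library) could look like:

```python
from collections import defaultdict

class ToolUsageStats:
    """Tracks how often each tool is used per intent (illustrative sketch)."""

    def __init__(self):
        # counts[intent][tool_name] -> number of uses
        self.counts = defaultdict(lambda: defaultdict(int))

    def increment(self, tool_name, intent):
        self.counts[intent][tool_name] += 1

    def get_for_intent(self, intent):
        # Tool names for this intent, most-used first
        ranked = sorted(self.counts[intent].items(),
                        key=lambda item: item[1], reverse=True)
        return [name for name, _ in ranked]

stats = ToolUsageStats()
stats.increment("lookup_order", "order_inquiry")
stats.increment("lookup_order", "order_inquiry")
stats.increment("check_shipping_status", "order_inquiry")
print(stats.get_for_intent("order_inquiry"))
# → ['lookup_order', 'check_shipping_status']
```

In production you would persist these counts (e.g. in Redis or a database) so the ranking survives restarts.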
Implementation Patterns
Pattern #1: Intent Router
Architecture:
User message
↓
Intent classifier (fast, 50ms)
↓
Tool selector
↓
MCP server (loads selected tools)
↓
Agent (sees only relevant tools)
Code:
def handle_request(message):
    # Step 1: Classify (fast)
    intent = intent_classifier.predict(message)

    # Step 2: Select tools
    tools = tool_selector.get_tools_for_intent(intent)

    # Step 3: Create context with minimal tools
    mcp_context = create_mcp_context(tools=tools)

    # Step 4: Run agent
    agent = create_agent(context=mcp_context)
    response = agent.run(message)
    return response
Pattern #2: Progressive Tool Loading
Architecture:
User message
↓
Agent starts with 3 core tools
↓
If agent says "I need X capability"
↓
Dynamically load additional tools
↓
Agent continues with expanded toolset
Code:
def handle_request_progressive(message):
    # Start with core tools
    tools = get_core_tools()
    agent = create_agent(tools=tools)
    response = agent.run(message)

    # Check if agent needs more tools
    if response.needs_more_tools:
        # Load requested capabilities
        additional_tools = load_tools(response.requested_capabilities)
        agent.add_tools(additional_tools)

        # Continue
        response = agent.continue_run()

    return response
Benefit: Start fast (minimal context), expand only if necessary.
Pattern #3: Context Budget Allocation
Architecture: Allocate a fixed token budget for tools, select greedily.
def select_tools_with_budget(intent, max_tokens=1000):
    candidate_tools = get_candidate_tools(intent)

    # Sort by relevance score
    candidate_tools.sort(key=lambda t: t.relevance_score, reverse=True)

    selected_tools = []
    tokens_used = 0
    for tool in candidate_tools:
        tool_token_cost = count_tokens(tool.to_json())
        if tokens_used + tool_token_cost <= max_tokens:
            selected_tools.append(tool)
            tokens_used += tool_token_cost
        else:
            break  # Budget exhausted

    return selected_tools
Benefit: Guarantees you never exceed tool metadata budget.
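`count_tokens` above is left undefined. If you don't want to pull in a model-specific tokenizer, a rough characters-divided-by-four heuristic is a common stand-in for English text and JSON (the helper below is an approximation for budgeting, not an exact count):

```python
import json

def count_tokens(text):
    """Rough token estimate: ~4 characters per token for English/JSON."""
    return max(1, len(text) // 4)

# Estimate the context cost of a compact tool definition
tool_json = json.dumps({
    "name": "lookup_order",
    "description": "Get order details by order number or email.",
})
print(count_tokens(tool_json))
```

For billing-accurate budgets, swap in your provider's tokenizer or token-counting API; the greedy selection loop stays the same.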
Tool Metadata Optimization
Beyond selective loading, optimize the metadata itself.
Optimization #1: Compress Descriptions
Before (verbose):
Tool(
    name="lookup_order",
    description="This tool allows you to look up detailed information about a customer order by providing either the order number or the customer's email address. It will return comprehensive details including order status, list of items, shipping information, and payment details."
)
Tokens: ~60
After (compressed):
Tool(
    name="lookup_order",
    description="Get order details by order number or email. Returns status, items, shipping, payment."
)
Tokens: ~20 (67% reduction)
Optimization #2: Remove Redundancy
Before:
parameters={
    "order_id": {
        "type": "string",
        "description": "The unique order identifier in format ORD-XXXXX",
        "required": False,
        "example": "ORD-12345"
    },
    "customer_email": {
        "type": "string",
        "description": "The email address of the customer who placed the order",
        "required": False,
        "example": "customer@example.com"
    }
}
After:
parameters={
    "order_id": {"type": "string", "description": "Order ID (ORD-XXXXX)", "required": False},
    "customer_email": {"type": "string", "description": "Customer email", "required": False}
}
Benefit: Clearer, more concise, fewer tokens.
Optimization #3: Group Related Tools
Instead of 5 separate order tools, create one tool with actions:
Before (5 tools):
lookup_order, update_order, cancel_order, track_order, refund_order
After (1 tool):
Tool(
    name="manage_order",
    description="Manage orders: lookup, update, cancel, track, refund",
    parameters={
        "action": {"type": "string", "enum": ["lookup", "update", "cancel", "track", "refund"]},
        "order_id": {"type": "string"},
        # ... other params
    }
)
Benefit: 1 tool description instead of 5 (80% token reduction).
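On the server side, a grouped tool typically dispatches on the `action` parameter. A minimal sketch (the handler bodies here are placeholders standing in for real order logic):

```python
def manage_order(action, order_id=None, **kwargs):
    """Dispatch a grouped order-tool call to the right handler (sketch)."""
    handlers = {
        "lookup": lambda: f"looking up {order_id}",
        "update": lambda: f"updating {order_id}",
        "cancel": lambda: f"cancelling {order_id}",
        "track": lambda: f"tracking {order_id}",
        "refund": lambda: f"refunding {order_id}",
    }
    if action not in handlers:
        # Mirror the enum constraint in the tool schema
        raise ValueError(f"Unknown action: {action}")
    return handlers[action]()

print(manage_order("lookup", order_id="ORD-12345"))
# → looking up ORD-12345
```

The trade-off: one grouped tool saves metadata tokens, but its description carries less detail per action, so reserve grouping for closely related operations the model won't confuse.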
Measuring Impact
Metrics to track:
1. Tool Context Size
avg_tool_tokens = sum(tool_token_counts) / num_requests
Goal: < 1,000 tokens for most requests
2. Tool Selection Accuracy
accuracy = correct_tool_selections / total_tool_selections
Goal: > 95%
3. Average Tools Loaded Per Request
avg_tools_loaded = sum(tool_counts) / num_requests
Goal: 3-7 tools (not 30)
4. Cost Per Request
cost = (input_tokens * input_price) + (output_tokens * output_price)
Goal: Reduce by 50-70% through selective loading
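All four metrics fall out of a request log. A sketch, assuming each log record carries its tool-token count, the tools loaded, and whether the selected tool was correct (the field names are illustrative):

```python
def summarize(log):
    """Aggregate tool-loading metrics from a list of request records."""
    n = len(log)
    return {
        "avg_tool_tokens": sum(r["tool_tokens"] for r in log) / n,
        "avg_tools_loaded": sum(len(r["tools"]) for r in log) / n,
        "selection_accuracy": sum(r["correct_tool"] for r in log) / n,
    }

log = [
    {"tool_tokens": 300, "tools": ["lookup_order", "track_shipping"], "correct_tool": True},
    {"tool_tokens": 450, "tools": ["process_refund"], "correct_tool": True},
]
print(summarize(log))
# → {'avg_tool_tokens': 375.0, 'avg_tools_loaded': 1.5, 'selection_accuracy': 1.0}
```

Run the same summary before and after enabling selective loading so the comparison is apples to apples.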
Common Mistakes
Mistake #1: Over-Engineering Intent Classification
Wrong: Use GPT-4 to classify intent before each request
Right: Use keyword matching or a fast small model (< 50ms)
Why: If classification takes 2 seconds, you haven't saved time.
Mistake #2: Not Tracking Tool Usage
Wrong: Guess which tools are most common
Right: Log usage, optimize based on data
Mistake #3: Loading Tools Too Late
Wrong: Start with 0 tools, wait for agent to request
Right: Start with likely-needed tools based on intent
Why: Better to load 5 tools once than load 1 tool 5 times.
Mistake #4: Ignoring Core Tools
Wrong: Make all tools conditional
Right: Always load 2-3 core tools (clarify, format_response, etc.)
Mistake #5: Not Testing Impact
Wrong: Implement selective loading, assume it works
Right: A/B test, measure tool selection accuracy
The Bottom Line
Tool selection is a context management problem.
Every tool you register adds tokens to the context window. More tokens = more cost, slower responses, worse tool selection.
Solution: Selective tool loading
- Classify intent (fast)
- Load only relevant tools (3-7 instead of 30)
- Measure impact (accuracy, cost, latency)
Expected impact:
- 50-70% token reduction
- 40-60% cost reduction
- 20-30% faster responses
- 5-10% better tool selection accuracy
Start simple: Implement intent-based tool loading for your top 3-5 intents.
Getting Started
Quick implementation (< 2 hours):

1. Audit current tool usage

   # Which tools are actually used?
   tool_usage = analyze_logs()
   print(tool_usage)

2. Group tools by intent

   tool_map = {
       "order_inquiry": ["lookup_order", "track_shipping"],
       "refund": ["process_refund"],
       # ...
   }

3. Implement basic intent classifier

   def classify(msg):
       if "order" in msg.lower():
           return "order_inquiry"
       if "refund" in msg.lower():
           return "refund"
       return "general"

4. Load tools conditionally

   intent = classify(message)
   tools = tool_map[intent]
   agent = create_agent(tools=tools)

5. Measure before/after
   - Context size
   - Cost per request
   - Tool selection accuracy
Need help optimizing your MCP tool selection strategy? We've helped teams reduce context bloat by 60-80%.
About the Author
DomAIn Labs Team
The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.