MCP Isn't Dead, But Bloated Agentic Workflows Are

DomAIn Labs Team
November 8, 2025
10 min read

Hot take: MCP isn't dead, but bloated agentic workflows are.

In 2024, Anthropic's Model Context Protocol (MCP) felt like the USB-C of AI agents — one plug to access every external tool. For a while, it was magic.

But 2025 revealed the cracks. Token bloat. Context rot. Non-deterministic chaos. And a growing realization: the fewer tokens your LLM sees, the sharper it gets.

If you're still loading 30 tools into your agent's context window and wondering why performance is degrading, this post is for you.

The MCP Promise: 2024 in Retrospect

When Anthropic introduced MCP in late 2024, it solved a real problem: tool fragmentation.

Before MCP, every AI agent framework had its own way of defining tools:

  • LangChain used Tool classes
  • AutoGen had function schemas
  • CrewAI wrapped everything in custom decorators

MCP standardized this. One protocol. One way to expose tools to LLMs. Connect once, use everywhere.

The vision was beautiful:

# Register 30 tools via MCP
mcp_server = MCPServer()
mcp_server.register_tools([
    filesystem_tools,
    database_tools,
    api_tools,
    email_tools,
    calendar_tools,
    # ... 25 more
])

# Agent magically knows about all of them
agent = create_agent(tools=mcp_server.get_tools())

Developers loved it. Load all your tools once. Let the LLM figure out which ones to use. Ship fast.

The Three Cracks: Why 2025 Changed Everything

Crack #1: Token Bloat

Here's the problem nobody talked about in 2024: tool metadata eats tokens.

Every tool you register needs a description. Every parameter needs documentation. Every return type needs explanation. That's how the LLM knows what tools exist and how to use them.

Real numbers:

  • 1 simple tool = ~50-150 tokens of metadata
  • 30 tools = 1,500-4,500 tokens
  • Add error handling specs = +500 tokens
  • Add examples for complex tools = +1,000 tokens

You're burning 3,000-5,000 tokens before the agent does anything useful.

On a 200K token context window, that's 2.5% gone. On a 128K window with a long conversation history, you're now context-constrained before you even start the real work.

And that's before the agent starts actually calling tools and accumulating results in context.
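
If you want to sanity-check those numbers on your own stack, here is a minimal sketch, assuming tiktoken for counting and OpenAI-style JSON tool schemas (the tool shown is hypothetical, and exact counts vary by model and tokenizer):

import json
import tiktoken  # pip install tiktoken

# One typical tool definition in JSON-schema form (hypothetical example)
order_lookup_tool = {
    "name": "lookup_order",
    "description": "Look up an order by ID and return its status, items, and shipping info.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "The order identifier, e.g. '12345'"}
        },
        "required": ["order_id"],
    },
}

def tool_metadata_tokens(tools, encoding_name="cl100k_base"):
    """Rough estimate: serialize each tool schema and count its tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    return sum(len(enc.encode(json.dumps(tool))) for tool in tools)

print(tool_metadata_tokens([order_lookup_tool]))       # one tool: on the order of 100 tokens
print(tool_metadata_tokens([order_lookup_tool] * 30))  # 30 similar tools: a few thousand, before any conversation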

Crack #2: Context Rot

Research from Chroma and others showed something critical in early 2025: LLM performance degrades as context fills up.

It's not just about hitting the limit. It's about information density.

When you load 30 tool definitions into context:

  • The LLM has to parse all of them on every turn
  • Irrelevant tools create noise
  • Tool selection becomes a multi-step reasoning problem
  • The model wastes inference cycles considering tools it will never use

Real example: A customer service agent that has access to:

  • Order lookup ✅ (uses frequently)
  • Refund processing ✅ (uses sometimes)
  • Database backup tools ❌ (never uses)
  • PDF generation ❌ (never uses)
  • Email marketing tools ❌ (never uses)
  • 25 other tools ❌ (never uses)

Every time the customer asks "Where's my order?", the agent considers all 30 tools. That's wasted reasoning capacity.

Anthropic and others call this "context rot": performance degradation from stale, irrelevant information cluttering the context window.

Crack #3: Non-Deterministic Chaos

This is the big one. When everything happens inside the context window, debugging becomes a nightmare:

The problems:

  • Tool selection is non-deterministic (LLM decides what to call)
  • Output handling happens in-context (LLM parses results)
  • State tracking lives in conversation history (grows unbounded)
  • Error recovery is LLM-driven (inconsistent behavior)

What this looks like in production:

# Same input, three different tool sequences
Run 1: lookup_order → format_response → done
Run 2: lookup_order → check_inventory → format_response → done
Run 3: search_customer → lookup_order → verify_address → format_response → done

All three "work", but they have different costs, latencies, and failure modes. Good luck debugging that in production at 3am.

The New Wave: Skills-First Architecture

Here's what's replacing bloated MCP workflows in 2025.

What Are Skills?

Skills are modular, self-contained capabilities that bundle tools with their own context.

Instead of loading 30 global tools, you load 2-3 relevant Skills. Each Skill:

  • Has its own focused tool set
  • Manages its own state externally
  • Provides deterministic scaffolding
  • Only adds context when invoked

Key difference: Skills are conditionally loaded, not globally registered.

Architecture Comparison

Old way (Bloated MCP):

# All tools loaded globally
agent = Agent(
    tools=[
        order_lookup, refund_process, inventory_check,
        email_send, pdf_generate, database_backup,
        # ... 24 more tools
    ]
)

# Agent decides what to use on every message
response = agent.run("Where's my order #12345?")
# Context: 3,500 tokens of tool metadata + conversation

New way (Skills-First):

# Agent starts with NO tools
agent = Agent(tools=[])

# Skills loaded conditionally
if intent == "order_inquiry":
    agent.load_skill("order_management")  # 3 tools, 200 tokens
elif intent == "refund_request":
    agent.load_skill("refund_processing")  # 2 tools, 150 tokens

response = agent.run("Where's my order #12345?")
# Context: 200 tokens of tool metadata + conversation (94% less metadata)

Real Token Comparison

Let's measure actual context usage:

Approach | Tool Metadata | Conversation | Total Context | % Saved
Bloated MCP (30 tools) | 3,500 tokens | 2,000 tokens | 5,500 tokens | 0%
Skills (3 tools) | 200 tokens | 2,000 tokens | 2,200 tokens | 60%
Skills + Pruning | 200 tokens | 1,200 tokens | 1,400 tokens | 75%

That's not just cost savings. That's performance improvement from reduced context rot.

Externalized State

Skills push state management outside the context window:

class OrderManagementSkill:
    def __init__(self):
        self.state_store = RedisStore()  # External state
        self.tools = [
            self.lookup_order,
            self.check_status,
            self.update_tracking
        ]

    def lookup_order(self, order_id: str):
        # Deterministic lookup
        order = db.query(order_id)

        # Store state externally
        self.state_store.set(f"order:{order_id}", order.to_dict())

        # Return minimal context to LLM
        return {
            "order_id": order_id,
            "status": order.status,
            "summary": order.summary()  # Compressed representation
        }

The LLM sees a summary, not raw data. State lives in Redis, not context.

Lean Deterministic Workflows

Skills enable hybrid workflows: deterministic scaffolding with LLM-enhanced steps.

def handle_refund_request(order_id: str, reason: str):
    # Step 1: Deterministic validation
    order = validate_order(order_id)
    if not order.refundable:
        return "Order not eligible for refund"

    # Step 2: LLM-enhanced analysis (scoped context)
    analysis = llm.analyze(
        prompt=f"Analyze refund reason: {reason}",
        context=order.summary(),  # Minimal context
        skill="refund_analysis"   # Scoped skill
    )

    # Step 3: Deterministic execution
    if analysis.approved:
        refund = process_refund(order_id)
        send_confirmation(order.customer_email)
        return f"Refund processed: {refund.id}"
    else:
        return f"Refund denied: {analysis.reason}"

Benefits:

  • Predictable execution flow
  • LLM only used where needed
  • Minimal context per step
  • Easy to debug and test

When to Use What: Decision Framework

This isn't "MCP is bad." It's "MCP, used poorly, costs you context and control."

Here's when to use each approach:

Use Full MCP When:

  • Building proof-of-concept agents
  • Exploring what tools your agent needs
  • Working with < 10 well-defined tools
  • Context window isn't a constraint
  • Non-determinism is acceptable

Use Skills-First When:

  • Agent has > 10 potential tools
  • Context efficiency matters
  • Production reliability is critical
  • You need predictable behavior
  • Cost optimization is important

Use Hybrid Approach When:

  • Some tools are always relevant (global MCP)
  • Others are conditionally needed (Skills)
  • You need both flexibility and efficiency

Example hybrid:

# Always available (lightweight)
global_tools = [
    clarification_tool,  # 50 tokens
    format_response      # 40 tokens
]

# Conditionally loaded
skills = {
    "order_management": OrderSkill(),
    "refund_processing": RefundSkill(),
    "inventory": InventorySkill()
}

agent = Agent(
    tools=global_tools,
    skills=skills,
    skill_loader=dynamic_skill_loader
)
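
The skill_loader hook is whatever routing function fits your stack; the name and signature here are assumptions, not a specific framework API. A minimal sketch with keyword routing as a placeholder:

def dynamic_skill_loader(message: str, skills: dict) -> list:
    """Decide which skills to activate for this message.
    Keyword routing is a stand-in; swap in a real intent classifier."""
    routes = {
        "order": "order_management",
        "refund": "refund_processing",
        "stock": "inventory",
    }
    text = message.lower()
    selected = {skill_name for keyword, skill_name in routes.items() if keyword in text}
    return [skills[name] for name in selected if name in skills]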

Migration Guide: From Bloated to Lean

If you have an existing MCP agent and want to optimize, here's the path:

Step 1: Audit Tool Usage

# Add logging to track which tools are actually used
@track_tool_usage
def agent_run(message):
    return agent.run(message)

# After 100 runs, analyze
tool_usage = get_tool_stats()
# Output:
# order_lookup: 87 times
# refund_process: 23 times
# pdf_generate: 0 times ❌
# database_backup: 0 times ❌

Remove tools with < 5% usage. Those are context noise.
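
track_tool_usage and get_tool_stats aren't from a particular library; the simplest version instruments each tool function directly rather than the agent entry point. A sketch:

import functools
from collections import Counter

_tool_calls = Counter()

def track_tool_usage(tool_fn):
    """Count how often each tool actually gets invoked."""
    @functools.wraps(tool_fn)
    def wrapper(*args, **kwargs):
        _tool_calls[tool_fn.__name__] += 1
        return tool_fn(*args, **kwargs)
    return wrapper

def get_tool_stats():
    """Return call counts and each tool's share of total usage."""
    total = sum(_tool_calls.values()) or 1
    return {name: {"calls": n, "share": round(n / total, 3)}
            for name, n in _tool_calls.most_common()}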

Step 2: Cluster Tools into Skills

Group related tools:

# Before: 30 flat tools
tools = [t1, t2, t3, ..., t30]

# After: 5 Skills with 2-4 tools each
skills = {
    "order_ops": [order_lookup, order_update, order_cancel],
    "refunds": [refund_initiate, refund_status],
    "inventory": [check_stock, reserve_item],
    "customer": [get_customer, update_customer],
    "notifications": [send_email, send_sms]
}

Step 3: Add Intent Detection

Route to Skills based on user intent:

def route_to_skill(message: str):
    # Fast intent classifier (not LLM)
    intent = classify_intent(message)

    skill_map = {
        "order_inquiry": "order_ops",
        "refund_request": "refunds",
        "stock_check": "inventory"
    }

    return skill_map.get(intent, "general")
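
classify_intent can start as something as cheap as keyword matching (sketch below, with made-up keywords), then graduate to a small embedding-based or fine-tuned classifier once you have real traffic to train on:

def classify_intent(message: str) -> str:
    """Cheap keyword-based intent detection; a starting point, not production-grade."""
    text = message.lower()
    if any(kw in text for kw in ("order", "tracking", "where's my")):
        return "order_inquiry"
    if any(kw in text for kw in ("refund", "money back", "return")):
        return "refund_request"
    if any(kw in text for kw in ("in stock", "available", "inventory")):
        return "stock_check"
    return "general"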

Step 4: Implement Skill Loading

class SkillfulAgent:
    def __init__(self):
        self.agent = Agent(tools=[])      # underlying agent starts with no tools
        self.active_skills = []
        self.all_skills = load_all_skills()

    def run(self, message: str):
        # Detect needed skills
        needed_skills = self.detect_skills(message)

        # Load only what's needed
        self.active_skills = [
            self.all_skills[name] for name in needed_skills
        ]

        # Get tools from active skills only
        tools = self.get_active_tools()

        # Run with minimal context
        return self.agent.run(message, tools=tools)

Step 5: Measure Improvement

Track before/after metrics:

  • Average context size
  • Token cost per request
  • Response latency
  • Tool selection accuracy
  • Error rates

Typical results:

  • Context size: -60% to -80%
  • Cost per request: -50% to -70%
  • Latency: -20% to -40% (less reasoning overhead)
  • Accuracy: +10% to +20% (less context rot)
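
Getting those before/after numbers doesn't require special tooling. A sketch that estimates per-request context size from the payload you send, again using tiktoken as a rough cross-model approximation:

import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # rough approximation across models

def request_context_size(tools, messages):
    """Estimate tokens of tool metadata vs. conversation in a single request."""
    tool_tokens = sum(len(enc.encode(json.dumps(t))) for t in tools)
    convo_tokens = sum(len(enc.encode(m.get("content", ""))) for m in messages)
    return {
        "tool_tokens": tool_tokens,
        "conversation_tokens": convo_tokens,
        "total_context": tool_tokens + convo_tokens,
    }

# Log this per request before and after the migration, then compare averages.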

The Bottom Line

MCP standardized how we expose tools to LLMs. That was important. But loading every tool into every conversation was never sustainable.

The 2025 pattern:

  • Skills-first architecture: Modular capabilities loaded on demand
  • Externalized state: Keep conversation history lean
  • Hybrid workflows: Deterministic scaffolding + LLM reasoning where it matters
  • Context efficiency: The fewer tokens your LLM sees, the sharper it performs

This isn't about replacing MCP. It's about using it smarter — with Skills, selective loading, and deterministic workflows that keep context windows lean and agent behavior predictable.

Common Mistakes to Avoid

  1. Loading all tools "just in case": Start minimal, add as needed
  2. Ignoring context metrics: Track token usage per request
  3. Over-relying on LLM reasoning: Use deterministic logic where possible
  4. Keeping full conversation history: Implement context pruning
  5. Not measuring tool usage: You can't optimize what you don't measure

Getting Started

Want to audit your agent's context usage and identify optimization opportunities?

Quick audit checklist:

  • Count total tools registered in your agent
  • Measure context size per request (tools + conversation)
  • Track which tools are used in production (usage %)
  • Identify clusters of related tools (potential Skills)
  • Estimate token savings from Skills migration

Need help optimizing your agent architecture? We've helped teams reduce token costs by 60-80% while improving performance.

Talk to us about agent optimization →


Tags: MCP, Skills, Agent Architecture, Context Engineering, LLMOps

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.