
MCP Isn't Dead, But Bloated Agentic Workflows Are
Hot take: MCP isn't dead, but bloated agentic workflows are.
In 2024, Anthropic's Model Context Protocol (MCP) felt like the USB-C of AI agents — one plug to access every external tool. For a while, it was magic.
But 2025 revealed the cracks. Token bloat. Context rot. Non-deterministic chaos. And a growing realization: the fewer tokens your LLM sees, the sharper it gets.
If you're still loading 30 tools into your agent's context window and wondering why performance is degrading, this post is for you.
The MCP Promise: 2024 in Retrospect
When Anthropic introduced MCP in late 2024, it solved a real problem: tool fragmentation.
Before MCP, every AI agent framework had its own way of defining tools:
- LangChain used Tool classes
- AutoGen had function schemas
- CrewAI wrapped everything in custom decorators
MCP standardized this. One protocol. One way to expose tools to LLMs. Connect once, use everywhere.
The vision was beautiful:
```python
# Register 30 tools via MCP
mcp_server = MCPServer()
mcp_server.register_tools([
    filesystem_tools,
    database_tools,
    api_tools,
    email_tools,
    calendar_tools,
    # ... 25 more
])

# Agent magically knows about all of them
agent = create_agent(tools=mcp_server.get_tools())
```
Developers loved it. Load all your tools once. Let the LLM figure out which ones to use. Ship fast.
The Three Cracks: Why 2025 Changed Everything
Crack #1: Token Bloat
Here's the problem nobody talked about in 2024: tool metadata eats tokens.
Every tool you register needs a description. Every parameter needs documentation. Every return type needs explanation. That's how the LLM knows what tools exist and how to use them.
Real numbers:
- 1 simple tool = ~50-150 tokens of metadata
- 30 tools = 1,500-4,500 tokens
- Add error handling specs = +500 tokens
- Add examples for complex tools = +1,000 tokens
You're burning 3,000-5,000 tokens before the agent does anything useful.
On a 200K token context window, that's 2.5% gone. On a 128K window with a long conversation history, you're now context-constrained before you even start the real work.
And that's before the agent starts actually calling tools and accumulating results in context.
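To make the arithmetic concrete, here's a rough sketch of what a single tool definition looks like once it's serialized for the model. The schema shape is illustrative rather than the exact MCP wire format, and the refund_process tool and the 4-characters-per-token heuristic are assumptions for illustration:

```python
import json

# Illustrative tool definition (shape approximates a typical function/tool
# schema; not the exact MCP wire format)
refund_tool = {
    "name": "refund_process",
    "description": "Initiate a refund for a completed order. Requires the "
                   "order ID and a reason code; returns the refund ID and status.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "The order to refund"},
            "reason": {"type": "string", "description": "Customer-provided reason"},
            "amount": {"type": "number", "description": "Optional partial refund amount"},
        },
        "required": ["order_id", "reason"],
    },
}

# Crude estimate: roughly 4 characters per token for English/JSON text
approx_tokens = len(json.dumps(refund_tool)) / 4
print(f"~{approx_tokens:.0f} tokens for one tool")  # on the order of 100-150
# Multiply by 30 tools and the metadata alone is several thousand tokens
```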
Crack #2: Context Rot
Research from Chroma and others showed something critical in 2025: LLM performance degrades as context fills up.
It's not just about hitting the limit. It's about information density.
When you load 30 tool definitions into context:
- The LLM has to parse all of them on every turn
- Irrelevant tools create noise
- Tool selection becomes a multi-step reasoning problem
- The model wastes inference cycles considering tools it will never use
Real example: A customer service agent that has access to:
- Order lookup ✅ (uses frequently)
- Refund processing ✅ (uses sometimes)
- Database backup tools ❌ (never uses)
- PDF generation ❌ (never uses)
- Email marketing tools ❌ (never uses)
- 25 other tools ❌ (never uses)
Every time the customer asks "Where's my order?", the agent considers all 30 tools. That's wasted reasoning capacity.
The Anthropic team calls this "context rot" — performance degradation from stale, irrelevant information cluttering the context window.
Crack #3: Non-Deterministic Chaos
This is the big one. When everything happens inside the context window, debugging becomes a nightmare:
The problems:
- Tool selection is non-deterministic (LLM decides what to call)
- Output handling happens in-context (LLM parses results)
- State tracking lives in conversation history (grows unbounded)
- Error recovery is LLM-driven (inconsistent behavior)
What this looks like in production:
```
# Same input, three different tool sequences
Run 1: lookup_order → format_response → done
Run 2: lookup_order → check_inventory → format_response → done
Run 3: search_customer → lookup_order → verify_address → format_response → done
```
All three "work", but they have different costs, latencies, and failure modes. Good luck debugging that in production at 3am.
The New Wave: Skills-First Architecture
Here's what's replacing bloated MCP workflows in 2025.
What Are Skills?
Skills are modular, self-contained capabilities that bundle tools with their own context.
Instead of loading 30 global tools, you load 2-3 relevant Skills. Each Skill:
- Has its own focused tool set
- Manages its own state externally
- Provides deterministic scaffolding
- Only adds context when invoked
Key difference: Skills are conditionally loaded, not globally registered.
Architecture Comparison
Old way (Bloated MCP):
```python
# All tools loaded globally
agent = Agent(
    tools=[
        order_lookup, refund_process, inventory_check,
        email_send, pdf_generate, database_backup,
        # ... 24 more tools
    ]
)

# Agent decides what to use on every message
response = agent.run("Where's my order #12345?")
# Context: 3,500 tokens of tool metadata + conversation
```
New way (Skills-First):
```python
# Agent starts with NO tools
agent = Agent(tools=[])

# Skills loaded conditionally
if intent == "order_inquiry":
    agent.load_skill("order_management")   # 3 tools, 200 tokens
elif intent == "refund_request":
    agent.load_skill("refund_processing")  # 2 tools, 150 tokens

response = agent.run("Where's my order #12345?")
# Context: 200 tokens of tool metadata + conversation (94% less tool metadata)
```
Real Token Comparison
Let's measure actual context usage:
| Approach | Tool Metadata | Conversation | Total Context | % Saved |
|---|---|---|---|---|
| Bloated MCP (30 tools) | 3,500 tokens | 2,000 tokens | 5,500 tokens | 0% |
| Skills (3 tools) | 200 tokens | 2,000 tokens | 2,200 tokens | 60% |
| Skills + Pruning | 200 tokens | 1,200 tokens | 1,400 tokens | 75% |
That's not just cost savings. That's performance improvement from reduced context rot.
Externalized State
Skills push state management outside the context window:
```python
class OrderManagementSkill:
    def __init__(self):
        self.state_store = RedisStore()  # External state
        self.tools = [
            self.lookup_order,
            self.check_status,
            self.update_tracking
        ]

    def lookup_order(self, order_id: str):
        # Deterministic lookup
        order = db.query(order_id)

        # Store state externally
        self.state_store.set(f"order:{order_id}", order.to_dict())

        # Return minimal context to LLM
        return {
            "order_id": order_id,
            "status": order.status,
            "summary": order.summary()  # Compressed representation
        }
```
The LLM sees a summary, not raw data. State lives in Redis, not context.
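The check_status tool listed above is a natural place to show the payoff: later turns re-hydrate state from the store instead of relying on the conversation history to carry it. A minimal sketch of that method, assuming RedisStore exposes a get that mirrors the set used earlier:

```python
    def check_status(self, order_id: str):
        # Method on OrderManagementSkill: read back the externally stored state
        cached = self.state_store.get(f"order:{order_id}")
        if cached is None:
            # Entry expired or never fetched: fall back to a fresh lookup
            return self.lookup_order(order_id)

        # Hand the LLM only the fields it needs for this turn
        return {"order_id": order_id, "status": cached["status"]}
```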
Lean Deterministic Workflows
Skills enable hybrid workflows: deterministic scaffolding with LLM-enhanced steps.
```python
def handle_refund_request(order_id: str, reason: str):
    # Step 1: Deterministic validation
    order = validate_order(order_id)
    if not order.refundable:
        return "Order not eligible for refund"

    # Step 2: LLM-enhanced analysis (scoped context)
    analysis = llm.analyze(
        prompt=f"Analyze refund reason: {reason}",
        context=order.summary(),   # Minimal context
        skill="refund_analysis"    # Scoped skill
    )

    # Step 3: Deterministic execution
    if analysis.approved:
        refund = process_refund(order_id)
        send_confirmation(order.customer_email)
        return f"Refund processed: {refund.id}"
    else:
        return f"Refund denied: {analysis.reason}"
```
Benefits:
- Predictable execution flow
- LLM only used where needed
- Minimal context per step
- Easy to debug and test
When to Use What: Decision Framework
This isn't "MCP is bad." It's "MCP, used poorly, costs you context and control."
Here's when to use each approach:
Use Full MCP When:
- Building proof-of-concept agents
- Exploring what tools your agent needs
- Working with < 10 well-defined tools
- Context window isn't a constraint
- Non-determinism is acceptable
Use Skills-First When:
- Agent has > 10 potential tools
- Context efficiency matters
- Production reliability is critical
- You need predictable behavior
- Cost optimization is important
Use Hybrid Approach When:
- Some tools are always relevant (global MCP)
- Others are conditionally needed (Skills)
- You need both flexibility and efficiency
Example hybrid:
```python
# Always available (lightweight)
global_tools = [
    clarification_tool,  # 50 tokens
    format_response      # 40 tokens
]

# Conditionally loaded
skills = {
    "order_management": OrderSkill(),
    "refund_processing": RefundSkill(),
    "inventory": InventorySkill()
}

agent = Agent(
    tools=global_tools,
    skills=skills,
    skill_loader=dynamic_skill_loader
)
```
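The dynamic_skill_loader passed in above is left undefined; its exact interface depends on your framework, so treat this as a hedged sketch. One simple version maps keywords in the message to skill names (a production loader might use a trained intent classifier instead, as in Step 3 of the migration guide below):

```python
def dynamic_skill_loader(message: str, skills: dict) -> list:
    # Illustrative only: keyword-to-skill routing; swap in a real intent
    # classifier once you know which skills your traffic actually needs
    keyword_to_skill = {
        "order": "order_management",
        "refund": "refund_processing",
        "stock": "inventory",
    }
    text = message.lower()
    matched = {name for kw, name in keyword_to_skill.items() if kw in text}
    return [skills[name] for name in matched if name in skills]
```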
Migration Guide: From Bloated to Lean
If you have an existing MCP agent and want to optimize, here's the path:
Step 1: Audit Tool Usage
```python
# Add logging to track which tools are actually used
@track_tool_usage
def agent_run(message):
    return agent.run(message)

# After 100 runs, analyze
tool_usage = get_tool_stats()
# Output:
# order_lookup: 87 times
# refund_process: 23 times
# pdf_generate: 0 times ❌
# database_backup: 0 times ❌
```
Remove tools with < 5% usage. Those are context noise.
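The track_tool_usage decorator and get_tool_stats helper above aren't from any particular library. A minimal in-process sketch counts calls with a Counter-backed decorator; here it wraps each tool function directly rather than agent_run, since intercepting tool calls at the agent level depends on your framework:

```python
from collections import Counter
from functools import wraps

_tool_usage = Counter()

def track_tool_usage(tool_fn):
    # Wrap a tool so every invocation increments its counter
    @wraps(tool_fn)
    def wrapper(*args, **kwargs):
        _tool_usage[tool_fn.__name__] += 1
        return tool_fn(*args, **kwargs)
    return wrapper

def get_tool_stats() -> dict:
    # Usage counts, most-called tools first
    return dict(_tool_usage.most_common())
```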
Step 2: Cluster Tools into Skills
Group related tools:
```python
# Before: 30 flat tools
tools = [t1, t2, t3, ..., t30]

# After: 5 Skills with 2-4 tools each
skills = {
    "order_ops": [order_lookup, order_update, order_cancel],
    "refunds": [refund_initiate, refund_status],
    "inventory": [check_stock, reserve_item],
    "customer": [get_customer, update_customer],
    "notifications": [send_email, send_sms]
}
```
Step 3: Add Intent Detection
Route to Skills based on user intent:
```python
def route_to_skill(message: str):
    # Fast intent classifier (not LLM)
    intent = classify_intent(message)

    skill_map = {
        "order_inquiry": "order_ops",
        "refund_request": "refunds",
        "stock_check": "inventory"
    }
    return skill_map.get(intent, "general")
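```

The classify_intent call does the real work here and is left undefined. One hedged sketch is a tiny scikit-learn pipeline; the labels and training examples below are made up for illustration, and any lightweight classifier (or even keyword matching) works as long as it's fast and cheap:

```python
# Illustrative stand-in for classify_intent using scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("where is my order", "order_inquiry"),
    ("has my package shipped yet", "order_inquiry"),
    ("i want my money back", "refund_request"),
    ("please refund order 123", "refund_request"),
    ("is this item in stock", "stock_check"),
    ("do you have size 10 available", "stock_check"),
]
texts, labels = zip(*examples)

intent_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
intent_model.fit(texts, labels)

def classify_intent(message: str) -> str:
    # Returns the most likely intent label for a user message
    return intent_model.predict([message])[0]
```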
Step 4: Implement Skill Loading
```python
class SkillfulAgent:
    def __init__(self):
        self.active_skills = []
        self.all_skills = load_all_skills()

    def run(self, message: str):
        # Detect needed skills
        needed_skills = self.detect_skills(message)

        # Load only what's needed
        self.active_skills = [
            self.all_skills[name] for name in needed_skills
        ]

        # Get tools from active skills only
        tools = self.get_active_tools()

        # Run with minimal context
        return self.agent.run(message, tools=tools)
```
Step 5: Measure Improvement
Track before/after metrics:
- Average context size
- Token cost per request
- Response latency
- Tool selection accuracy
- Error rates
Typical results:
- Context size: -60% to -80%
- Cost per request: -50% to -70%
- Latency: -20% to -40% (less reasoning overhead)
- Accuracy: +10% to +20% (less context rot)
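One way to get the context-size numbers above is to count tokens on the exact payload you send the model, per request, before and after the migration. A rough sketch with tiktoken; the cl100k_base encoding is an assumption (use whatever tokenizer matches your model), and serializing tool metadata as JSON is illustrative:

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: swap for your model's tokenizer

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def context_size(tool_schemas: list[dict], conversation: list[dict]) -> dict:
    # Serialize the same structures you actually send to the model
    tool_tokens = count_tokens(json.dumps(tool_schemas))
    convo_tokens = count_tokens(json.dumps(conversation))
    return {
        "tool_metadata": tool_tokens,
        "conversation": convo_tokens,
        "total": tool_tokens + convo_tokens,
    }

# Log this per request and compare the before/after distributions
```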
The Bottom Line
MCP standardized how we expose tools to LLMs. That was important. But loading every tool into every conversation was never sustainable.
The 2025 pattern:
- Skills-first architecture: Modular capabilities loaded on demand
- Externalized state: Keep conversation history lean
- Hybrid workflows: Deterministic scaffolding + LLM reasoning where it matters
- Context efficiency: The fewer tokens your LLM sees, the sharper it performs
This isn't about replacing MCP. It's about using it smarter — with Skills, selective loading, and deterministic workflows that keep context windows lean and agent behavior predictable.
Common Mistakes to Avoid
- Loading all tools "just in case": Start minimal, add as needed
- Ignoring context metrics: Track token usage per request
- Over-relying on LLM reasoning: Use deterministic logic where possible
- Keeping full conversation history: Implement context pruning
- Not measuring tool usage: You can't optimize what you don't measure
Getting Started
Want to audit your agent's context usage and identify optimization opportunities?
Quick audit checklist:
- Count total tools registered in your agent
- Measure context size per request (tools + conversation)
- Track which tools are used in production (usage %)
- Identify clusters of related tools (potential Skills)
- Estimate token savings from Skills migration
Need help optimizing your agent architecture? We've helped teams reduce token costs by 60-80% while improving performance.
Talk to us about agent optimization →
Related reading:
- Anthropic's Skills announcement: https://www.anthropic.com/news/equipping-agents-for-the-real-world-with-agent-skills
- Chroma's research on context rot: https://research.trychroma.com/context-rot
- LangGraph workflow patterns: https://www.langgraph.dev
About the Author
DomAIn Labs Team
The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.