
Serverless Agent Inference Pipelines: How to Run AI on Vercel, Railway, or Fly.io
You built an AI agent. Now you need to deploy it.
The old way: Provision servers, manage scaling, configure load balancers, worry about uptime.
The new way: Serverless platforms that handle infrastructure for you.
Let me show you how to deploy AI agents on three popular platforms: Vercel, Railway, and Fly.io.
Platform Comparison
| Feature | Vercel | Railway | Fly.io |
|---|---|---|---|
| Best for | Next.js/React apps | Python/Node backends | Any language, global edge |
| Cold start | 0-3s | 0-2s | 0-1s |
| Max execution | 60s (Hobby), 300s (Pro) | Unlimited | Unlimited |
| Pricing | From $20/month (Pro) | From $5/month (Hobby) | Pay-as-you-go (~$2/VM/month) |
| Auto-scaling | Yes | Yes | Yes |
| Custom domains | Yes (easy) | Yes | Yes |
| Databases | Via partners | Built-in Postgres | Built-in Postgres |
| Edge functions | Yes | No | Yes (global) |
| Docker support | Limited | Yes | Yes |
Option 1: Vercel (Best for Next.js)
Ideal for: Frontend-heavy apps with AI backend
Pros:
- Amazing DX (developer experience)
- Instant deploys from Git
- Edge functions for low latency
- Great Next.js integration
Cons:
- 60s timeout on Hobby plan
- Not great for long-running AI tasks
- More expensive at scale
Deployment Guide: Vercel + Next.js
Project structure:
my-ai-app/
├── app/
│ ├── api/
│ │ └── chat/
│ │ └── route.ts
│ └── page.tsx
├── lib/
│ └── agent.ts
├── package.json
└── vercel.json
Install dependencies:
npm install @anthropic-ai/sdk ai
Create agent (lib/agent.ts):
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
export async function chatWithAgent(message: string) {
const response = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: message }],
});
const textContent = response.content.find((block) => block.type === "text");
return textContent?.text || "No response";
}
Create API route (app/api/chat/route.ts):
import { NextRequest, NextResponse } from "next/server";
import { chatWithAgent } from "@/lib/agent";
export async function POST(request: NextRequest) {
try {
const { message } = await request.json();
if (!message) {
return NextResponse.json(
{ error: "Message required" },
{ status: 400 }
);
}
const response = await chatWithAgent(message);
return NextResponse.json({ response });
} catch (error: any) {
console.error("Chat error:", error);
return NextResponse.json(
{ error: error.message },
{ status: 500 }
);
}
}
export const runtime = "edge"; // Use edge runtime for faster cold starts
Configure Vercel (vercel.json):
{
"functions": {
"app/api/**": {
"maxDuration": 60
}
}
}
Deploy:
# Install Vercel CLI
npm install -g vercel
# Deploy
vercel
# Add environment variables
vercel env add ANTHROPIC_API_KEY
Test:
curl -X POST https://your-app.vercel.app/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello!"}'
Streaming Responses on Vercel
For better UX, stream responses:
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";
import { StreamingTextResponse, AnthropicStream } from "ai";
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
export async function POST(request: NextRequest) {
  // useChat (below) sends the full conversation history as `messages`
  const { messages } = await request.json();
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    // Keep only the fields the Anthropic API expects (useChat also sends ids)
    messages: messages.map((m: any) => ({ role: m.role, content: m.content })),
    stream: true,
  });
  // Convert Anthropic stream to Vercel AI SDK stream
  const stream = AnthropicStream(response);
  return new StreamingTextResponse(stream);
}
Frontend:
// app/page.tsx
"use client";
import { useChat } from "ai/react";
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit } = useChat();
return (
<div>
{messages.map((m) => (
<div key={m.id}>
{m.role}: {m.content}
</div>
))}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} />
<button type="submit">Send</button>
</form>
</div>
);
}
Option 2: Railway (Best for Python Backends)
Ideal for: Python agents, LangChain, complex backends
Pros:
- No timeout limits
- Built-in Postgres
- Great for Python/Node
- Simple pricing
Cons:
- Not edge-optimized
- Fewer integrations than Vercel
Deployment Guide: Railway + Python + Flask
Project structure:
ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Procfile
└── railway.toml
Create agent (agent.py):
import os
from anthropic import Anthropic
class AIAgent:
def __init__(self):
self.anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
def chat(self, message: str) -> str:
response = self.anthropic.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": message}]
)
return response.content[0].text
Create Flask app (app.py):
import os
from flask import Flask, request, jsonify
from flask_cors import CORS
from agent import AIAgent
app = Flask(__name__)
CORS(app)
agent = AIAgent()
@app.route("/")
def home():
return {"status": "AI Agent running on Railway"}
@app.route("/health")
def health():
return {"status": "healthy"}
@app.route("/chat", methods=["POST"])
def chat():
try:
data = request.get_json()
message = data.get("message")
if not message:
return jsonify({"error": "Message required"}), 400
response = agent.chat(message)
return jsonify({"response": response})
except Exception as e:
return jsonify({"error": str(e)}), 500
if __name__ == "__main__":
port = int(os.getenv("PORT", 8080))
app.run(host="0.0.0.0", port=port)
Dependencies (requirements.txt):
flask==3.0.2
flask-cors==4.0.0
anthropic==0.18.1
gunicorn==21.2.0
Procfile:
web: gunicorn app:app --bind 0.0.0.0:$PORT
Railway config (railway.toml):
[build]
builder = "nixpacks"
[deploy]
healthcheckPath = "/health"
restartPolicyType = "on-failure"
Deploy:
# Install Railway CLI
npm install -g @railway/cli
# Login
railway login
# Initialize project
railway init
# Add environment variable
railway variables set ANTHROPIC_API_KEY=your_key_here
# Deploy
railway up
Get URL:
railway domain
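Test the endpoint (substitute the domain Railway printed):
curl -X POST https://your-app.up.railway.app/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'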
Adding Database on Railway
Create Postgres:
railway add   # select PostgreSQL when prompted
Connect in code (add psycopg2-binary to requirements.txt first):
import os
import psycopg2
def get_db_connection():
return psycopg2.connect(os.getenv("DATABASE_URL"))
# Usage
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("SELECT * FROM conversations")
Option 3: Fly.io (Best for Global Edge)
Ideal for: Global deployments, low-latency requirements
Pros:
- Global edge network
- Near-instant cold starts
- Docker-native
- Cheapest option
Cons:
- More complex setup
- Less integrated tooling
Deployment Guide: Fly.io + Python
Project structure:
ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Dockerfile
└── fly.toml
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080", "--workers", "2", "--timeout", "120"]
Fly config (fly.toml):
app = "my-ai-agent"
primary_region = "sjc" # San Jose
[build]
dockerfile = "Dockerfile"
[env]
PORT = "8080"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ["app"]
# With [http_service], Fly exposes ports 80/443 for you, so no [[services.ports]] blocks are needed
[http_service.concurrency]
type = "requests"
hard_limit = 25
soft_limit = 20
[[http_service.checks]]
interval = "10s"
grace_period = "5s"
method = "GET"
path = "/health"
timeout = "2s"
Deploy:
# Install Fly CLI
curl -L https://fly.io/install.sh | sh
# Login
flyctl auth login
# Launch app
flyctl launch
# Set secrets
flyctl secrets set ANTHROPIC_API_KEY=your_key_here
# Deploy
flyctl deploy
# Check status
flyctl status
# View logs
flyctl logs
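Test it (Fly gives the app a <app-name>.fly.dev hostname, my-ai-agent.fly.dev here):
curl -X POST https://my-ai-agent.fly.dev/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'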
Multi-Region Deployment
Deploy to multiple regions for low latency globally:
# Add machines in more regions; `fly scale count --region` creates them where needed
flyctl scale count 2 --region lax   # Los Angeles
flyctl scale count 2 --region iad   # Virginia
flyctl scale count 1 --region lhr   # London
flyctl scale count 1 --region sin   # Singapore
Users are automatically routed to the nearest region.
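To confirm which region actually handled a request, read the FLY_REGION environment variable Fly sets on every machine; for example, a small route added to the Flask app:
@app.route("/region")
def region():
    # FLY_REGION is set by the Fly.io runtime on every machine
    return {"region": os.getenv("FLY_REGION", "unknown")}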
Platform-Specific Optimizations
Vercel Optimization
Use Edge Functions:
export const runtime = "edge"; // Runs on edge, faster cold starts
Enable caching (applies to GET route handlers and page data, not POST chat requests):
export const revalidate = 3600; // Cache for 1 hour
Use streaming:
return new StreamingTextResponse(stream); // Better UX
Railway Optimization
Use connection pooling:
from psycopg2 import pool
connection_pool = pool.SimpleConnectionPool(1, 20, os.getenv("DATABASE_URL"))
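Borrow and return connections from the pool per request instead of reconnecting each time; a sketch, assuming the same conversations table used elsewhere in this post:
def save_message(user_id: str, message: str) -> None:
    conn = connection_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO conversations (user_id, message) VALUES (%s, %s)",
                (user_id, message),
            )
        conn.commit()
    finally:
        # Always return the connection so the pool doesn't run dry
        connection_pool.putconn(conn)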
Add health checks:
@app.route("/health")
def health():
# Check dependencies
db_healthy = check_db()
api_healthy = check_anthropic_api()
if db_healthy and api_healthy:
return {"status": "healthy"}, 200
else:
return {"status": "unhealthy"}, 503
Fly.io Optimization
Use Redis for state:
flyctl redis create
import redis
r = redis.from_url(os.getenv("REDIS_URL"))
# Cache responses
r.setex(f"chat:{user_id}", 3600, response)
Optimize Dockerfile:
# Multi-stage build: install dependencies in a builder stage, ship only the results
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
# Put the user-installed gunicorn binary on PATH
ENV PATH=/root/.local/bin:$PATH
COPY . /app
WORKDIR /app
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080"]
Cost Comparison
For 100K requests/month:
Vercel:
- Hobby: Free (up to limits)
- Pro: $20/month + bandwidth
- Estimated: $20-40/month
Railway:
- Starter: $5/month
- Usage: ~$15-20/month
- Estimated: $20-25/month
Fly.io:
- Free tier: 3 VMs
- Paid: $1.94/VM/month
- Estimated: $10-15/month
Winner for cost: Fly.io
Monitoring & Observability
Vercel
Built-in analytics:
vercel logs
Add custom monitoring:
import { track } from "@vercel/analytics";
track("chat_request", {
user_id: userId,
model: "claude-3-5-sonnet",
});
Railway
Built-in logs and metrics in dashboard.
Add structured logging:
import logging
import json
logger = logging.getLogger(__name__)
logger.info(json.dumps({
"event": "chat_request",
"user_id": user_id,
"duration_ms": duration,
"tokens": tokens
}))
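The duration_ms and tokens fields have to be measured somewhere; a sketch that wraps the Anthropic call, assuming the AIAgent class from the Railway example and the token usage the Messages API reports:
import time

def timed_chat(agent: AIAgent, user_id: str, message: str) -> str:
    start = time.monotonic()
    response = agent.anthropic.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    )
    duration = int((time.monotonic() - start) * 1000)
    tokens = response.usage.input_tokens + response.usage.output_tokens
    logger.info(json.dumps({
        "event": "chat_request",
        "user_id": user_id,
        "duration_ms": duration,
        "tokens": tokens,
    }))
    return response.content[0].text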
Fly.io
View logs:
flyctl logs
Add metrics:
Fly runs a managed Prometheus that collects platform metrics for every app automatically; to have it scrape your own application metrics, declare a metrics endpoint in fly.toml (see the snippet below).
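A minimal sketch of that fly.toml section, assuming your app serves Prometheus-format metrics on port 9091 at /metrics:
[metrics]
port = 9091          # Fly's managed Prometheus scrapes this port...
path = "/metrics"    # ...at this path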
The Bottom Line
Vercel:
- Best for: Next.js apps with AI features
- Pros: Amazing DX, edge functions, easy deploys
- Cons: 60s timeout, higher cost
Railway:
- Best for: Python/Node backends, LangChain apps
- Pros: No timeout, built-in Postgres, simple pricing
- Cons: Not edge-optimized
Fly.io:
- Best for: Global apps, low latency, Docker users
- Pros: Cheapest, global edge, fast cold starts
- Cons: More complex setup
My recommendation:
- Prototype: Vercel (fastest to ship)
- Production: Fly.io (best performance/cost)
- Heavy backend: Railway (easiest Python deployment)
Getting Started
Week 1: Deploy on all three
- See which fits your workflow
- Measure cold starts
- Compare costs
Week 2: Choose platform
- Based on: language, timeout needs, budget
- Migrate to chosen platform
Week 3: Optimize
- Add caching
- Implement monitoring
- Scale as needed
Need help deploying your AI agent? We've deployed dozens of production systems.
Related reading:
- Vercel AI SDK: https://sdk.vercel.ai/docs
- Railway docs: https://docs.railway.app
- Fly.io docs: https://fly.io/docs
About the Author
DomAIn Labs Team
The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.