Serverless Agent Inference Pipelines: How to Run AI on Vercel, Railway, or Fly.io

DomAIn Labs Team
October 10, 2025
12 min read

You built an AI agent. Now you need to deploy it.

The old way: Provision servers, manage scaling, configure load balancers, worry about uptime.

The new way: Serverless platforms that handle infrastructure for you.

Let me show you how to deploy AI agents on three popular platforms: Vercel, Railway, and Fly.io.

Platform Comparison

Feature           Vercel                    Railway                Fly.io
Best for          Next.js/React apps        Python/Node backends   Any language, global edge
Cold start        0-3s                      0-2s                   0-1s
Max execution     60s (Hobby), 300s (Pro)   Unlimited              Unlimited
Pricing           $20/month                 $5/month               $3/month (pay-as-you-go)
Auto-scaling      Yes                       Yes                    Yes
Custom domains    Yes (easy)                Yes                    Yes
Databases         Via partners              Built-in Postgres      Built-in Postgres
Edge functions    Yes                       No                     Yes (global)
Docker support    Limited                   Yes                    Yes

Option 1: Vercel (Best for Next.js)

Ideal for: Frontend-heavy apps with AI backend

Pros:

  • Amazing DX (developer experience)
  • Instant deploys from Git
  • Edge functions for low latency
  • Great Next.js integration

Cons:

  • 60s timeout on Hobby plan
  • Not great for long-running AI tasks
  • More expensive at scale

Deployment Guide: Vercel + Next.js

Project structure:

my-ai-app/
├── app/
│   ├── api/
│   │   └── chat/
│   │       └── route.ts
│   └── page.tsx
├── lib/
│   └── agent.ts
├── package.json
└── vercel.json

Install dependencies:

npm install @anthropic-ai/sdk ai

Create agent (lib/agent.ts):

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export async function chatWithAgent(message: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [{ role: "user", content: message }],
  });

  // content can also include tool-use blocks, so narrow to the first text block
  const textBlock = response.content.find((block) => block.type === "text");
  return textBlock && "text" in textBlock ? textBlock.text : "No response";
}

Create API route (app/api/chat/route.ts):

import { NextRequest, NextResponse } from "next/server";
import { chatWithAgent } from "@/lib/agent";

export async function POST(request: NextRequest) {
  try {
    const { message } = await request.json();

    if (!message) {
      return NextResponse.json(
        { error: "Message required" },
        { status: 400 }
      );
    }

    const response = await chatWithAgent(message);

    return NextResponse.json({ response });
  } catch (error: any) {
    console.error("Chat error:", error);
    return NextResponse.json(
      { error: error.message },
      { status: 500 }
    );
  }
}

export const runtime = "edge"; // Use edge runtime for faster cold starts

Configure Vercel (vercel.json):

{
  "functions": {
    "app/api/**": {
      "maxDuration": 60
    }
  }
}

Deploy:

# Install Vercel CLI
npm install -g vercel

# Deploy
vercel

# Add environment variables
vercel env add ANTHROPIC_API_KEY

Test:

curl -X POST https://your-app.vercel.app/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'

Streaming Responses on Vercel

For better UX, stream responses:

// app/api/chat/route.ts
import { NextRequest } from "next/server";
import { StreamingTextResponse, AnthropicStream } from "ai";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function POST(request: NextRequest) {
  // useChat() on the frontend posts the full message history as `messages`
  const { messages } = await request.json();

  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    // Drop client-side fields (id, createdAt) before forwarding to Anthropic
    messages: messages.map((m: { role: "user" | "assistant"; content: string }) => ({
      role: m.role,
      content: m.content,
    })),
    stream: true,
  });

  // Convert the Anthropic stream to a Vercel AI SDK stream
  const stream = AnthropicStream(response);

  return new StreamingTextResponse(stream);
}

Frontend:

// app/page.tsx
"use client";

import { useChat } from "ai/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

Option 2: Railway (Best for Python Backends)

Ideal for: Python agents, LangChain, complex backends

Pros:

  • No timeout limits
  • Built-in Postgres
  • Great for Python/Node
  • Simple pricing

Cons:

  • Not edge-optimized
  • Fewer integrations than Vercel

Deployment Guide: Railway + Python + Flask

Project structure:

ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Procfile
└── railway.toml

Create agent (agent.py):

import os
from anthropic import Anthropic

class AIAgent:
    def __init__(self):
        self.anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    def chat(self, message: str) -> str:
        response = self.anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}]
        )

        return response.content[0].text

Create Flask app (app.py):

import os
from flask import Flask, request, jsonify
from flask_cors import CORS
from agent import AIAgent

app = Flask(__name__)
CORS(app)

agent = AIAgent()

@app.route("/")
def home():
    return {"status": "AI Agent running on Railway"}

@app.route("/health")
def health():
    return {"status": "healthy"}

@app.route("/chat", methods=["POST"])
def chat():
    try:
        data = request.get_json()
        message = data.get("message")

        if not message:
            return jsonify({"error": "Message required"}), 400

        response = agent.chat(message)

        return jsonify({"response": response})

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    app.run(host="0.0.0.0", port=port)

Dependencies (requirements.txt):

flask==3.0.2
flask-cors==4.0.0
anthropic==0.18.1
gunicorn==21.2.0

Procfile:

web: gunicorn app:app

Railway config (railway.toml):

[build]
builder = "NIXPACKS"

[deploy]
healthcheckPath = "/health"
restartPolicyType = "ON_FAILURE"

Deploy:

# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Initialize project
railway init

# Add environment variable
railway variables set ANTHROPIC_API_KEY=your_key_here

# Deploy
railway up

Get URL:

railway domain
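
Once the domain is live, smoke-test it the same way we tested the Vercel deploy. A minimal sketch using Python's requests library; the BASE_URL placeholder is whatever domain Railway generated for you:

import requests

# Replace with the domain printed by `railway domain`
BASE_URL = "https://your-app.up.railway.app"

resp = requests.post(
    f"{BASE_URL}/chat",
    json={"message": "Hello!"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])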

Adding Database on Railway

Create Postgres:

railway add postgres

Connect in code:

import os
import psycopg2

def get_db_connection():
    return psycopg2.connect(os.getenv("DATABASE_URL"))

# Usage
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("SELECT * FROM conversations")
rows = cursor.fetchall()
conn.close()
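
The SELECT above assumes a conversations table already exists. Here's a minimal sketch that creates the table and logs each exchange; the schema is hypothetical, so adjust the columns to what you actually need:

import os
import psycopg2

def save_exchange(user_message: str, agent_response: str) -> None:
    # Persist one request/response pair (hypothetical schema)
    conn = psycopg2.connect(os.getenv("DATABASE_URL"))
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cursor:
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS conversations (
                    id SERIAL PRIMARY KEY,
                    user_message TEXT NOT NULL,
                    agent_response TEXT NOT NULL,
                    created_at TIMESTAMPTZ DEFAULT NOW()
                )
            """)
            cursor.execute(
                "INSERT INTO conversations (user_message, agent_response) VALUES (%s, %s)",
                (user_message, agent_response),
            )
    conn.close()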

Option 3: Fly.io (Best for Global Edge)

Ideal for: Global deployments, low-latency requirements

Pros:

  • Global edge network
  • Near-instant cold starts
  • Docker-native
  • Cheapest option

Cons:

  • More complex setup
  • Less integrated tooling

Deployment Guide: Fly.io + Python

Project structure:

ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Dockerfile
└── fly.toml

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080

CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080", "--workers", "2", "--timeout", "120"]

Fly config (fly.toml):

app = "my-ai-agent"
primary_region = "sjc" # San Jose

[build]
  dockerfile = "Dockerfile"

[env]
  PORT = "8080"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[http_service.concurrency]
  type = "requests"
  hard_limit = 25
  soft_limit = 20

[[http_service.checks]]
  interval = "10s"
  grace_period = "5s"
  method = "GET"
  path = "/health"
  timeout = "2s"

Deploy:

# Install Fly CLI
curl -L https://fly.io/install.sh | sh

# Login
flyctl auth login

# Launch app
flyctl launch

# Set secrets
flyctl secrets set ANTHROPIC_API_KEY=your_key_here

# Deploy
flyctl deploy

# Check status
flyctl status

# View logs
flyctl logs

Multi-Region Deployment

Deploy to multiple regions for low latency globally:

# Scale to multiple regions
flyctl regions add lax # Los Angeles
flyctl regions add iad # Virginia
flyctl regions add lhr # London
flyctl regions add sin # Singapore

# Scale instances
flyctl scale count 2 --region lax
flyctl scale count 2 --region iad
flyctl scale count 1 --region lhr
flyctl scale count 1 --region sin

Users automatically connect to the nearest region.
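
To confirm routing is doing what you expect, expose the region that served each request. Fly sets a FLY_REGION environment variable on every machine; here's a standalone Flask sketch (in practice you would add the route to app.py):

import os
from flask import Flask

app = Flask(__name__)

@app.route("/region")
def region():
    # FLY_REGION is injected by the Fly.io runtime on each machine
    return {"region": os.getenv("FLY_REGION", "unknown")}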

Platform-Specific Optimizations

Vercel Optimization

Use Edge Functions:

export const runtime = "edge"; // Runs on edge, faster cold starts

Enable caching:

export const revalidate = 3600; // Cache for 1 hour

Use streaming:

return new StreamingTextResponse(stream); // Better UX

Railway Optimization

Use connection pooling:

import os
from psycopg2 import pool

# Reuse up to 20 connections instead of opening one per request
connection_pool = pool.SimpleConnectionPool(1, 20, os.getenv("DATABASE_URL"))

Add health checks:

@app.route("/health")
def health():
    # Check dependencies
    db_healthy = check_db()
    api_healthy = check_anthropic_api()

    if db_healthy and api_healthy:
        return {"status": "healthy"}, 200
    else:
        return {"status": "unhealthy"}, 503

Fly.io Optimization

Use Redis for state. Provision an Upstash Redis instance:

flyctl redis create

Then connect and cache responses in code:

import os
import redis

r = redis.from_url(os.getenv("REDIS_URL"))

# Cache a response for an hour, keyed by user
r.setex(f"chat:{user_id}", 3600, response)
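
A common pattern on top of this is cache-aside: check Redis before calling the model and only pay for inference on a miss. A rough sketch, reusing the AIAgent class and Redis client from above:

import hashlib

def cached_chat(agent, r, user_id: str, message: str) -> str:
    # Key on the user plus a hash of the message so repeated questions hit the cache
    key = f"chat:{user_id}:{hashlib.sha256(message.encode()).hexdigest()}"

    cached = r.get(key)
    if cached is not None:
        return cached.decode()

    response = agent.chat(message)  # only call Claude on a cache miss
    r.setex(key, 3600, response)    # keep for an hour, as in the example above
    return response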

Optimize Dockerfile:

# Multi-stage build: install dependencies in a builder stage, then copy only
# the installed packages into the final image to keep it small
FROM python:3.11-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
# Put user-installed packages (including the gunicorn binary) on PATH
ENV PATH=/root/.local/bin:$PATH
COPY . /app
WORKDIR /app
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080"]

Cost Comparison

For 100K requests/month:

Vercel:

  • Hobby: Free (up to limits)
  • Pro: $20/month + bandwidth
  • Estimated: $20-40/month

Railway:

  • Starter: $5/month
  • Usage: ~$15-20/month
  • Estimated: $20-25/month

Fly.io:

  • Free tier: 3 VMs
  • Paid: $1.94/VM/month
  • Estimated: $10-15/month

Winner for cost: Fly.io
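
As a back-of-envelope check on that number, take the per-VM price above and the six-machine fleet from the multi-region example:

vm_price = 1.94           # $/VM/month, as quoted above
machines = 2 + 2 + 1 + 1  # lax + iad + lhr + sin

print(f"${vm_price * machines:.2f}/month")  # ~ $11.64, inside the $10-15 estimate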

Monitoring & Observability

Vercel

Built-in analytics:

vercel logs

Add custom monitoring:

import { track } from "@vercel/analytics";

track("chat_request", {
  user_id: userId,
  model: "claude-3-5-sonnet",
});

Railway

Built-in logs and metrics in dashboard.

Add structured logging:

import logging
import json

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# One JSON object per log line is easy to filter and parse later
logger.info(json.dumps({
    "event": "chat_request",
    "user_id": user_id,
    "duration_ms": duration,
    "tokens": tokens
}))

Fly.io

View logs:

flyctl logs

Add metrics: Fly.io scrapes Prometheus metrics from your app automatically. Point it at your metrics endpoint with a [metrics] section in fly.toml:

[metrics]
  port = 9091
  path = "/metrics"

The Bottom Line

Vercel:

  • Best for: Next.js apps with AI features
  • Pros: Amazing DX, edge functions, easy deploys
  • Cons: 60s timeout, higher cost

Railway:

  • Best for: Python/Node backends, LangChain apps
  • Pros: No timeout, built-in Postgres, simple pricing
  • Cons: Not edge-optimized

Fly.io:

  • Best for: Global apps, low latency, Docker users
  • Pros: Cheapest, global edge, fast cold starts
  • Cons: More complex setup

My recommendation:

  • Prototype: Vercel (fastest to ship)
  • Production: Fly.io (best performance/cost)
  • Heavy backend: Railway (easiest Python deployment)

Getting Started

Week 1: Deploy on all three

  • See which fits your workflow
  • Measure cold starts (see the timing sketch below)
  • Compare costs
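
Cold starts are easy to measure: let each deployment idle (or scale to zero), then time the first request. A rough sketch with placeholder URLs for the three deployments:

import time
import requests

DEPLOYMENTS = {
    "vercel": "https://your-app.vercel.app/api/chat",
    "railway": "https://your-app.up.railway.app/chat",
    "fly": "https://my-ai-agent.fly.dev/chat",
}

for name, url in DEPLOYMENTS.items():
    start = time.perf_counter()
    resp = requests.post(url, json={"message": "ping"}, timeout=120)
    elapsed = time.perf_counter() - start
    # The first request after idling includes the cold start; repeat for warm latency
    print(f"{name}: {resp.status_code} in {elapsed:.2f}s")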

Week 2: Choose platform

  • Based on: language, timeout needs, budget
  • Migrate to chosen platform

Week 3: Optimize

  • Add caching
  • Implement monitoring
  • Scale as needed

Need help deploying your AI agent? We've deployed dozens of production systems.

Get deployment help →


Tags: Serverless, Deployment, Infrastructure, Tutorial

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.