Serverless Agent Inference Pipelines: How to Run AI on Vercel, Railway, or Fly.io

DomAIn Labs Team
October 10, 2025
12 min read

You built an AI agent. Now you need to deploy it.

The old way: Provision servers, manage scaling, configure load balancers, worry about uptime.

The new way: Serverless platforms that handle infrastructure for you.

Let me show you how to deploy AI agents on three popular platforms: Vercel, Railway, and Fly.io.

Platform Comparison

Feature           Vercel                    Railway                Fly.io
Best for          Next.js/React apps        Python/Node backends   Any language, global edge
Cold start        0-3s                      0-2s                   0-1s
Max execution     60s (Hobby), 300s (Pro)   Unlimited              Unlimited
Pricing           $20/month                 $5/month               $3/month (pay-as-you-go)
Auto-scaling      Yes                       Yes                    Yes
Custom domains    Yes (easy)                Yes                    Yes
Databases         Via partners              Built-in Postgres      Built-in Postgres
Edge functions    Yes                       No                     Yes (global)
Docker support    Limited                   Yes                    Yes

Option 1: Vercel (Best for Next.js)

Ideal for: Frontend-heavy apps with AI backend

Pros:

  • Amazing DX (developer experience)
  • Instant deploys from Git
  • Edge functions for low latency
  • Great Next.js integration

Cons:

  • 60s timeout on Hobby plan
  • Not great for long-running AI tasks
  • More expensive at scale

Deployment Guide: Vercel + Next.js

Project structure:

my-ai-app/
├── app/
│   ├── api/
│   │   └── chat/
│   │       └── route.ts
│   └── page.tsx
├── lib/
│   └── agent.ts
├── package.json
└── vercel.json

Install dependencies:

npm install @anthropic-ai/sdk ai

Create agent (lib/agent.ts):

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export async function chatWithAgent(message: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [{ role: "user", content: message }],
  });

  // content can also include tool-use blocks, so narrow to the first text block
  const textBlock = response.content.find((block) => block.type === "text");
  return textBlock && "text" in textBlock ? textBlock.text : "No response";
}

Create API route (app/api/chat/route.ts):

import { NextRequest, NextResponse } from "next/server";
import { chatWithAgent } from "@/lib/agent";

export async function POST(request: NextRequest) {
  try {
    const { message } = await request.json();

    if (!message) {
      return NextResponse.json(
        { error: "Message required" },
        { status: 400 }
      );
    }

    const response = await chatWithAgent(message);

    return NextResponse.json({ response });
  } catch (error: any) {
    console.error("Chat error:", error);
    return NextResponse.json(
      { error: error.message },
      { status: 500 }
    );
  }
}

export const runtime = "edge"; // Use edge runtime for faster cold starts

Configure Vercel (vercel.json):

{
  "functions": {
    "app/api/**": {
      "maxDuration": 60
    }
  }
}

Deploy:

# Install Vercel CLI
npm install -g vercel

# Deploy
vercel

# Add environment variables
vercel env add ANTHROPIC_API_KEY

Test:

curl -X POST https://your-app.vercel.app/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'

Streaming Responses on Vercel

For better UX, stream responses:

// app/api/chat/route.ts
import { NextRequest } from "next/server";
import { StreamingTextResponse, AnthropicStream } from "ai";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function POST(request: NextRequest) {
  // useChat() on the frontend posts the full message history as `messages`
  const { messages } = await request.json();

  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    // Drop client-side fields (id, createdAt) before forwarding to Anthropic
    messages: messages.map((m: { role: "user" | "assistant"; content: string }) => ({
      role: m.role,
      content: m.content,
    })),
    stream: true,
  });

  // Convert the Anthropic stream to a Vercel AI SDK stream
  const stream = AnthropicStream(response);

  return new StreamingTextResponse(stream);
}

Frontend:

// app/page.tsx
"use client";

import { useChat } from "ai/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

Option 2: Railway (Best for Python Backends)

Ideal for: Python agents, LangChain, complex backends

Pros:

  • No timeout limits
  • Built-in Postgres
  • Great for Python/Node
  • Simple pricing

Cons:

  • Not edge-optimized
  • Fewer integrations than Vercel

Deployment Guide: Railway + Python + Flask

Project structure:

ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Procfile
└── railway.toml

Create agent (agent.py):

import os
from anthropic import Anthropic

class AIAgent:
    def __init__(self):
        self.anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

    def chat(self, message: str) -> str:
        response = self.anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}]
        )

        return response.content[0].text

Create Flask app (app.py):

import os
from flask import Flask, request, jsonify
from flask_cors import CORS
from agent import AIAgent

app = Flask(__name__)
CORS(app)

agent = AIAgent()

@app.route("/")
def home():
    return {"status": "AI Agent running on Railway"}

@app.route("/health")
def health():
    return {"status": "healthy"}

@app.route("/chat", methods=["POST"])
def chat():
    try:
        data = request.get_json()
        message = data.get("message")

        if not message:
            return jsonify({"error": "Message required"}), 400

        response = agent.chat(message)

        return jsonify({"response": response})

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    app.run(host="0.0.0.0", port=port)

Dependencies (requirements.txt):

flask==3.0.2
flask-cors==4.0.0
anthropic==0.18.1
gunicorn==21.2.0

Procfile:

web: gunicorn app:app

Railway config (railway.toml):

[build]
builder = "NIXPACKS"

[deploy]
healthcheckPath = "/health"
restartPolicyType = "ON_FAILURE"

Deploy:

# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Initialize project
railway init

# Add environment variable
railway variables set ANTHROPIC_API_KEY=your_key_here

# Deploy
railway up

Get URL:

railway domain
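
Once the domain is live, smoke-test it the same way we tested the Vercel deploy. A minimal sketch using Python's requests library; the BASE_URL placeholder is whatever domain Railway generated for you:

import requests

# Replace with the domain printed by `railway domain`
BASE_URL = "https://your-app.up.railway.app"

resp = requests.post(
    f"{BASE_URL}/chat",
    json={"message": "Hello!"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])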

Adding Database on Railway

Create Postgres:

railway add postgres

Connect in code:

import os
import psycopg2

def get_db_connection():
    return psycopg2.connect(os.getenv("DATABASE_URL"))

# Usage
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("SELECT * FROM conversations")
rows = cursor.fetchall()
conn.close()
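
The SELECT above assumes a conversations table already exists. Here's a minimal sketch that creates the table and logs each exchange; the schema is hypothetical, so adjust the columns to what you actually need:

import os
import psycopg2

def save_exchange(user_message: str, agent_response: str) -> None:
    # Persist one request/response pair (hypothetical schema)
    conn = psycopg2.connect(os.getenv("DATABASE_URL"))
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cursor:
            cursor.execute("""
                CREATE TABLE IF NOT EXISTS conversations (
                    id SERIAL PRIMARY KEY,
                    user_message TEXT NOT NULL,
                    agent_response TEXT NOT NULL,
                    created_at TIMESTAMPTZ DEFAULT NOW()
                )
            """)
            cursor.execute(
                "INSERT INTO conversations (user_message, agent_response) VALUES (%s, %s)",
                (user_message, agent_response),
            )
    conn.close()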

Option 3: Fly.io (Best for Global Edge)

Ideal for: Global deployments, low-latency requirements

Pros:

  • Global edge network
  • Near-instant cold starts
  • Docker-native
  • Cheapest option

Cons:

  • More complex setup
  • Less integrated tooling

Deployment Guide: Fly.io + Python

Project structure:

ai-agent/
├── agent.py
├── app.py
├── requirements.txt
├── Dockerfile
└── fly.toml

Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080

CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080", "--workers", "2", "--timeout", "120"]

Fly config (fly.toml):

app = "my-ai-agent"
primary_region = "sjc" # San Jose

[build]
  dockerfile = "Dockerfile"

[env]
  PORT = "8080"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[http_service.concurrency]
  type = "requests"
  hard_limit = 25
  soft_limit = 20

[[http_service.checks]]
  interval = "10s"
  grace_period = "5s"
  method = "GET"
  path = "/health"
  timeout = "2s"

Deploy:

# Install Fly CLI
curl -L https://fly.io/install.sh | sh

# Login
flyctl auth login

# Launch app
flyctl launch

# Set secrets
flyctl secrets set ANTHROPIC_API_KEY=your_key_here

# Deploy
flyctl deploy

# Check status
flyctl status

# View logs
flyctl logs

Multi-Region Deployment

Deploy to multiple regions for low latency globally:

# Scale to multiple regions
flyctl regions add lax # Los Angeles
flyctl regions add iad # Virginia
flyctl regions add lhr # London
flyctl regions add sin # Singapore

# Scale instances
flyctl scale count 2 --region lax
flyctl scale count 2 --region iad
flyctl scale count 1 --region lhr
flyctl scale count 1 --region sin

Users automatically connect to the nearest region.
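
To confirm routing is doing what you expect, expose the region that served each request. Fly sets a FLY_REGION environment variable on every machine; here's a standalone Flask sketch (in practice you would add the route to app.py):

import os
from flask import Flask

app = Flask(__name__)

@app.route("/region")
def region():
    # FLY_REGION is injected by the Fly.io runtime on each machine
    return {"region": os.getenv("FLY_REGION", "unknown")}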

Platform-Specific Optimizations

Vercel Optimization

Use Edge Functions:

export const runtime = "edge"; // Runs on edge, faster cold starts

Enable caching:

export const revalidate = 3600; // Cache for 1 hour

Use streaming:

return new StreamingTextResponse(stream); // Better UX

Railway Optimization

Use connection pooling:

import os
from psycopg2 import pool

# Reuse up to 20 connections instead of opening one per request
connection_pool = pool.SimpleConnectionPool(1, 20, os.getenv("DATABASE_URL"))

Add health checks:

@app.route("/health")
def health():
    # Check dependencies
    db_healthy = check_db()
    api_healthy = check_anthropic_api()

    if db_healthy and api_healthy:
        return {"status": "healthy"}, 200
    else:
        return {"status": "unhealthy"}, 503

Fly.io Optimization

Use Redis for state. Provision an Upstash Redis instance:

flyctl redis create

Then connect and cache responses in code:

import os
import redis

r = redis.from_url(os.getenv("REDIS_URL"))

# Cache a response for an hour, keyed by user
r.setex(f"chat:{user_id}", 3600, response)
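
A common pattern on top of this is cache-aside: check Redis before calling the model and only pay for inference on a miss. A rough sketch, reusing the AIAgent class and Redis client from above:

import hashlib

def cached_chat(agent, r, user_id: str, message: str) -> str:
    # Key on the user plus a hash of the message so repeated questions hit the cache
    key = f"chat:{user_id}:{hashlib.sha256(message.encode()).hexdigest()}"

    cached = r.get(key)
    if cached is not None:
        return cached.decode()

    response = agent.chat(message)  # only call Claude on a cache miss
    r.setex(key, 3600, response)    # keep for an hour, as in the example above
    return response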

Optimize Dockerfile:

# Multi-stage build: install dependencies in a builder stage, then copy only
# the installed packages into the final image to keep it small
FROM python:3.11-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /root/.local /root/.local
# Put user-installed packages (including the gunicorn binary) on PATH
ENV PATH=/root/.local/bin:$PATH
COPY . /app
WORKDIR /app
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8080"]

Cost Comparison

For 100K requests/month:

Vercel:

  • Hobby: Free (up to limits)
  • Pro: $20/month + bandwidth
  • Estimated: $20-40/month

Railway:

  • Starter: $5/month
  • Usage: ~$15-20/month
  • Estimated: $20-25/month

Fly.io:

  • Free tier: 3 VMs
  • Paid: $1.94/VM/month
  • Estimated: $10-15/month

Winner for cost: Fly.io
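
As a back-of-envelope check on that number, take the per-VM price above and the six-machine fleet from the multi-region example:

vm_price = 1.94           # $/VM/month, as quoted above
machines = 2 + 2 + 1 + 1  # lax + iad + lhr + sin

print(f"${vm_price * machines:.2f}/month")  # ~ $11.64, inside the $10-15 estimate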

Monitoring & Observability

Vercel

Built-in analytics:

vercel logs

Add custom monitoring:

import { track } from "@vercel/analytics";

track("chat_request", {
  user_id: userId,
  model: "claude-3-5-sonnet",
});

Railway

Built-in logs and metrics in dashboard.

Add structured logging:

import logging
import json

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# One JSON object per log line is easy to filter and parse later
logger.info(json.dumps({
    "event": "chat_request",
    "user_id": user_id,
    "duration_ms": duration,
    "tokens": tokens
}))

Fly.io

View logs:

flyctl logs

Add metrics: Fly.io scrapes Prometheus metrics from your app automatically. Point it at your metrics endpoint with a [metrics] section in fly.toml:

[metrics]
  port = 9091
  path = "/metrics"

The Bottom Line

Vercel:

  • Best for: Next.js apps with AI features
  • Pros: Amazing DX, edge functions, easy deploys
  • Cons: 60s timeout, higher cost

Railway:

  • Best for: Python/Node backends, LangChain apps
  • Pros: No timeout, built-in Postgres, simple pricing
  • Cons: Not edge-optimized

Fly.io:

  • Best for: Global apps, low latency, Docker users
  • Pros: Cheapest, global edge, fast cold starts
  • Cons: More complex setup

My recommendation:

  • Prototype: Vercel (fastest to ship)
  • Production: Fly.io (best performance/cost)
  • Heavy backend: Railway (easiest Python deployment)

Getting Started

Week 1: Deploy on all three

  • See which fits your workflow
  • Measure cold starts (see the timing sketch below)
  • Compare costs
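
Cold starts are easy to measure: let each deployment idle (or scale to zero), then time the first request. A rough sketch with placeholder URLs for the three deployments:

import time
import requests

DEPLOYMENTS = {
    "vercel": "https://your-app.vercel.app/api/chat",
    "railway": "https://your-app.up.railway.app/chat",
    "fly": "https://my-ai-agent.fly.dev/chat",
}

for name, url in DEPLOYMENTS.items():
    start = time.perf_counter()
    resp = requests.post(url, json={"message": "ping"}, timeout=120)
    elapsed = time.perf_counter() - start
    # The first request after idling includes the cold start; repeat for warm latency
    print(f"{name}: {resp.status_code} in {elapsed:.2f}s")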

Week 2: Choose platform

  • Based on: language, timeout needs, budget
  • Migrate to chosen platform

Week 3: Optimize

  • Add caching
  • Implement monitoring
  • Scale as needed

Need help deploying your AI agent? We've deployed dozens of production systems.

Get deployment help →


Tags: Serverless, Deployment, Infrastructure, Tutorial

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.