Agent Guides

Building Multi-Step Workflow Agents That Don't Break

DomAIn Labs Team

January 22, 2025

14 min read

A simple AI agent answers questions. A powerful AI agent executes workflows.

Imagine an agent that doesn't just respond to "I need to return this order"—it checks the order status, validates return eligibility, generates a return label, emails it to the customer, updates inventory, and schedules the refund. All automatically.

That's a multi-step workflow agent. And building one that actually works reliably requires proper architecture.

This guide shows you how to build workflow agents that handle complexity, recover from errors, and don't leave customers in broken states.

What Makes Workflow Agents Different

Simple Agent (Single Step)

// One action, done
async function simpleAgent(input: string) {
  const response = await llm.generate(input)
  return response
}

Use case: Answer FAQ, provide information, classify data

Workflow Agent (Multi-Step)

// Multiple coordinated actions with state
async function workflowAgent(input: string) {
  let state = initializeState()

  // Step 1: Understand intent
  state = await planWorkflow(input, state)

  // Step 2-N: Execute each step
  for (const step of state.plan) {
    state = await executeStep(step, state)

    if (state.error) {
      state = await handleError(state)
    }

    if (state.complete) break
  }

  return state.result
}

Use case: Process returns, onboard customers, fulfill orders, escalate support tickets

Core Architecture Pattern

Here's the reliable pattern for workflow agents:

interface WorkflowState {
  // What we're trying to accomplish
  goal: string

  // Current execution plan
  plan: WorkflowStep[]

  // What step are we on?
  currentStep: number

  // Data accumulated during workflow
  context: Record<string, any>

  // Errors encountered
  errors: Error[]

  // Final result
  result?: any

  // Status ('requires_human' pauses the workflow for manual review)
  status: 'planning' | 'executing' | 'completed' | 'failed' | 'requires_human'
}

interface WorkflowStep {
  name: string
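  action: (state: WorkflowState) => Promise<WorkflowState>
  // Optional undo action, run if a later step fails (see Compensating Actions)
  compensation?: (state: WorkflowState) => Promise<WorkflowState>
  requiresHuman?: boolean
  retryable?: boolean
  timeout?: number  // milliseconds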
}

Why this structure?

State: Everything needed to resume if interrupted
Plan: Clear sequence of actions (debuggable, modifiable)
Context: Data flows between steps
Errors: Tracked for recovery and reporting
Status: Know exactly where we are
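
To make the structure concrete, here is a minimal sketch of a factory that builds a fully populated starting state. The names createInitialState, StateLike, and StepLike are hypothetical and exist only for this illustration; the shapes are trimmed copies of the interfaces above so the snippet stands on its own.

```typescript
// Trimmed, self-contained versions of the interfaces above (assumption:
// only the fields needed for this illustration are included)
interface StepLike {
  name: string
}

interface StateLike {
  goal: string
  plan: StepLike[]
  currentStep: number
  context: Record<string, unknown>
  errors: Error[]
  result?: unknown
  status: 'planning' | 'executing' | 'completed' | 'failed'
}

// Hypothetical helper: every field gets an explicit starting value,
// so later steps never read undefined
function createInitialState(
  goal: string,
  plan: StepLike[],
  context: Record<string, unknown> = {}
): StateLike {
  return {
    goal,
    plan,
    currentStep: 0,
    // Copy the caller's context; a caller-supplied startTime wins
    context: { startTime: Date.now(), ...context },
    errors: [],
    status: 'executing'
  }
}

const initial = createInitialState(
  'process_return',
  [{ name: 'validate_order' }],
  { orderNumber: 'ORD-12345' }
)
```

Building the state in one place keeps the defaults consistent between a fresh run and a test fixture.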

Real Example: Order Return Workflow

Let's build a complete return processing agent.

Step 1: Define the Workflow

const returnOrderWorkflow: WorkflowStep[] = [
  {
    name: 'validate_order',
    action: validateOrder,
    retryable: false,
    timeout: 5000
  },
  {
    name: 'check_return_eligibility',
    action: checkReturnEligibility,
    retryable: true,
    timeout: 5000
  },
  {
    name: 'generate_return_label',
    action: generateReturnLabel,
    retryable: true,
    timeout: 10000
  },
  {
    name: 'send_confirmation_email',
    action: sendConfirmationEmail,
    retryable: true,
    timeout: 5000
  },
  {
    name: 'update_inventory',
    action: updateInventory,
    retryable: true,
    timeout: 5000
  },
  {
    name: 'schedule_refund',
    action: scheduleRefund,
    retryable: true,
    timeout: 5000
  }
]

Step 2: Implement Each Step

async function validateOrder(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { orderNumber } = state.context

    // Query order database
    const order = await db.orders.findUnique({
      where: { orderNumber }
    })

    if (!order) {
      throw new Error(`Order ${orderNumber} not found`)
    }

    // Add order data to context for next steps
    return {
      ...state,
      context: {
        ...state.context,
        order: order,
        customerId: order.customerId,
        orderDate: order.createdAt
      },
      currentStep: state.currentStep + 1
    }

  } catch (error) {
    return {
      ...state,
      errors: [...state.errors, error],
      status: 'failed'
    }
  }
}

async function checkReturnEligibility(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { order, returnReason } = state.context

    // Calculate days since purchase
    const daysSincePurchase = daysBetween(order.createdAt, new Date())

    // Check return window (60 days)
    if (daysSincePurchase > 60) {
      // Escalate to human for approval
      return {
        ...state,
        status: 'requires_human',
        context: {
          ...state.context,
          escalationReason: 'Return requested outside 60-day window',
          escalationMessage: `Order from ${order.createdAt} is ${daysSincePurchase} days old. Requires manager approval.`
        }
      }
    }

    // Check if items are returnable
    const nonReturnableItems = order.items.filter(
      item => item.category === 'final_sale'
    )

    if (nonReturnableItems.length > 0) {
      throw new Error(
        `Order contains non-returnable items: ${nonReturnableItems.map(i => i.name).join(', ')}`
      )
    }

    // Eligible - continue
    return {
      ...state,
      context: {
        ...state.context,
        eligible: true,
        returnWindow: 60 - daysSincePurchase  // Days remaining
      },
      currentStep: state.currentStep + 1
    }

  } catch (error) {
    return {
      ...state,
      errors: [...state.errors, error],
      status: 'failed'
    }
  }
}

async function generateReturnLabel(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { order } = state.context

    // Call shipping API
    const label = await shippingAPI.createReturnLabel({
      orderId: order.id,
      fromAddress: order.shippingAddress,
      toAddress: WAREHOUSE_ADDRESS,
      weight: calculateWeight(order.items),
      serviceLevel: 'ground'
    })

    return {
      ...state,
      context: {
        ...state.context,
        returnLabel: label,
        trackingNumber: label.trackingNumber,
        labelUrl: label.pdfUrl
      },
      currentStep: state.currentStep + 1
    }

  } catch (error) {
    // Shipping API failure - retry logic handled by executor
    return {
      ...state,
      errors: [...state.errors, error],
      status: 'failed'
    }
  }
}

async function sendConfirmationEmail(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { order, returnLabel, customerId } = state.context

    const customer = await db.customers.findUnique({
      where: { id: customerId }
    })

    await emailService.send({
      to: customer.email,
      template: 'return_confirmation',
      data: {
        customerName: customer.name,
        orderNumber: order.orderNumber,
        returnLabelUrl: returnLabel.pdfUrl,
        trackingNumber: returnLabel.trackingNumber,
        estimatedRefundDate: addDays(new Date(), 7)
      }
    })

    return {
      ...state,
      context: {
        ...state.context,
        emailSent: true,
        emailSentAt: new Date()
      },
      currentStep: state.currentStep + 1
    }

  } catch (error) {
    // Email failure shouldn't block workflow
    // Log error but continue
    console.error('Email send failed:', error)

    return {
      ...state,
      errors: [...state.errors, error],
      context: {
        ...state.context,
        emailSent: false,
        emailError: error.message
      },
      currentStep: state.currentStep + 1  // Continue anyway
    }
  }
}

async function updateInventory(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { order } = state.context

    // Mark items as returning
    await db.inventory.updateMany({
      where: {
        itemId: { in: order.items.map(item => item.id) }
      },
      data: {
        status: 'returning',
        expectedReturnDate: addDays(new Date(), 7)
      }
    })

    return {
      ...state,
      context: {
        ...state.context,
        inventoryUpdated: true
      },
      currentStep: state.currentStep + 1
    }

  } catch (error) {
    return {
      ...state,
      errors: [...state.errors, error],
      status: 'failed'
    }
  }
}

async function scheduleRefund(state: WorkflowState): Promise<WorkflowState> {
  try {
    const { order } = state.context

    // Schedule refund for 7 days from now (when return received)
    const refund = await paymentProcessor.scheduleRefund({
      orderId: order.id,
      amount: order.total,
      scheduledFor: addDays(new Date(), 7),
      reason: 'return_requested'
    })

    return {
      ...state,
      context: {
        ...state.context,
        refundScheduled: true,
        refundId: refund.id,
        refundAmount: refund.amount,
        refundDate: refund.scheduledFor
      },
      currentStep: state.currentStep + 1,
      status: 'completed',
      result: {
        success: true,
        message: 'Return processed successfully',
        trackingNumber: state.context.trackingNumber,
        refundAmount: refund.amount,
        refundDate: refund.scheduledFor
      }
    }

  } catch (error) {
    return {
      ...state,
      errors: [...state.errors, error],
      status: 'failed'
    }
  }
}

Step 3: Build the Workflow Executor

This is the engine that runs workflows with error handling:

class WorkflowExecutor {
  async execute(
    workflow: WorkflowStep[],
    initialContext: Record<string, any>
  ): Promise<WorkflowState> {

    let state: WorkflowState = {
      goal: 'process_return',
      plan: workflow,
      currentStep: 0,
      context: initialContext,
      errors: [],
      status: 'executing'
    }

    // Save initial state for recovery
    await this.saveState(state)

    // Execute each step
    for (let i = 0; i < workflow.length; i++) {
      const step = workflow[i]

      console.log(`Executing step ${i + 1}/${workflow.length}: ${step.name}`)

      // Execute with retry logic
      state = await this.executeWithRetry(step, state)

      // Save state after each step (for recovery)
      await this.saveState(state)

      // Check if workflow should stop
      if (state.status === 'failed') {
        await this.handleFailure(state)
        break
      }

      if (state.status === 'requires_human') {
        await this.escalateToHuman(state)
        break
      }

      if (state.status === 'completed') {
        await this.handleSuccess(state)
        break
      }
    }

    return state
  }

  private async executeWithRetry(
    step: WorkflowStep,
    state: WorkflowState
  ): Promise<WorkflowState> {

    const maxRetries = step.retryable ? 3 : 1
    let lastError: Error | null = null

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        // Execute with timeout
        const result = await this.executeWithTimeout(
          step.action(state),
          step.timeout || 30000
        )

        // Steps catch their own errors and report them as status 'failed',
        // so a returned failure has to count as a failed attempt too
        if (result.status !== 'failed' || attempt === maxRetries) {
          return result
        }
        lastError = result.errors[result.errors.length - 1] ?? null

      } catch (error) {
        // Thrown errors and timeouts land here
        lastError = error instanceof Error ? error : new Error(String(error))
      }

      console.error(
        `Step ${step.name} failed (attempt ${attempt}/${maxRetries}):`,
        lastError
      )

      // Wait before the next attempt (exponential backoff: 2s, 4s, ...)
      if (attempt < maxRetries) {
        await this.sleep(Math.pow(2, attempt) * 1000)
      }
    }

    // All retries failed
    return {
      ...state,
      errors: [...state.errors, lastError!],
      status: 'failed'
    }
  }

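  private async executeWithTimeout<T>(
    promise: Promise<T>,
    timeout: number
  ): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const timeoutPromise = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error('Step timeout')), timeout)
    })

    try {
      // Note: losing the race does not cancel the underlying step
      return await Promise.race([promise, timeoutPromise])
    } finally {
      // Clear the timer so it cannot keep the process alive after a fast step
      clearTimeout(timer)
    }
  }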

  private async saveState(state: WorkflowState): Promise<void> {
    // Persist state to database for recovery
    await db.workflowStates.upsert({
      where: { id: state.context.workflowId },
      create: {
        id: state.context.workflowId,
        state: JSON.stringify(state),
        updatedAt: new Date()
      },
      update: {
        state: JSON.stringify(state),
        updatedAt: new Date()
      }
    })
  }

  private async handleFailure(state: WorkflowState): Promise<void> {
    // Log failure
    console.error('Workflow failed:', state.errors)

    // Notify monitoring
    await monitoring.alert({
      type: 'workflow_failure',
      workflowId: state.context.workflowId,
      goal: state.goal,
      failedStep: state.plan[state.currentStep]?.name,
      errors: state.errors.map(e => e.message)
    })

    // Notify customer if applicable
    if (state.context.customerId) {
      await this.notifyCustomerOfFailure(state)
    }
  }

  private async escalateToHuman(state: WorkflowState): Promise<void> {
    // Create support ticket for human review
    await db.tickets.create({
      data: {
        type: 'workflow_escalation',
        priority: 'high',
        workflowId: state.context.workflowId,
        reason: state.context.escalationReason,
        context: JSON.stringify(state.context),
        assignedTo: 'support_team'
      }
    })

    // Notify team
    await notificationService.send({
      channel: 'slack',
      message: `Workflow escalated: ${state.context.escalationMessage}`,
      data: state.context
    })
  }

  private async handleSuccess(state: WorkflowState): Promise<void> {
    console.log('Workflow completed successfully:', state.result)

    // Log success metric
    await analytics.track('workflow_success', {
      goal: state.goal,
      duration: Date.now() - state.context.startTime,
      stepsCompleted: state.currentStep  // Each step already increments currentStep on success
    })
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms))
  }
}
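
The executor above waits a fixed 2s, then 4s between attempts. When many workflows fail at the same moment (for example during a shipping API outage), they will all retry in lockstep. A common refinement is to cap the delay and add random jitter. The sketch below is one way to do that; backoffDelay and its parameters are hypothetical and not part of the executor above.

```typescript
// Hypothetical helper: computes the wait before the next retry.
// `attempt` is 1-based, matching the executor's loop counter.
function backoffDelay(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random  // injectable so tests are deterministic
): number {
  // Same doubling as the executor (2s, 4s, 8s, ...), capped at maxMs
  const ceiling = Math.min(maxMs, Math.pow(2, attempt) * baseMs)

  // Keep at least half the ceiling, then add a random share of the other half
  const half = ceiling / 2
  return Math.floor(half + random() * half)
}
```

Passing the random source as a parameter is a design choice for testability: production code uses the default, while tests can pin it to a fixed value.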

Step 4: Usage

// Initialize executor
const executor = new WorkflowExecutor()

// Process a return
const result = await executor.execute(
  returnOrderWorkflow,
  {
    workflowId: generateId(),
    orderNumber: 'ORD-12345',
    returnReason: 'Changed mind',
    startTime: Date.now()
  }
)

if (result.status === 'completed') {
  console.log('Return processed:', result.result)
} else if (result.status === 'failed') {
  console.error('Return failed:', result.errors)
} else if (result.status === 'requires_human') {
  console.log('Escalated to human:', result.context.escalationReason)
}

Error Handling Strategies

1. Graceful Degradation

Not all failures should stop the workflow:

// Critical step - failure stops workflow
async function processPayment(state: WorkflowState) {
  try {
    const result = await paymentAPI.charge(state.context.amount)
    return { ...state, context: { ...state.context, paymentId: result.id } }
  } catch (error) {
    return { ...state, status: 'failed', errors: [...state.errors, error] }
  }
}

// Non-critical step - failure logged but workflow continues
async function sendReceipt(state: WorkflowState) {
  try {
    await emailService.send(state.context.receipt)
    return { ...state, context: { ...state.context, receiptSent: true } }
  } catch (error) {
    console.error('Receipt send failed:', error)
    // Continue anyway - we can resend later
    return {
      ...state,
      errors: [...state.errors, error],
      context: { ...state.context, receiptSent: false }
    }
  }
}
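
Rather than repeating the try/catch in every non-critical step, the pattern can be factored into a wrapper. This is a minimal sketch under a few assumptions: nonCritical, State, and Action are hypothetical names, the state shape is a trimmed copy of WorkflowState, and the wrapped action signals failure by throwing rather than by returning a 'failed' status.

```typescript
// Minimal state shape so the sketch is self-contained
interface State {
  currentStep: number
  context: Record<string, unknown>
  errors: Error[]
  status: 'executing' | 'completed' | 'failed'
}

type Action = (state: State) => Promise<State>

// Hypothetical wrapper: a failure in `action` is recorded, not fatal
function nonCritical(name: string, action: Action): Action {
  return async (state) => {
    try {
      return await action(state)
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error))
      console.error(`Non-critical step ${name} failed:`, err.message)

      return {
        ...state,
        // Keep the error for reporting, but leave status untouched
        errors: [...state.errors, err],
        // Flag the failure so the step can be re-run later
        context: { ...state.context, [`${name}Failed`]: true },
        currentStep: state.currentStep + 1
      }
    }
  }
}
```

A step such as sendReceipt could then be declared once as critical logic and wrapped at the point where the workflow is defined.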

2. Compensating Actions (Rollback)

When a late step fails, undo earlier steps:

const workflowWithCompensation: WorkflowStep[] = [
  {
    name: 'reserve_inventory',
    action: reserveInventory,
    compensation: releaseInventory  // Undo if workflow fails
  },
  {
    name: 'charge_customer',
    action: chargeCustomer,
    compensation: refundCustomer
  },
  {
    name: 'ship_order',
    action: shipOrder,
    compensation: cancelShipment
  }
]

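async function executeWithCompensation(
  workflow: WorkflowStep[],
  initialState: WorkflowState
): Promise<WorkflowState> {
  const completedSteps: WorkflowStep[] = []
  let state = initialState

  for (const step of workflow) {
    try {
      state = await step.action(state)

      // Steps report failure through state as well as by throwing
      if (state.status === 'failed') {
        throw state.errors[state.errors.length - 1] ?? new Error(`Step ${step.name} failed`)
      }

      completedSteps.push(step)
    } catch (error) {
      // Failure - rollback completed steps in reverse order
      console.error(`Step ${step.name} failed. Rolling back...`)

      // Copy before reversing so completedSteps itself is not mutated
      for (const completedStep of [...completedSteps].reverse()) {
        if (completedStep.compensation) {
          state = await completedStep.compensation(state)
        }
      }

      throw error
    }
  }

  return state
}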
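
To see the rollback order concretely, here is a self-contained sketch with in-memory stand-ins for the real services. Everything in it (RollbackStep, runWithRollback, the log format) is hypothetical. It also illustrates one extra design choice: a compensation that itself fails is logged and skipped, so the remaining rollbacks still run.

```typescript
interface RollbackStep {
  name: string
  action: () => Promise<void>
  compensation?: () => Promise<void>
}

// Runs steps in order; on failure, undoes completed steps newest-first
async function runWithRollback(
  steps: RollbackStep[]
): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = []
  const completed: RollbackStep[] = []

  for (const step of steps) {
    try {
      await step.action()
      log.push(`done:${step.name}`)
      completed.push(step)
    } catch {
      log.push(`failed:${step.name}`)

      // Copy before reversing so `completed` is not mutated
      for (const done of [...completed].reverse()) {
        if (!done.compensation) continue
        try {
          await done.compensation()
          log.push(`undone:${done.name}`)
        } catch {
          // A failed compensation must not stop the remaining rollbacks
          log.push(`undo_failed:${done.name}`)
        }
      }

      return { ok: false, log }
    }
  }

  return { ok: true, log }
}

// Stand-in steps: the third one fails, which triggers the rollback
const demoSteps: RollbackStep[] = [
  { name: 'reserve_inventory', action: async () => {}, compensation: async () => {} },
  { name: 'charge_customer', action: async () => {}, compensation: async () => {} },
  { name: 'ship_order', action: async () => { throw new Error('carrier unavailable') } }
]
```

Running runWithRollback(demoSteps) records the charge being undone before the inventory reservation, the reverse of the order in which they completed.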

3. Human-in-the-Loop

Some decisions require human judgment:

async function requiresHumanReview(state: WorkflowState): Promise<WorkflowState> {
  // Check if automated decision is confident
  const confidence = await mlModel.predictConfidence(state.context.data)

  if (confidence < 0.8) {
    // Low confidence - escalate
    return {
      ...state,
      status: 'requires_human',
      context: {
        ...state.context,
        reviewReason: `Model confidence ${confidence} below threshold`,
        reviewData: state.context.data
      }
    }
  }

  // High confidence - proceed automatically
  return {
    ...state,
    currentStep: state.currentStep + 1
  }
}
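
Escalation is only half of the loop: once a person has reviewed the case, the workflow needs a way to pick up again. The sketch below shows one possible shape for that hand-back. applyHumanDecision and Decision are hypothetical names, and the state type is a trimmed copy of WorkflowState.

```typescript
type Status = 'executing' | 'requires_human' | 'completed' | 'failed'

interface State {
  currentStep: number
  status: Status
  context: Record<string, unknown>
  errors: Error[]
}

// Hypothetical record of what the reviewer decided
interface Decision {
  approved: boolean
  reviewer: string
  note?: string
}

function applyHumanDecision(state: State, decision: Decision): State {
  // Guard against applying a decision to a workflow that is not paused
  if (state.status !== 'requires_human') {
    throw new Error(`Workflow is ${state.status}, not awaiting review`)
  }

  // Record who decided, for the audit trail
  const context = {
    ...state.context,
    reviewedBy: decision.reviewer,
    reviewNote: decision.note
  }

  if (!decision.approved) {
    return {
      ...state,
      context,
      status: 'failed',
      errors: [...state.errors, new Error('Rejected by reviewer')]
    }
  }

  // Approved: move past the step that escalated and resume
  return {
    ...state,
    context,
    status: 'executing',
    currentStep: state.currentStep + 1
  }
}
```

Keeping this as a pure function of state and decision makes it easy to test without a ticketing system in the loop.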

State Management Best Practices

1. Idempotency

Steps should be safe to retry:

// BAD - not idempotent
async function sendEmail(state: WorkflowState) {
  await emailService.send(state.context.email)  // Sends duplicate if retried
}

// GOOD - idempotent
async function sendEmail(state: WorkflowState) {
  // Check if already sent
  if (state.context.emailSent) {
    return state  // Skip if already done
  }

  await emailService.send(state.context.email)

  return {
    ...state,
    context: {
      ...state.context,
      emailSent: true,
      emailSentAt: new Date()
    }
  }
}
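
The emailSent flag only protects against duplicates if the updated state was saved before the retry. If the process crashes after the email goes out but before the state is persisted, the flag is lost. One way to narrow that gap is to record completed side effects under a deterministic key. The sketch below uses an in-memory Set as a stand-in; idempotencyKey and runOnce are hypothetical, and a real system would need a durable store with a uniqueness guarantee.

```typescript
// Stand-in for a durable store (assumption: a database table with a
// unique constraint on the key would play this role in production)
const completedKeys = new Set<string>()

// The same workflow and step always produce the same key
function idempotencyKey(workflowId: string, stepName: string): string {
  return `${workflowId}:${stepName}`
}

// Runs `effect` at most once per key; returns false when it was skipped
async function runOnce(
  key: string,
  effect: () => Promise<void>
): Promise<boolean> {
  if (completedKeys.has(key)) {
    return false  // Already performed on an earlier attempt
  }

  await effect()
  // Only mark as done after the effect succeeded, so failures can retry
  completedKeys.add(key)
  return true
}
```

Many payment and email providers also accept an idempotency key on the request itself; where that is available, passing the same key through is usually the more robust option.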

2. State Persistence

Save state after every step for recovery:

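// If workflow crashes mid-execution, we can resume
async function resumeWorkflow(workflowId: string) {
  // Load saved state from database
  const savedState = await db.workflowStates.findUnique({
    where: { id: workflowId }
  })

  if (!savedState) {
    throw new Error(`No saved state for workflow ${workflowId}`)
  }

  const state: WorkflowState = JSON.parse(savedState.state)

  // JSON.stringify drops functions, so the saved plan has no actions.
  // Rebuild the remaining steps from the workflow definition instead.
  const remainingSteps = returnOrderWorkflow.slice(state.currentStep)

  // Resume from where we left off.
  // Note: execute() restarts currentStep at 0 for the steps it is given.
  const executor = new WorkflowExecutor()
  return executor.execute(remainingSteps, state.context)
}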
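
Because step actions are functions, they do not survive a round trip through JSON. A sketch of one way to handle that is to persist only step names and look the implementations up again on resume. Step, Snapshot, stepRegistry, and the helper functions below are all hypothetical names used for illustration.

```typescript
interface Step {
  name: string
  action: (context: Record<string, unknown>) => Promise<void>
}

// What actually gets written to storage: plain data only
interface Snapshot {
  goal: string
  stepNames: string[]
  currentStep: number
  context: Record<string, unknown>
  errors: string[]
}

// Hypothetical registry mapping step names back to implementations
const stepRegistry = new Map<string, Step>()

function registerSteps(steps: Step[]): void {
  for (const step of steps) stepRegistry.set(step.name, step)
}

function toSnapshot(
  goal: string,
  plan: Step[],
  currentStep: number,
  context: Record<string, unknown>,
  errors: Error[]
): Snapshot {
  return {
    goal,
    stepNames: plan.map(step => step.name),
    currentStep,
    context,
    // Error objects serialize to {} under JSON.stringify, so keep the message
    errors: errors.map(error => error.message)
  }
}

// Rebuilds the not-yet-run part of the plan from a loaded snapshot
function remainingSteps(snapshot: Snapshot): Step[] {
  return snapshot.stepNames.slice(snapshot.currentStep).map(name => {
    const step = stepRegistry.get(name)
    if (!step) throw new Error(`Unknown step in saved plan: ${name}`)
    return step
  })
}
```

Failing loudly on an unknown step name is deliberate: it surfaces the case where a workflow definition changed between the crash and the resume.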

3. Timeout Management

Prevent indefinite waiting:

const step: WorkflowStep = {
  name: 'call_external_api',
  action: callExternalAPI,
  timeout: 10000,  // 10 seconds max
  retryable: true
}

// If API call takes > 10s, timeout and retry
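
A timeout built on Promise.race stops waiting, but it does not stop the underlying work: the slow request keeps running in the background. Where the step supports cancellation, an AbortSignal can be passed in so the work is actually cancelled. The following is a minimal sketch; withTimeout is a hypothetical helper, and it assumes a runtime where AbortController is available globally (modern browsers and recent Node.js versions).

```typescript
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController()
  let timer: ReturnType<typeof setTimeout> | undefined

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Tell the step to stop, then fail the race
      controller.abort()
      reject(new Error(`Step timeout after ${ms}ms`))
    }, ms)
  })

  try {
    return await Promise.race([work(controller.signal), timeout])
  } finally {
    // Always clear the timer, whether the work won or lost the race
    clearTimeout(timer)
  }
}
```

A step that calls fetch can forward the signal through the request options, so a timed-out request is dropped instead of lingering.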

Testing Workflow Agents

Unit Test Individual Steps

describe('validateOrder', () => {
  it('should add order to context if found', async () => {
    const state = {
      context: { orderNumber: 'ORD-123' },
      // ... other state
    }

    const result = await validateOrder(state)

    expect(result.context.order).toBeDefined()
    expect(result.currentStep).toBe(state.currentStep + 1)
  })

  it('should fail if order not found', async () => {
    const state = {
      context: { orderNumber: 'INVALID' },
      // ... other state
    }

    const result = await validateOrder(state)

    expect(result.status).toBe('failed')
    expect(result.errors).toHaveLength(1)
  })
})

Integration Test Full Workflows

describe('returnOrderWorkflow', () => {
  // Declared at suite level so every test below can use it
  const executor = new WorkflowExecutor()

  it('should complete successfully for valid return', async () => {

    const result = await executor.execute(returnOrderWorkflow, {
      workflowId: 'test-123',
      orderNumber: 'ORD-VALID',
      returnReason: 'Defective'
    })

    expect(result.status).toBe('completed')
    expect(result.result.success).toBe(true)
    expect(result.result.trackingNumber).toBeDefined()
  })

  it('should escalate returns outside 60-day window', async () => {
    // Order from 90 days ago
    await db.orders.create({
      data: {
        orderNumber: 'ORD-OLD',
        createdAt: subDays(new Date(), 90)
      }
    })

    const result = await executor.execute(returnOrderWorkflow, {
      workflowId: 'test-124',
      orderNumber: 'ORD-OLD',
      returnReason: 'Changed mind'
    })

    expect(result.status).toBe('requires_human')
    expect(result.context.escalationReason).toContain('outside 60-day window')
  })
})
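
Retry and recovery paths are hard to exercise against real services, because real services rarely fail on demand. A small fault-injection wrapper makes those paths testable. The sketch below is illustrative only: failTimes and retry are hypothetical helpers, and retry is a stripped-down stand-in for the executor's retry loop (no backoff, no timeout).

```typescript
// Wraps an action so that its first `failures` calls throw, then it succeeds
function failTimes<T>(
  failures: number,
  action: () => Promise<T>
): () => Promise<T> {
  let calls = 0
  return async () => {
    calls += 1
    if (calls <= failures) {
      throw new Error(`Injected failure ${calls}/${failures}`)
    }
    return action()
  }
}

// Simplified stand-in for the executor's retry loop
async function retry<T>(
  action: () => Promise<T>,
  maxAttempts: number
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action()
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}
```

With these in place, a test can assert both sides of the boundary: two injected failures with three attempts should succeed, while three injected failures with three attempts should surface the last error.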

Monitoring & Observability

Track workflow performance:

// Log every step execution
await analytics.track('workflow_step_executed', {
  workflowId: state.context.workflowId,
  step: step.name,
  duration: executionTime,
  success: !state.errors.length
})

// Monitor common failure points
if (state.errors.length > 0) {
  await monitoring.increment(`workflow.step.${step.name}.errors`)
}

// Track end-to-end metrics
await analytics.track('workflow_completed', {
  goal: state.goal,
  totalSteps: state.plan.length,
  completedSteps: state.currentStep,  // currentStep already counts finished steps
  duration: Date.now() - state.context.startTime,
  status: state.status
})
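
The executionTime value in the tracking call above has to come from somewhere. One simple approach is to wrap each step in a timing helper that measures duration whether the step succeeds or throws. This is a sketch under that assumption; timed and Timed are hypothetical names.

```typescript
interface Timed<T> {
  ok: boolean
  value?: T
  error?: Error
  durationMs: number
}

// Measures how long `work` takes and captures its outcome without rethrowing
async function timed<T>(work: () => Promise<T>): Promise<Timed<T>> {
  const startedAt = Date.now()
  try {
    const value = await work()
    return { ok: true, value, durationMs: Date.now() - startedAt }
  } catch (error) {
    const err = error instanceof Error ? error : new Error(String(error))
    return { ok: false, error: err, durationMs: Date.now() - startedAt }
  }
}
```

The durationMs field can then be passed as the duration in the step-level event, and ok as its success flag.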

The Bottom Line

Building reliable workflow agents requires:

Architecture:

Clear state management
Well-defined steps
Error handling at every level
State persistence for recovery

Error Handling:

Retry logic for transient failures
Graceful degradation for non-critical steps
Compensating actions for rollback
Human escalation for edge cases

Testing:

Unit tests for individual steps
Integration tests for full workflows
Chaos testing for error scenarios

Monitoring:

Track execution at step level
Alert on failures
Measure performance
Log for debugging

Investment: 3-6 weeks to build a robust workflow agent system

Returns: Automate complex business processes end-to-end, 80-95% success rate

Next Steps

Map your workflow: Document the steps needed end-to-end
Identify decision points: Where might errors occur? Where's human judgment needed?
Design error handling: How should each failure type be handled?
Build incrementally: Start with happy path, add error handling, then edge cases
Test thoroughly: Simulate failures, test recovery, validate idempotency

Need help building a workflow agent? Schedule a consultation to discuss your specific use case, or check out our case studies to see workflow agents in action.

Remember: The first version doesn't need to handle every edge case. Start simple, deploy, learn from real usage, and iterate.

Tags: workflows, state management, error handling, architecture, reliability

About the Author

DomAIn Labs Team

The DomAIn Labs team consists of AI engineers, strategists, and educators passionate about demystifying AI for small businesses.
