Spot Instances: How to Save Up to 90% on AWS, Azure, and GCP

Save 70–90% on cloud compute with spot instances across AWS, Azure, and GCP. Includes production-ready Terraform configs, Karpenter node pool strategies, and battle-tested interruption handling patterns.

Why Spot Instances Are the Biggest Cost Lever You're Probably Ignoring

If you're running workloads on AWS, Azure, or Google Cloud at full on-demand pricing, you're leaving a lot of money on the table — we're talking 70–90% of your compute bill. Spot instances (called Spot VMs on Azure and GCP) let you tap into unused cloud capacity at steep discounts. The catch? The provider can pull them back with short notice when demand spikes.

That tradeoff used to scare teams away. Fair enough — nobody wants their production workload yanked mid-request.

But here's the thing: in 2026, the tooling has matured dramatically. Karpenter, Terraform mixed-instance policies, capacity-optimized allocation strategies — these aren't experimental anymore. They're battle-tested. According to the 2026 Kubernetes Cost Benchmark Report, clusters mixing on-demand and spot instances see an average 59% cost reduction, while spot-only clusters hit 77% savings. Those numbers are hard to ignore.

So, let's dive in. This guide covers everything you need to start saving: provider-specific pricing mechanics, Terraform configurations for all three clouds, Kubernetes spot node pool strategies with Karpenter, and interruption handling patterns that actually hold up in production.

How Spot Instances Work: A Multi-Cloud Overview

Every major cloud provider has excess compute capacity that fluctuates based on demand. Rather than let it sit idle, they sell it at deeply discounted prices. The fundamental deal is the same across AWS, Azure, and GCP: you get cheap compute, but the provider can take it back when they need it.

The devil is in the details, though. Each cloud handles pricing, interruption notices, and eviction policies differently.

AWS EC2 Spot Instances

  • Discount: Up to 90% off on-demand pricing
  • Interruption notice: 2 minutes via instance metadata and EventBridge
  • Pricing model: Variable market price based on supply and demand, charged per second
  • Key feature: Spot Instance Advisor shows interruption frequency by instance type (<5%, 5–10%, 10–15%, 15–20%, >20%)
  • 2026 update: EC2 Capacity Manager now includes spot interruption metrics for better visibility
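The 2-minute warning surfaces as a small JSON document at the instance-metadata path /latest/meta-data/spot/instance-action (the endpoint returns 404 until a notice exists). Here's a minimal parsing sketch — the payload below is a made-up sample, and the function name is ours, not an AWS API:

```python
import json
from datetime import datetime, timezone

def parse_interruption_notice(body: str, now: datetime) -> float:
    """Return seconds remaining before the instance is reclaimed."""
    notice = json.loads(body)  # e.g. {"action": "terminate", "time": "..."}
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()

# Made-up sample of what the metadata endpoint returns once a notice fires
sample = '{"action": "terminate", "time": "2026-03-15T12:02:00Z"}'
now = datetime(2026, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
print(parse_interruption_notice(sample, now))  # → 120.0
```

In practice you'd poll that path every few seconds from a sidecar or node daemon and kick off your drain script when the countdown starts.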

Azure Spot VMs

  • Discount: Up to 90% off pay-as-you-go pricing
  • Interruption notice: 30 seconds via Azure Metadata Service (yes, that's significantly less than AWS)
  • Pricing model: Set a maximum price or accept the variable market price
  • Key feature: Eviction policy choice — deallocate (stop but preserve) or delete entirely
  • Worth noting: Azure spot VM prices surged 108% from 2022 to 2023, making instance diversification more important than ever
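On Azure, the eviction warning arrives as a "Preempt" entry in the Scheduled Events document (http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01, requested with the header "Metadata: true"). A small filtering sketch — the payload is a trimmed, made-up sample and the function name is ours:

```python
import json

def vms_facing_eviction(body: str) -> list[str]:
    """Return resource names targeted by Preempt (spot eviction) events."""
    doc = json.loads(body)
    return [r for e in doc.get("Events", []) if e["EventType"] == "Preempt"
            for r in e["Resources"]]

# Made-up sample of a Scheduled Events response
sample = json.dumps({
    "DocumentIncarnation": 2,
    "Events": [
        {"EventType": "Preempt", "Resources": ["spot-vm-1"], "EventStatus": "Scheduled"},
        {"EventType": "Reboot",  "Resources": ["other-vm"],  "EventStatus": "Scheduled"},
    ],
})
print(vms_facing_eviction(sample))  # → ['spot-vm-1']
```

With only 30 seconds of runway, the handler triggered here should do one thing fast — flush state and exit — not a leisurely multi-step drain.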

GCP Spot VMs

  • Discount: 60–91% off on-demand pricing, varying by machine type and region
  • Interruption notice: 30 seconds via metadata server
  • Pricing model: Fixed discount based on instance type — not market-based like AWS, which makes costs more predictable
  • Key feature: No maximum runtime limit (unlike the old Preemptible VMs that had a 24-hour cap)
  • Bright spot: GCP spot VM prices actually decreased by ~26% recently, making them increasingly attractive
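GCP's signal is simpler still: the metadata key instance/preempted (under http://metadata.google.internal/computeMetadata/v1/, header "Metadata-Flavor: Google") flips from FALSE to TRUE, and supports long-polling via ?wait_for_change=true. The check itself is trivial — shown here against canned strings rather than a live metadata call:

```python
def is_preempted(metadata_body: str) -> bool:
    """True once the metadata server reports a preemption notice."""
    return metadata_body.strip().upper() == "TRUE"

print(is_preempted("FALSE"), is_preempted("TRUE"))  # → False True
```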

Which Workloads Belong on Spot?

Not everything should run on spot. The key question is simple: can your workload tolerate sudden interruption and restart from a checkpoint? If yes, it's a strong spot candidate — and you're probably overpaying today.

Ideal Spot Workloads

  • CI/CD pipelines: Build and test jobs are inherently stateless and retryable — honestly, this is the easiest win
  • Batch processing: ETL jobs, data transformations, video encoding — anything with natural checkpoints
  • Stateless microservices: Containerized services behind load balancers with multiple replicas
  • Machine learning training: With checkpoint-based training, interrupted epochs can resume where they left off
  • Dev/test environments: Non-production workloads where brief downtime is perfectly acceptable
  • Big data and analytics: Spark, Hadoop, and Flink jobs that distribute work across many nodes
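Most of the workloads above share one pattern: persist progress, then skip completed work on restart. A minimal, hypothetical checkpoint loop — file-based for brevity, where production would checkpoint to object storage:

```python
import json
import os
import tempfile

def process_batch(items, checkpoint_path):
    """Process items, resuming past any that a previous run completed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)           # resume from the last checkpoint
    for item in items:
        if item in done:
            continue                      # finished before the interruption
        # ... real work for `item` goes here ...
        done.append(item)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)            # durable progress marker
    return done

ckpt = os.path.join(tempfile.mkdtemp(), "progress.json")
process_batch(["a", "b"], ckpt)               # run that gets interrupted
print(process_batch(["a", "b", "c"], ckpt))   # resumed run → ['a', 'b', 'c']
```

Checkpointing after every item is the simplest correct version; real jobs usually batch checkpoints to cut I/O.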

Workloads to Keep on On-Demand or Reserved

  • Singleton databases: A single-node database without replication can't tolerate interruption. Period.
  • Stateful services without HA: Any service where losing one node means losing data or sessions
  • Latency-critical APIs with no failover: If you can't absorb a 30-second to 2-minute disruption window, don't risk it

AWS Spot Instances with Terraform: Production-Ready Configs

The recommended AWS approach uses Auto Scaling Groups with mixed instance policies. This gives you automatic diversification across instance types, the capacity-optimized allocation strategy, and seamless on-demand fallback. It's the setup I'd recommend for most teams starting out.

Mixed-Instance Auto Scaling Group

# Launch template for spot-capable instances
resource "aws_launch_template" "workers" {
  name_prefix   = "spot-workers-"
  image_id      = var.ami_id
  instance_type = "m6i.large"

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name        = "spot-worker"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    instance_metadata_tags      = "enabled"
  }
}

# ASG with mixed on-demand + spot instances
resource "aws_autoscaling_group" "workers" {
  name                = "${var.environment}-spot-workers"
  min_size            = var.min_size
  max_size            = var.max_size
  desired_capacity    = var.desired_capacity
  vpc_zone_identifier = var.private_subnet_ids

  mixed_instances_policy {
    instances_distribution {
      # Keep 1 on-demand instance as baseline
      on_demand_base_capacity                  = 1
      # Everything above baseline is spot
      on_demand_percentage_above_base_capacity = 0
      # Best strategy for low interruptions + good price
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.workers.id
        version            = "$Latest"
      }

      # Diversify across 6+ instance types to minimize interruptions
      override {
        instance_type     = "m6i.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m6a.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m5.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m5a.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "c6i.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "c6a.large"
        weighted_capacity = "1"
      }
    }
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 80
    }
  }
}

The price-capacity-optimized allocation strategy is the one you want in 2026. It picks instances from the deepest capacity pools at the lowest price, and AWS reports interruption rates up to six times lower than the legacy lowest-price strategy. That's a massive difference in practice.

Spot Interruption Alert with CloudWatch

# Capture spot interruption warnings via EventBridge
resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name        = "spot-interruption-warning"
  description = "Fires when EC2 issues a spot interruption notice"

  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Spot Instance Interruption Warning"]
  })
}

resource "aws_cloudwatch_event_target" "notify_sns" {
  rule      = aws_cloudwatch_event_rule.spot_interruption.name
  target_id = "spot-alert"
  arn       = aws_sns_topic.spot_alerts.arn
}

resource "aws_sns_topic" "spot_alerts" {
  name = "${var.environment}-spot-interruption-alerts"
}

Azure Spot VMs with Terraform

Azure handles spot differently from AWS — you set a maximum price and choose an eviction policy. The 30-second interruption window (compared to AWS's 2 minutes) means your shutdown logic needs to be faster and leaner. Here's a production-ready config.

resource "azurerm_linux_virtual_machine_scale_set" "spot_workers" {
  name                = "${var.environment}-spot-vmss"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "Standard_D2s_v5"
  instances           = var.desired_capacity
  admin_username      = "azureuser"

  # SSH key auth is required when password auth is disabled (the default)
  admin_ssh_key {
    username   = "azureuser"
    public_key = var.ssh_public_key
  }

  # Spot configuration
  priority        = "Spot"
  eviction_policy = "Deallocate"
  max_bid_price   = -1  # Pay up to the on-demand price

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "spot-nic"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.subnet_id
    }
  }

  automatic_instance_repair {
    enabled      = true
    grace_period = "PT10M"
  }

  tags = {
    Environment = var.environment
    SpotType    = "vmss-managed"
  }
}

# Autoscale to replace evicted instances
resource "azurerm_monitor_autoscale_setting" "spot_scale" {
  name                = "${var.environment}-spot-autoscale"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.spot_workers.id

  profile {
    name = "default"

    capacity {
      default = var.desired_capacity
      minimum = var.min_size
      maximum = var.max_size
    }

    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.spot_workers.id
        operator           = "GreaterThan"
        threshold          = 70
        time_aggregation   = "Average"
        time_grain         = "PT1M"
        time_window        = "PT5M"
        statistic          = "Average"
      }
      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "2"
        cooldown  = "PT5M"
      }
    }
  }
}

Setting max_bid_price = -1 means you'll pay up to the on-demand price, which keeps you in the spot pool as long as possible. The Deallocate eviction policy is worth using because it preserves your VM's disks and networking config — so it can restart quickly when capacity comes back.

GCP Spot VMs with Terraform

Google Cloud's spot model is refreshingly simple compared to AWS and Azure. Pricing is a fixed discount rather than a market-based auction, so you won't wake up to unexpected cost spikes. That predictability is genuinely nice.

resource "google_compute_instance_template" "spot_workers" {
  name_prefix  = "spot-worker-"
  machine_type = "e2-standard-4"
  region       = var.region

  disk {
    source_image = "debian-cloud/debian-12"
    auto_delete  = true
    boot         = true
    disk_type    = "pd-standard"
  }

  network_interface {
    network    = var.network
    subnetwork = var.subnetwork
  }

  scheduling {
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }

  labels = {
    environment = var.environment
    spot        = "true"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "google_compute_instance_group_manager" "spot_workers" {
  name               = "${var.environment}-spot-mig"
  base_instance_name = "spot-worker"
  zone               = var.zone
  target_size        = var.desired_capacity

  version {
    instance_template = google_compute_instance_template.spot_workers.id
  }

  auto_healing_policies {
    health_check      = google_compute_health_check.spot_hc.id
    initial_delay_sec = 120
  }
}

resource "google_compute_health_check" "spot_hc" {
  name                = "${var.environment}-spot-health-check"
  check_interval_sec  = 10
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    port = 8080
  }
}

Setting instance_termination_action = "STOP" preserves the VM disk on eviction, which means a faster restart. For truly ephemeral workloads where you don't care about the disk, use "DELETE" instead.

Spot Instances in Kubernetes: Karpenter, EKS, AKS, and GKE

Kubernetes is honestly the ideal platform for spot instances. Containers are ephemeral by nature, and Kubernetes' built-in scheduling, replication, and self-healing make it a natural fit for absorbing spot interruptions without drama.

Here's how to set up spot node pools across all three managed Kubernetes services.

AWS EKS with Karpenter (Recommended)

Karpenter is the node autoscaler you should be using on EKS in 2026. Unlike Cluster Autoscaler — which works through pre-configured Auto Scaling Groups — Karpenter provisions nodes directly via the EC2 Fleet API. It picks optimal instance types, availability zones, and purchase options in real time. The difference in efficiency is noticeable.

# Karpenter NodePool: prioritize spot, fallback to on-demand
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workers
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
  weight: 80  # Higher weight = higher priority
---
# Fallback on-demand NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "20"
    memory: 80Gi
  weight: 20  # Lower weight = lower priority (fallback)

A few things worth highlighting:

  • The weight attribute controls priority — Karpenter tries the higher-weight NodePool first (spot), then falls back to the lower-weight one (on-demand) when spot capacity runs dry
  • Allowing multiple instance categories (m, c, r) and generations (>4) gives Karpenter a broad pool to choose from, which significantly reduces interruptions
  • You'll want to enable Karpenter's native SQS-based interruption handling (the settings.interruptionQueue Helm chart value) — this lets Karpenter proactively drain nodes before they're reclaimed
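As a sketch, enabling the interruption queue via Karpenter's official Helm chart looks roughly like this — chart location and value names follow current Karpenter docs, and the cluster name, queue name, and namespace are placeholders you'd adjust:

```shell
# Point Karpenter at the SQS interruption queue provisioned for the cluster
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=${CLUSTER_NAME}
```

The queue itself (plus the EventBridge rules feeding it) is typically created by the Karpenter Getting Started Terraform/CloudFormation, not by Helm.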

Azure AKS Spot Node Pool

# Create AKS spot node pool via Azure CLI
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotnodepool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 3 \
  --min-count 1 \
  --max-count 10 \
  --enable-cluster-autoscaler \
  --node-vm-size Standard_D4s_v5 \
  --labels workload-type=spot \
  --node-taints "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"

GCP GKE Spot Node Pool

# Create GKE spot node pool via gcloud
gcloud container node-pools create spot-pool \
  --cluster=my-gke-cluster \
  --zone=us-central1-a \
  --spot \
  --num-nodes=3 \
  --min-nodes=1 \
  --max-nodes=10 \
  --enable-autoscaling \
  --machine-type=e2-standard-4 \
  --node-labels=workload-type=spot \
  --node-taints="cloud.google.com/gke-spot=true:NoSchedule"

Scheduling Workloads to Spot Nodes

Once your spot node pools are running, you need to tell Kubernetes which workloads should land on spot nodes and which shouldn't. Tolerations and affinities are your tools here.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the spot taint
      tolerations:
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      # Prefer spot nodes but don't require them
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: workload-type
                    operator: In
                    values:
                      - spot
      containers:
        - name: processor
          image: myregistry/batch-processor:latest
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
      terminationGracePeriodSeconds: 90

Handling Spot Interruptions Gracefully

This is where the rubber meets the road. The difference between a smooth spot experience and a painful one comes down entirely to how you handle interruptions. Get this wrong and your team will swear off spot instances forever. Get it right, and they'll wonder why you didn't switch sooner.

1. Pod Disruption Budgets

Always — and I mean always — set PDBs for services running on spot nodes. This prevents Kubernetes from evicting too many pods simultaneously during a mass reclaim event.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-processor-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: batch-processor

2. Graceful Shutdown with preStop Hooks

Use preStop lifecycle hooks to flush state, drain connections, or save checkpoints before the pod gets terminated. This small addition can be the difference between lost work and a clean handoff.

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          echo "Spot interruption detected, draining..."
          # Flush in-progress work to queue
          /app/drain-to-queue.sh
          # Allow load balancer to deregister
          sleep 15

3. AWS Node Termination Handler

For EKS clusters not using Karpenter, the AWS Node Termination Handler (NTH) watches for spot interruption notices and automatically cordons and drains nodes. It's straightforward to set up via Helm.

# Install NTH via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
  eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceMonitoring=true \
  --set enableScheduledEventDraining=true

Quick heads-up: Don't run NTH alongside Karpenter's native interruption handling. They conflict with each other. Pick one.

4. Design for Statelessness

The single most impactful thing you can do for spot reliability is designing your applications to be stateless. Everything else is a band-aid if your app can't survive losing its host at any moment.

  • Ship logs immediately to centralized logging (CloudWatch, Datadog) — don't batch on the instance
  • Store sessions in Redis, Memcached, or DynamoDB — never on local disk
  • Use S3, Azure Blob, or GCS for persistent data — process from object storage
  • Make all operations idempotent so work can safely be retried after interruption
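The idempotency bullet is the one teams most often skip. Here's a toy sketch of the pattern — an in-memory dict stands in for what would be Redis or a DynamoDB conditional write in production, and all names are illustrative:

```python
# Processed-key store: message ID → result. After a spot interruption, the
# queue redelivers in-flight messages; the store makes replays harmless.
processed = {}

def handle(message_id: str, payload: str) -> str:
    if message_id in processed:
        return processed[message_id]   # duplicate delivery: return cached result
    result = payload.upper()           # stand-in for the real work
    processed[message_id] = result     # record before acking the message
    return result

print(handle("msg-1", "hello"))  # → HELLO
print(handle("msg-1", "hello"))  # replayed after interruption → HELLO, work runs once
```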

The Layered Pricing Model: A Spot-First Strategy

The most cost-effective approach doesn't go all-in on spot. Instead, it combines multiple pricing models in layers. This is the strategy mature FinOps teams use, and it consistently delivers the best results.

The Three-Layer Model

  1. Layer 1 — Savings Plans / Reserved Instances (40–60% of compute): Cover your predictable, always-on baseline with 1-year Compute Savings Plans (no upfront). You'll get 30–50% off on-demand with full flexibility across instance families.
  2. Layer 2 — Spot Instances (30–40% of compute): Run all fault-tolerant, horizontally scalable workloads on spot. Diversify instance pools and use the price-capacity-optimized strategy. This layer saves 70–90% off on-demand.
  3. Layer 3 — On-Demand (10–20% of compute): Keep on-demand for workloads that can't tolerate interruption and don't have predictable enough usage for commitments. Think of this as your safety net.

Blended Savings Example

Let's put real numbers on this. Consider a workload spending $100,000/month on on-demand compute:

Layer            Share    Discount       Monthly Cost
Savings Plans    50%      40% off        $30,000
Spot Instances   35%      80% off        $7,000
On-Demand        15%      0%             $15,000
Total            100%     48% blended    $52,000

That's $48,000/month saved — $576,000 annually — without changing a single line of application code. The savings come entirely from smarter infrastructure purchasing. (And honestly, that $576K number tends to get people's attention in budget meetings pretty fast.)
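The arithmetic behind that example is worth making explicit — here's a quick check in Python, with shares and discounts taken straight from the table above (integer math to keep the dollar figures exact):

```python
# (layer name, % of baseline spend, % discount vs on-demand)
layers = [
    ("Savings Plans", 50, 40),
    ("Spot",          35, 80),
    ("On-Demand",     15,  0),
]
baseline = 100_000  # monthly on-demand spend in dollars

cost = sum(baseline * share // 100 * (100 - discount) // 100
           for _, share, discount in layers)
print(cost)             # → 52000
print(baseline - cost)  # monthly savings → 48000
```

Plug in your own shares and discounts to see how shifting even 10% more spend from on-demand to spot moves the blended number.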

Monitoring and Optimizing Your Spot Fleet

Running on spot isn't a set-it-and-forget-it situation. Continuous monitoring keeps your savings intact and disruptions to a minimum.

Key Metrics to Track

  • Spot interruption rate: Track by instance type, AZ, and time of day. If interruptions exceed 10%, it's time to diversify further.
  • Spot vs. on-demand ratio: Aim for 60–80% spot in non-critical workloads. Watch for drift.
  • Fallback frequency: How often workloads fall back to on-demand. High fallback rates quietly erode your savings.
  • Pod rescheduling latency: Time from interruption to pod running on a new node. You want this under 60 seconds.

AWS Cost Explorer Spot Filter

# View spot vs on-demand spending over the last 30 days
aws ce get-cost-and-usage \
  --time-period Start=2026-02-15,End=2026-03-15 \
  --granularity MONTHLY \
  --metrics "BlendedCost" "UnblendedCost" "UsageQuantity" \
  --group-by Type=DIMENSION,Key=PURCHASE_TYPE \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}'

Kubernetes Spot Node Monitoring with Prometheus

# PromQL: percentage of pods running on spot nodes
sum(kube_pod_info{node=~".*spot.*"}) /
sum(kube_pod_info) * 100

# PromQL: pods marked Evicted at any point in the last 24h
# (kube_pod_status_reason comes from kube-state-metrics)
sum(max_over_time(kube_pod_status_reason{reason="Evicted"}[24h]))

Frequently Asked Questions

How much can you actually save with spot instances?

Real-world savings range from 60% to 90% off on-demand pricing, depending on the cloud provider, instance type, and region. AWS and Azure offer up to 90% discounts, while GCP sits in the 60–91% range. When you combine spot with Savings Plans for your baseline compute, most organizations land on a 48–65% blended discount across their entire compute fleet.

Are spot instances reliable enough for production?

Yes — with the right architecture. The average spot interruption rate across all AWS instance types and regions is under 5%. Use the price-capacity-optimized allocation strategy, diversify across 6+ instance types, and that rate drops to around 3%. Kubernetes' built-in replication and self-healing make it especially well-suited for production workloads on spot.

What happens when a spot instance gets interrupted?

On AWS, you get a 2-minute warning via instance metadata and EventBridge. On Azure and GCP, it's 30 seconds. During that window, your application should checkpoint its state, drain connections, and prepare for shutdown. Tools like Karpenter, AWS Node Termination Handler, and Pod Disruption Budgets automate most of this — you just need to configure them properly.

Should I use Karpenter or Cluster Autoscaler for spot on EKS?

Karpenter, hands down. Unlike Cluster Autoscaler — which works through pre-configured Auto Scaling Groups — Karpenter provisions nodes directly through the EC2 Fleet API. This means it can pick optimal instance types, AZs, and pricing in real time. It also handles spot interruptions natively through SQS integration, so you don't need a separate Node Termination Handler.

Can I use spot instances with Terraform across multiple clouds?

Absolutely. Each provider has native Terraform resources for spot: aws_autoscaling_group with mixed_instances_policy for AWS, azurerm_linux_virtual_machine_scale_set with priority = "Spot" for Azure, and google_compute_instance_template with provisioning_model = "SPOT" for GCP. You can manage all three from a single Terraform codebase using separate provider configs and shared modules.
