Why Spot Instances Are the Biggest Cost Lever You're Probably Ignoring
If you're running workloads on AWS, Azure, or Google Cloud at full on-demand pricing, you're leaving a lot of money on the table — spot discounts typically run 70–90% off on-demand rates. Spot instances (called Spot VMs on Azure and GCP) let you tap into unused cloud capacity at those steep discounts. The catch? The provider can pull them back on short notice when demand spikes.
That tradeoff used to scare teams away. Fair enough — nobody wants their production workload yanked mid-request.
But here's the thing: in 2026, the tooling has matured dramatically. Karpenter, Terraform mixed-instance policies, capacity-optimized allocation strategies — these aren't experimental anymore. They're battle-tested. According to the 2026 Kubernetes Cost Benchmark Report, clusters mixing on-demand and spot instances see an average 59% cost reduction, while spot-only clusters hit 77% savings. Those numbers are hard to ignore.
So, let's dive in. This guide covers everything you need to start saving: provider-specific pricing mechanics, Terraform configurations for all three clouds, Kubernetes spot node pool strategies with Karpenter, and interruption handling patterns that actually hold up in production.
How Spot Instances Work: A Multi-Cloud Overview
Every major cloud provider has excess compute capacity that fluctuates based on demand. Rather than let it sit idle, they sell it at deeply discounted prices. The fundamental deal is the same across AWS, Azure, and GCP: you get cheap compute, but the provider can take it back when they need it.
The devil is in the details, though. Each cloud handles pricing, interruption notices, and eviction policies differently.
AWS EC2 Spot Instances
- Discount: Up to 90% off on-demand pricing
- Interruption notice: 2 minutes via instance metadata and EventBridge
- Pricing model: Market price that adjusts gradually with long-term supply and demand trends, charged per second
- Key feature: Spot Instance Advisor shows interruption frequency by instance type (<5%, 5–10%, 10–15%, 15–20%, >20%)
- 2026 update: EC2 Capacity Manager now includes spot interruption metrics for better visibility
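On AWS, the 2-minute warning surfaces through the instance metadata service at the documented `spot/instance-action` path (which returns 404 until a notice exists). A minimal poller might look like this sketch; the endpoint paths follow the EC2 metadata docs, while the polling cadence and any drain action are up to you.

```python
"""Sketch: detect an EC2 spot interruption notice via IMDSv2."""
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"


def imds_token() -> str:
    # IMDSv2 requires a session token (mandatory when http_tokens = "required")
    req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def parse_interruption(body: str):
    """Parse a spot/instance-action payload into (action, time), or None."""
    if not body:
        return None
    notice = json.loads(body)
    return notice.get("action"), notice.get("time")


def check_interruption():
    # /spot/instance-action returns 404 until an interruption is pending
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_interruption(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption pending
        raise
```

Run `check_interruption()` in a loop (every few seconds) from a daemon on the instance; a non-None result means you have roughly two minutes to drain.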
Azure Spot VMs
- Discount: Up to 90% off pay-as-you-go pricing
- Interruption notice: 30 seconds via Azure Metadata Service (yes, that's significantly less than AWS)
- Pricing model: Set a maximum price or accept the variable market price
- Key feature: Eviction policy choice — deallocate (stop but preserve) or delete entirely
- Worth noting: Azure spot VM prices surged 108% from 2022 to 2023, making instance diversification more important than ever
GCP Spot VMs
- Discount: Up to 60–91% off on-demand pricing
- Interruption notice: 30 seconds via metadata server
- Pricing model: Fixed discount based on instance type — not market-based like AWS, which makes costs more predictable
- Key feature: No maximum runtime limit (unlike the old Preemptible VMs that had a 24-hour cap)
- Bright spot: GCP spot VM prices actually decreased by ~26% recently, making them increasingly attractive
Which Workloads Belong on Spot?
Not everything should run on spot. The key question is simple: can your workload tolerate sudden interruption and restart from a checkpoint? If yes, spot is almost certainly the cheaper home for it.
Ideal Spot Workloads
- CI/CD pipelines: Build and test jobs are inherently stateless and retryable — honestly, this is the easiest win
- Batch processing: ETL jobs, data transformations, video encoding — anything with natural checkpoints
- Stateless microservices: Containerized services behind load balancers with multiple replicas
- Machine learning training: With checkpoint-based training, interrupted epochs can resume where they left off
- Dev/test environments: Non-production workloads where brief downtime is perfectly acceptable
- Big data and analytics: Spark, Hadoop, and Flink jobs that distribute work across many nodes
Workloads to Keep on On-Demand or Reserved
- Singleton databases: A single-node database without replication can't tolerate interruption. Period.
- Stateful services without HA: Any service where losing one node means losing data or sessions
- Latency-critical APIs with no failover: If you can't absorb a 30-second to 2-minute disruption window, don't risk it
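The "restart from a checkpoint" criterion is worth making concrete. Here's a minimal sketch of checkpoint-based batch processing: progress is persisted after each item, so a replacement worker resumes where the evicted one stopped. The dict-based checkpoint store is a stand-in; real code would persist to S3, GCS, Blob Storage, or a database.

```python
"""Sketch: resumable batch work via an external checkpoint (toy in-memory store)."""


def process(items, checkpoint_store, job_id="job-1"):
    start = checkpoint_store.get(job_id, 0)  # resume point; 0 on first run
    done = []
    for i in range(start, len(items)):
        done.append(items[i] * 2)            # stand-in for real work
        checkpoint_store[job_id] = i + 1     # persist progress after each item
    return done


# After an interruption, a fresh worker reads the checkpoint and
# processes only the remaining items.
store = {"job-1": 3}                          # previous worker finished 3 items
remaining = process([1, 2, 3, 4, 5], store)   # → [8, 10]
```

The same pattern underlies ML training checkpoints and Spark's lineage-based recovery: the unit of lost work on eviction is one item (or one epoch), not the whole job.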
AWS Spot Instances with Terraform: Production-Ready Configs
The recommended AWS approach uses Auto Scaling Groups with mixed instance policies. This gives you automatic diversification across instance types, the capacity-optimized allocation strategy, and seamless on-demand fallback. It's the setup I'd recommend for most teams starting out.
Mixed-Instance Auto Scaling Group
# Launch template for spot-capable instances
resource "aws_launch_template" "workers" {
  name_prefix   = "spot-workers-"
  image_id      = var.ami_id
  instance_type = "m6i.large"

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name        = "spot-worker"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }

  metadata_options {
    http_endpoint          = "enabled"
    http_tokens            = "required"
    instance_metadata_tags = "enabled"
  }
}
# ASG with mixed on-demand + spot instances
resource "aws_autoscaling_group" "workers" {
  name                = "${var.environment}-spot-workers"
  min_size            = var.min_size
  max_size            = var.max_size
  desired_capacity    = var.desired_capacity
  vpc_zone_identifier = var.private_subnet_ids

  mixed_instances_policy {
    instances_distribution {
      # Keep 1 on-demand instance as baseline
      on_demand_base_capacity = 1
      # Everything above baseline is spot
      on_demand_percentage_above_base_capacity = 0
      # Best strategy for low interruptions + good price
      spot_allocation_strategy = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.workers.id
        version            = "$Latest"
      }

      # Diversify across 6+ instance types to minimize interruptions
      override {
        instance_type     = "m6i.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m6a.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m5.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m5a.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "c6i.large"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "c6a.large"
        weighted_capacity = "1"
      }
    }
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 80
    }
  }
}
The price-capacity-optimized allocation strategy is the one you want in 2026. It picks instances from the deepest capacity pools at the lowest price, and AWS reports interruption rates up to six times lower than the legacy lowest-price strategy. That's a massive difference in practice.
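To build intuition for what that strategy does, here's a toy scoring function that ranks candidate spot pools by a blend of capacity depth and price. To be clear, this is not AWS's actual algorithm (which is internal); it just illustrates why a deep-but-slightly-pricier pool can beat the absolute cheapest one.

```python
"""Toy illustration of price-capacity-optimized pool selection (not AWS's real scoring)."""


def rank_pools(pools, capacity_weight=0.7):
    # pools: dicts with "name", "price" ($/hr), and a hypothetical
    # "capacity_score" in [0, 1] where higher means a deeper pool.
    max_price = max(p["price"] for p in pools)

    def score(p):
        price_score = 1 - p["price"] / max_price  # cheaper is better
        return capacity_weight * p["capacity_score"] + (1 - capacity_weight) * price_score

    return sorted(pools, key=score, reverse=True)


pools = [
    {"name": "m6i.large/us-east-1a", "price": 0.035, "capacity_score": 0.9},
    {"name": "m5.large/us-east-1b", "price": 0.030, "capacity_score": 0.4},
]
best = rank_pools(pools)[0]["name"]  # the deep pool wins despite costing more
```

Weighting capacity ahead of price is exactly why interruption rates drop: the cheapest pool is often cheap because everyone is draining it.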
Spot Interruption Alert with CloudWatch
# Capture spot interruption warnings via EventBridge
resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name        = "spot-interruption-warning"
  description = "Fires when EC2 issues a spot interruption notice"
  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["EC2 Spot Instance Interruption Warning"]
  })
}

resource "aws_cloudwatch_event_target" "notify_sns" {
  rule      = aws_cloudwatch_event_rule.spot_interruption.name
  target_id = "spot-alert"
  arn       = aws_sns_topic.spot_alerts.arn
}

resource "aws_sns_topic" "spot_alerts" {
  name = "${var.environment}-spot-interruption-alerts"
}
Azure Spot VMs with Terraform
Azure handles spot differently from AWS — you set a maximum price and choose an eviction policy. The 30-second interruption window (compared to AWS's 2 minutes) means your shutdown logic needs to be faster and leaner. Here's a production-ready config.
resource "azurerm_linux_virtual_machine_scale_set" "spot_workers" {
  name                = "${var.environment}-spot-vmss"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "Standard_D2s_v5"
  instances           = var.desired_capacity

  # Required by the resource; key material supplied via variable
  admin_username = "azureuser"
  admin_ssh_key {
    username   = "azureuser"
    public_key = var.ssh_public_key
  }

  # Spot configuration
  priority        = "Spot"
  eviction_policy = "Deallocate"
  max_bid_price   = -1 # Pay up to on-demand price

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "spot-nic"
    primary = true
    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.subnet_id
    }
  }

  automatic_instance_repair {
    enabled      = true
    grace_period = "PT10M"
  }

  tags = {
    Environment = var.environment
    SpotType    = "vmss-managed"
  }
}
# Autoscale to replace evicted instances
resource "azurerm_monitor_autoscale_setting" "spot_scale" {
  name                = "${var.environment}-spot-autoscale"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  target_resource_id  = azurerm_linux_virtual_machine_scale_set.spot_workers.id

  profile {
    name = "default"

    capacity {
      default = var.desired_capacity
      minimum = var.min_size
      maximum = var.max_size
    }

    rule {
      metric_trigger {
        metric_name        = "Percentage CPU"
        metric_resource_id = azurerm_linux_virtual_machine_scale_set.spot_workers.id
        operator           = "GreaterThan"
        threshold          = 70
        time_aggregation   = "Average"
        time_grain         = "PT1M"
        time_window        = "PT5M"
        statistic          = "Average"
      }

      scale_action {
        direction = "Increase"
        type      = "ChangeCount"
        value     = "2"
        cooldown  = "PT5M"
      }
    }
  }
}
Setting max_bid_price = -1 means you'll pay up to the on-demand price, which keeps you in the spot pool as long as possible. The Deallocate eviction policy is worth using because it preserves your VM's disks and networking config — so it can restart quickly when capacity comes back.
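On the instance side, Azure surfaces evictions through the Scheduled Events metadata API: a 30-second `Preempt` event appears in the events list before the VM is reclaimed. This sketch filters a Scheduled Events payload for preemption notices; the field names follow the documented schema, and polling plus event acknowledgment are omitted.

```python
"""Sketch: pick out Azure Scheduled Events 'Preempt' notices from a payload."""


def preempt_events(scheduled_events: dict) -> list:
    # A Preempt event means this spot VM will be evicted at/after its
    # NotBefore timestamp; other EventTypes (Reboot, Redeploy, ...) pass through.
    return [
        e for e in scheduled_events.get("Events", [])
        if e.get("EventType") == "Preempt"
    ]
```

In production you'd poll `http://169.254.169.254/metadata/scheduledevents` (with the `Metadata: true` header) every few seconds and start draining as soon as this list is non-empty.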
GCP Spot VMs with Terraform
Google Cloud's spot model is refreshingly simple compared to AWS and Azure. Pricing is a fixed discount rather than a market-based auction, so you won't wake up to unexpected cost spikes. That predictability is genuinely nice.
resource "google_compute_instance_template" "spot_workers" {
  name_prefix  = "spot-worker-"
  machine_type = "e2-standard-4"
  region       = var.region

  disk {
    source_image = "debian-cloud/debian-12"
    auto_delete  = true
    boot         = true
    disk_type    = "pd-standard"
  }

  network_interface {
    network    = var.network
    subnetwork = var.subnetwork
  }

  scheduling {
    preemptible                 = true
    automatic_restart           = false
    provisioning_model          = "SPOT"
    instance_termination_action = "STOP"
  }

  labels = {
    environment = var.environment
    spot        = "true"
  }

  lifecycle {
    create_before_destroy = true
  }
}
resource "google_compute_instance_group_manager" "spot_workers" {
  name               = "${var.environment}-spot-mig"
  base_instance_name = "spot-worker"
  zone               = var.zone
  target_size        = var.desired_capacity

  version {
    instance_template = google_compute_instance_template.spot_workers.id
  }

  auto_healing_policies {
    health_check      = google_compute_health_check.spot_hc.id
    initial_delay_sec = 120
  }
}

resource "google_compute_health_check" "spot_hc" {
  name                = "${var.environment}-spot-health-check"
  check_interval_sec  = 10
  timeout_sec         = 5
  healthy_threshold   = 2
  unhealthy_threshold = 3

  http_health_check {
    port = 8080
  }
}
Setting instance_termination_action = "STOP" preserves the VM disk on eviction, which means a faster restart. For truly ephemeral workloads where you don't care about the disk, use "DELETE" instead.
Spot Instances in Kubernetes: Karpenter, EKS, AKS, and GKE
Kubernetes is honestly the ideal platform for spot instances. Containers are ephemeral by nature, and Kubernetes' built-in scheduling, replication, and self-healing make it a natural fit for absorbing spot interruptions without drama.
Here's how to set up spot node pools across all three managed Kubernetes services.
AWS EKS with Karpenter (Recommended)
Karpenter is the node autoscaler you should be using on EKS in 2026. Unlike Cluster Autoscaler — which works through pre-configured Auto Scaling Groups — Karpenter provisions nodes directly via the EC2 Fleet API. It picks optimal instance types, availability zones, and purchase options in real time. The difference in efficiency is noticeable.
# Karpenter NodePool: prioritize spot, fallback to on-demand
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workers
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "100"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
  weight: 80 # Higher weight = higher priority
---
# Fallback on-demand NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "20"
    memory: 80Gi
  weight: 20 # Lower weight = lower priority (fallback)
A few things worth highlighting:
- The weight attribute controls priority — Karpenter tries the higher-weight NodePool first (spot), then falls back to the lower-weight one (on-demand) when spot capacity runs dry
- Allowing multiple instance categories (m, c, r) and generations (>4) gives Karpenter a broad pool to choose from, which significantly reduces interruptions
- You'll want to enable SQS-based interruption handling with the --interruption-queue CLI argument — this lets Karpenter proactively drain nodes before they're reclaimed
Azure AKS Spot Node Pool
# Create AKS spot node pool via Azure CLI
az aks nodepool add \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name spotnodepool \
--priority Spot \
--eviction-policy Delete \
--spot-max-price -1 \
--node-count 3 \
--min-count 1 \
--max-count 10 \
--enable-cluster-autoscaler \
--node-vm-size Standard_D4s_v5 \
--labels workload-type=spot \
--node-taints "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
GCP GKE Spot Node Pool
# Create GKE spot node pool via gcloud
gcloud container node-pools create spot-pool \
--cluster=my-gke-cluster \
--zone=us-central1-a \
--spot \
--num-nodes=3 \
--min-nodes=1 \
--max-nodes=10 \
--enable-autoscaling \
--machine-type=e2-standard-4 \
--node-labels=workload-type=spot \
--node-taints="cloud.google.com/gke-spot=true:NoSchedule"
Scheduling Workloads to Spot Nodes
Once your spot node pools are running, you need to tell Kubernetes which workloads should land on spot nodes and which shouldn't. Tolerations and affinities are your tools here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 5
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the spot taint
      tolerations:
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      # Prefer spot nodes but don't require them
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: workload-type
                    operator: In
                    values:
                      - spot
      containers:
        - name: processor
          image: myregistry/batch-processor:latest
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
      terminationGracePeriodSeconds: 90
Handling Spot Interruptions Gracefully
This is where the rubber meets the road. The difference between a smooth spot experience and a painful one comes down entirely to how you handle interruptions. Get this wrong and your team will swear off spot instances forever. Get it right, and they'll wonder why you didn't switch sooner.
1. Pod Disruption Budgets
Always — and I mean always — set PDBs for services running on spot nodes. This prevents Kubernetes from evicting too many pods simultaneously during a mass reclaim event.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-processor-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: batch-processor
2. Graceful Shutdown with preStop Hooks
Use preStop lifecycle hooks to flush state, drain connections, or save checkpoints before the pod gets terminated. This small addition can be the difference between lost work and a clean handoff.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          echo "Spot interruption detected, draining..."
          # Flush in-progress work to queue
          /app/drain-to-queue.sh
          # Allow load balancer to deregister
          sleep 15
3. AWS Node Termination Handler
For EKS clusters not using Karpenter, the AWS Node Termination Handler (NTH) watches for spot interruption notices and automatically cordons and drains nodes. It's straightforward to set up via Helm.
# Install NTH via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler \
eks/aws-node-termination-handler \
--namespace kube-system \
--set enableSpotInterruptionDraining=true \
--set enableRebalanceMonitoring=true \
--set enableScheduledEventDraining=true
Quick heads-up: Don't run NTH alongside Karpenter's native interruption handling. They conflict with each other. Pick one.
4. Design for Statelessness
The single most impactful thing you can do for spot reliability is designing your applications to be stateless. Everything else is a band-aid if your app can't survive losing its host at any moment.
- Ship logs immediately to centralized logging (CloudWatch, Datadog) — don't batch on the instance
- Store sessions in Redis, Memcached, or DynamoDB — never on local disk
- Use S3, Azure Blob, or GCS for persistent data — process from object storage
- Make all operations idempotent so work can safely be retried after interruption
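The idempotency point deserves a concrete shape. The standard trick is a dedupe key recorded alongside the work, so a message redelivered after an eviction is a no-op. This is a minimal sketch with an in-memory set standing in for a durable store like DynamoDB or Redis.

```python
"""Sketch: idempotent message processing via dedupe keys (toy in-memory store)."""


def handle(message, processed: set, results: list) -> bool:
    key = message["dedupe_key"]
    if key in processed:
        return False                              # already done; retry is a no-op
    results.append(message["payload"].upper())    # stand-in for the real work
    processed.add(key)                            # record completion with the work
    return True
```

With a durable store, the "do work + record key" pair should be atomic (a conditional write or transaction); otherwise an eviction between the two steps reopens the duplicate window.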
The Layered Pricing Model: A Spot-First Strategy
The most cost-effective approach doesn't go all-in on spot. Instead, it combines multiple pricing models in layers. This is the strategy mature FinOps teams use, and it consistently delivers the best results.
The Three-Layer Model
- Layer 1 — Savings Plans / Reserved Instances (40–60% of compute): Cover your predictable, always-on baseline with 1-year Compute Savings Plans (no upfront). You'll get 30–50% off on-demand with full flexibility across instance families.
- Layer 2 — Spot Instances (30–40% of compute): Run all fault-tolerant, horizontally scalable workloads on spot. Diversify instance pools and use the price-capacity-optimized strategy. This layer saves 70–90% off on-demand.
- Layer 3 — On-Demand (10–20% of compute): Keep on-demand for workloads that can't tolerate interruption and don't have predictable enough usage for commitments. Think of this as your safety net.
Blended Savings Example
Let's put real numbers on this. Consider a workload spending $100,000/month on on-demand compute:
| Layer | Share | Discount | Monthly Cost |
|---|---|---|---|
| Savings Plans | 50% | 40% off | $30,000 |
| Spot Instances | 35% | 80% off | $7,000 |
| On-Demand | 15% | 0% | $15,000 |
| Total | 100% | 48% blended | $52,000 |
That's $48,000/month saved — $576,000 annually — without changing a single line of application code. The savings come entirely from smarter infrastructure purchasing. (And honestly, that $576K number tends to get people's attention in budget meetings pretty fast.)
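The blended arithmetic is simple enough to sanity-check in a few lines, and it's handy as a helper when you're modeling different layer splits for your own bill:

```python
"""Reproduce the blended-savings arithmetic from the layered pricing model."""


def blended_cost(monthly_on_demand: float, layers) -> float:
    # layers: (share_of_spend, discount) pairs; shares must sum to 1.0
    assert abs(sum(share for share, _ in layers) - 1.0) < 1e-9
    return sum(
        monthly_on_demand * share * (1 - discount)
        for share, discount in layers
    )


# Savings Plans 50% @ 40% off, spot 35% @ 80% off, on-demand 15% @ full price
layers = [(0.50, 0.40), (0.35, 0.80), (0.15, 0.00)]
cost = blended_cost(100_000, layers)  # $52,000/month, a 48% blended discount
```

Adjusting the tuples lets you answer "what if we push spot to 50%?" in seconds instead of spreadsheet sessions.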
Monitoring and Optimizing Your Spot Fleet
Running on spot isn't a set-it-and-forget-it situation. Continuous monitoring keeps your savings intact and disruptions to a minimum.
Key Metrics to Track
- Spot interruption rate: Track by instance type, AZ, and time of day. If interruptions exceed 10%, it's time to diversify further.
- Spot vs. on-demand ratio: Aim for 60–80% spot in non-critical workloads. Watch for drift.
- Fallback frequency: How often workloads fall back to on-demand. High fallback rates quietly erode your savings.
- Pod rescheduling latency: Time from interruption to pod running on a new node. You want this under 60 seconds.
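The interruption-rate check above is easy to automate once you export per-pool counts from your monitoring stack. This sketch flags pools that cross the 10% diversify threshold; the `(interruptions, launches)` tuple shape is illustrative, not any particular API.

```python
"""Sketch: flag spot pools whose interruption rate exceeds a threshold."""


def flag_pools(stats: dict, threshold: float = 0.10) -> list:
    # stats: {pool_name: (interruptions, launches)} over some window (toy shape)
    flagged = []
    for pool, (interruptions, launches) in stats.items():
        rate = interruptions / launches if launches else 0.0
        if rate > threshold:
            flagged.append(pool)  # candidate for more diversification
    return flagged
```

Feed it weekly numbers per instance type and AZ, and swap flagged pools for additional overrides in your mixed-instances policy.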
AWS Cost Explorer Spot Filter
# View spot vs on-demand spending over the last 30 days
aws ce get-cost-and-usage \
--time-period Start=2026-02-15,End=2026-03-15 \
--granularity MONTHLY \
--metrics "BlendedCost" "UnblendedCost" "UsageQuantity" \
--group-by Type=DIMENSION,Key=PURCHASE_TYPE \
--filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}'
Kubernetes Spot Node Monitoring with Prometheus
# PromQL: percentage of pods running on spot nodes
sum(kube_pod_info{node=~".*spot.*"}) /
sum(kube_pod_info) * 100
# PromQL: pods currently reporting an Evicted status (common after spot reclaims)
sum(kube_pod_status_reason{reason="Evicted"})
Frequently Asked Questions
How much can you actually save with spot instances?
Real-world savings range from 60% to 90% off on-demand pricing, depending on the cloud provider, instance type, and region. AWS and Azure offer up to 90% discounts, while GCP sits in the 60–91% range. When you combine spot with Savings Plans for your baseline compute, most organizations land on a 48–65% blended discount across their entire compute fleet.
Are spot instances reliable enough for production?
Yes — with the right architecture. The average spot interruption rate across all AWS instance types and regions is under 5%. Use the price-capacity-optimized allocation strategy, diversify across 6+ instance types, and that rate drops to around 3%. Kubernetes' built-in replication and self-healing make it especially well-suited for production workloads on spot.
What happens when a spot instance gets interrupted?
On AWS, you get a 2-minute warning via instance metadata and EventBridge. On Azure and GCP, it's 30 seconds. During that window, your application should checkpoint its state, drain connections, and prepare for shutdown. Tools like Karpenter, AWS Node Termination Handler, and Pod Disruption Budgets automate most of this — you just need to configure them properly.
Should I use Karpenter or Cluster Autoscaler for spot on EKS?
Karpenter, hands down. Unlike Cluster Autoscaler — which works through pre-configured Auto Scaling Groups — Karpenter provisions nodes directly through the EC2 Fleet API. This means it can pick optimal instance types, AZs, and pricing in real time. It also handles spot interruptions natively through SQS integration, so you don't need a separate Node Termination Handler.
Can I use spot instances with Terraform across multiple clouds?
Absolutely. Each provider has native Terraform resources for spot: aws_autoscaling_group with mixed_instances_policy for AWS, azurerm_linux_virtual_machine_scale_set with priority = "Spot" for Azure, and google_compute_instance_template with provisioning_model = "SPOT" for GCP. You can manage all three from a single Terraform codebase using separate provider configs and shared modules.