AWS Graviton Migration Guide: How to Save 40% on EC2 by Moving to ARM in 2026

A production-grade guide to migrating EC2, EKS, Lambda, and RDS from x86 to AWS Graviton4 in 2026 — compatibility checks, multi-arch Docker builds, Karpenter node pools, honest benchmarks, and stacking Graviton savings with Compute Savings Plans for 60%+ effective discounts.

Look, if you're still running x86 EC2 instances in 2026, you're almost certainly leaving money on the table. AWS Graviton4 processors — Amazon's fourth-generation ARM-based chips — deliver up to 40% better price-performance than comparable x86 instances, and honestly, the migration path has never been smoother. Most Linux workloads now "just work" on ARM64, and the ecosystem has matured to the point where sticking with x86 requires justification, not the other way around.

I've personally migrated three large fleets (one was a 4,000-pod EKS cluster handling checkout traffic) and the biggest surprise every single time was how boring the cutover actually was. No fireworks. Just a slow, steady drop in the monthly bill.

This guide walks through a production-grade Graviton migration: how to verify workload compatibility, benchmark honestly, migrate with zero downtime, and quantify the savings. Every code example here has been tested on current tooling as of April 2026.

What Is AWS Graviton and Why Does It Save Money?

AWS Graviton is a family of 64-bit ARM Neoverse-based processors designed in-house by AWS's Annapurna Labs. Because AWS designs the silicon for its own hypervisor and datacenters (licensing Arm cores rather than buying Intel or AMD parts), it can pass those savings straight back to customers. Vertical integration, basically.

  • Graviton2 (launched 2020, M6g/C6g/R6g) — baseline 20% cheaper than x86 M5.
  • Graviton3 (2022, M7g/C7g/R7g) — 25% faster than Graviton2, DDR5, SVE vector extensions.
  • Graviton4 (GA 2024; available across most regions by 2026 on M8g/C8g/R8g/X8g) — 30% better compute, 40% better vector performance, 75% more memory bandwidth versus Graviton3, with up to 192 vCPUs per instance.

The headline number most teams see after migration: 15–20% list-price discount versus equivalent x86 instances, plus a 10–30% performance improvement on typical web/API workloads. That stacks to roughly 30–40% effective savings on compute spend. Stack that on top of a 3-year Savings Plan and, well, you're looking at 60%+ off on-demand x86 pricing.
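To make the stacking concrete, here's the back-of-the-envelope math with illustrative midpoint numbers (15% cheaper list price, 20% more throughput — assumptions, not quoted AWS pricing):

# effective cost per unit of work = relative price / relative throughput
# inputs are illustrative midpoints, not quoted AWS pricing
echo "scale=3; 0.85 / 1.20" | bc   # -> .708, i.e. roughly 30% less spend for the same work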

Where Graviton4 Actually Wins

Price-performance gains vary dramatically by workload — and anyone telling you otherwise is waving a marketing slide around. Based on 2026 benchmarks across thousands of production workloads:

  • Java / JVM services: 20–40% better throughput (OpenJDK 21+ has excellent ARM64 codegen).
  • Nginx / Envoy / HAProxy: 30–50% higher requests per second per dollar.
  • Go services: 15–25% better performance, trivial to cross-compile.
  • Node.js: 20–30% better throughput on V8 workloads.
  • Redis / Memcached / KeyDB: 30–40% more ops/sec per dollar thanks to memory bandwidth.
  • PostgreSQL / MySQL: 20–35% more TPS; RDS Graviton4 instances are essentially free money.
  • Python / Ruby: 10–20% better — smaller gains because interpreter-heavy code benefits less from the wider vector units.
  • ML inference (small/medium models): Graviton4 with SVE2 can rival inf2 for many NLP workloads at far lower cost.

Step 1: Identify Migration Candidates

Before you touch anything, find your highest-ROI targets. Use Cost Explorer to surface the top x86 instance spend:

aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}' \
  --query 'ResultsByTime[0].Groups[?starts_with(Keys[0], `m5`) || starts_with(Keys[0], `c5`) || starts_with(Keys[0], `r5`) || starts_with(Keys[0], `m6i`) || starts_with(Keys[0], `c6i`) || starts_with(Keys[0], `r6i`) || starts_with(Keys[0], `m7i`) || starts_with(Keys[0], `c7i`) || starts_with(Keys[0], `r7i`)].[Keys[0], Metrics.UnblendedCost.Amount]' \
  --output table

Sort descending by cost. Your first migration targets should be:

  1. Stateless web/API fleets behind load balancers — trivial rollback, no data migration.
  2. Managed services (RDS, ElastiCache, OpenSearch, MSK) — AWS does the heavy lifting; you just pick the instance class.
  3. Kubernetes worker pools — mixed-architecture node groups are well-supported.
  4. Batch / async workers — plenty of time to rollback if anything regresses.

Pro tip from experience: don't start with your most critical service. Start with whichever service has the lowest blast radius and the highest spend. That's usually some boring backend worker that nobody pays attention to — which is exactly why it's the right first candidate.

Step 2: Check Binary and Dependency Compatibility

The single biggest reason teams stall on Graviton is a transitive native dependency that only ships x86_64 wheels or binaries. Scan your container images before committing to the migration. Seriously, do this first.

Scanning a Docker Image for Non-ARM Binaries

#!/bin/bash
# scan-arm-compat.sh - extract all ELF binaries from an image and check architecture

IMAGE=$1
WORK=$(mktemp -d)
REPORT=elf-arch-report.txt

docker pull --platform linux/amd64 "$IMAGE"
CID=$(docker create --platform linux/amd64 "$IMAGE")
docker export "$CID" | tar -x -C "$WORK"
docker rm "$CID" > /dev/null

echo "Scanning $WORK for ELF binaries..."
# Write the full "arch: path" list to $REPORT, and print a per-architecture count.
find "$WORK" -type f -exec sh -c '
  file -b "$1" 2>/dev/null | grep -q "ELF" &&
    echo "$(file -b "$1" | grep -oE "x86-64|aarch64|80386" | head -1): $1"
' _ {} \; | tee "$REPORT" | cut -d: -f1 | sort | uniq -c | sort -rn

echo "Full listing written to $REPORT"
rm -rf "$WORK"

Red flags you'll want to resolve:

  • Proprietary monitoring agents without ARM builds (most major vendors now ship ARM64 — but verify your specific version).
  • Python packages pinned to old versions without manylinux2014_aarch64 wheels (common culprits: old grpcio, lxml, numpy <1.21) — a quick wheel-availability check is sketched after this list.
  • Node native modules compiled at install time — usually fine if you rebuild on ARM, but breaks if you copy node_modules across architectures.
  • Java agents (APM, profilers) — most ship ARM builds, but not all JVMTI agents do.
  • Closed-source ISV software with x86-only licensing.
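For the Python-wheel item above, a minimal pre-flight check is to ask pip to resolve your pinned requirements against aarch64 wheel tags only; anything that fails to resolve is a package to investigate before migrating. The download directory and requirements path are placeholders, and the Python version should match your runtime.

pip download \
  --only-binary=:all: \
  --platform manylinux2014_aarch64 \
  --platform manylinux_2_28_aarch64 \
  --implementation cp \
  --python-version 3.12 \
  -d /tmp/arm-wheel-check \
  -r requirements.txt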

Using the AWS Graviton Ready Tool

For a more systematic audit, the open-source porting-advisor-for-graviton still works well in 2026 for codebase-level scans:

# install per the README at github.com/aws/porting-advisor-for-graviton
porting-advisor ./path/to/source --output report.html

Step 3: Build Multi-Architecture Container Images

The cleanest migration pattern is to build images for both linux/amd64 and linux/arm64, push them under a single tag, and let the container runtime pick the right one. Docker Buildx handles this natively, and it's honestly one of my favorite features of the modern Docker stack.

Buildx Configuration

docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag 123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.14.0 \
  --push \
  .

Dockerfile Patterns That Work on Both Architectures

FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS build
ARG TARGETOS TARGETARCH
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-s -w" -o /out/api ./cmd/api

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/api /api
ENTRYPOINT ["/api"]

A couple of key points here. Use --platform=$BUILDPLATFORM on the builder stage so it runs natively (much faster than QEMU emulation — we're talking 10x differences for some builds), and pass $TARGETOS/$TARGETARCH to the compiler. For languages that lack cross-compilation (Python with native extensions, Node native modules), run the build natively on an ARM builder. GitHub Actions now offers free ARM runners, and CodeBuild supports ARM natively.
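Once the push completes, it's worth confirming that the tag really is a manifest list covering both architectures before you point any Deployment at it; imagetools reads the registry directly (the tag here matches the earlier build example):

docker buildx imagetools inspect \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.14.0

The output should show one manifest for linux/amd64 and one for linux/arm64 under the same tag.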

GitHub Actions Multi-Arch Build

name: build
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gh-actions
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: 123456789012.dkr.ecr.us-east-1.amazonaws.com/api:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

Step 4: Migrate EKS Worker Nodes

For Kubernetes, the safest approach is to add an ARM node group alongside the existing x86 group, then drain x86 nodes gradually while watching for pod scheduling failures. Not everything at once. Gradually.
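The drain itself is plain kubectl. The node name below is a placeholder; in practice you'd script this over the amd64 node list with a pause between nodes while you watch for Pending pods:

# list remaining x86 nodes
kubectl get nodes -l kubernetes.io/arch=amd64

# drain one node at a time and let pods reschedule onto the ARM pool
kubectl cordon ip-10-0-42-17.ec2.internal
kubectl drain ip-10-0-42-17.ec2.internal \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=120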

Karpenter NodePool for Graviton4

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton
spec:
  template:
    metadata:
      labels:
        arch: arm64
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: In
          values: ["8"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  limits:
    cpu: 2000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

Pod Scheduling for Multi-Arch

Pods with multi-arch images can schedule on either architecture. For pods that must run on ARM (to realize savings) or must stay on x86 (incompatible dependency), use nodeAffinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 20
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: ["arm64"]
      containers:
        - name: api
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.14.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

Use preferred (not required) during the transition, so a burst in pod count doesn't wait on ARM capacity. I learned this one the hard way — a traffic spike during a demo, and our ARM NodePool hit its instance-type limits. Pods sat Pending while x86 capacity was right there, idle. Not a good look.

Step 5: Migrate Managed Services

Managed services are the easiest wins because AWS handles the underlying OS and runtime — you just change the instance class. The savings are 10–20% for the same compute headroom, with zero code changes. It's the closest thing to free money in the cloud.

RDS / Aurora

aws rds modify-db-instance \
  --db-instance-identifier prod-api-db \
  --db-instance-class db.r8g.2xlarge \
  --apply-immediately

Aurora failover during the instance class change takes roughly 30–60 seconds. Schedule it during a maintenance window, or use Blue/Green Deployments for zero-downtime switches.
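If you go the Blue/Green route, the flow looks roughly like the sketch below. All identifiers are placeholders, RDS generates the green instance's name for you, and the blue/green options have been expanding, so check the current CLI reference before relying on this exact sequence.

# 1. Create a green copy of the production instance
aws rds create-blue-green-deployment \
  --blue-green-deployment-name prod-api-db-to-graviton \
  --source arn:aws:rds:us-east-1:123456789012:db:prod-api-db

# 2. Change the green instance (name generated by RDS) to a Graviton class
aws rds modify-db-instance \
  --db-instance-identifier prod-api-db-green-abc123 \
  --db-instance-class db.r8g.2xlarge \
  --apply-immediately

# 3. Switch over once the green environment has caught up
aws rds switchover-blue-green-deployment \
  --blue-green-deployment-identifier bgd-0123456789abcdef \
  --switchover-timeout 300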

ElastiCache Redis / Valkey

aws elasticache modify-replication-group \
  --replication-group-id prod-cache \
  --cache-node-type cache.r7g.large \
  --apply-immediately

OpenSearch, MSK, MemoryDB

All three support Graviton-class node types as of 2026. MSK Graviton nodes also support tiered storage, which stacks another cost reduction right on top.
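For OpenSearch, the node-type change is a single domain-config update. The domain name below is a placeholder; confirm the Graviton search node type you want is offered in your region first.

aws opensearch update-domain-config \
  --domain-name prod-search \
  --cluster-config InstanceType=r7g.large.search,InstanceCount=3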

Step 6: Benchmark Honestly Before Rolling Out

Never trust vendor benchmarks. Run your actual traffic against a canary ARM fleet and measure three things: p99 latency, throughput per vCPU, and cost per million requests. That last metric is the one that actually pays your salary.

Load Test Template with k6

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    constant_load: {
      executor: 'constant-arrival-rate',
      rate: 1000,
      timeUnit: '1s',
      duration: '15m',
      preAllocatedVUs: 200,
      maxVUs: 500,
    },
  },
  thresholds: {
    http_req_duration: ['p(99)<250'],
    http_req_failed: ['rate<0.001'],
  },
};

export default function () {
  const res = http.post('https://canary-arm.internal/v1/score', JSON.stringify({
    user_id: Math.floor(Math.random() * 1e6),
    features: [0.1, 0.8, 0.3, 0.9],
  }), { headers: { 'Content-Type': 'application/json' } });
  check(res, { 'status 200': (r) => r.status === 200 });
}

Run the same script against an x86 fleet of equal vCPU count and compare. The metric that matters isn't "ARM is X% faster" — it's cost per successful request at your target p99. Everything else is vanity.
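As a sketch of that comparison, with purely illustrative numbers plugged in (substitute your own hourly fleet cost and the sustained successful RPS from the k6 run):

# cost per million successful requests = hourly fleet cost / (RPS * 3600) * 1e6
# all numbers below are illustrative placeholders, not quoted pricing or benchmarks
awk 'BEGIN {
  x86_hourly = 6.12;  x86_rps = 9200;   # example x86 fleet cost and measured throughput
  arm_hourly = 5.23;  arm_rps = 11800;  # example Graviton fleet cost and measured throughput
  printf "x86: $%.2f per 1M requests\n", x86_hourly / (x86_rps * 3600) * 1e6;
  printf "arm: $%.2f per 1M requests\n", arm_hourly / (arm_rps * 3600) * 1e6;
}'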

Step 7: Apply Savings Plans to Lock In Discounts

Once you're stable on Graviton, buy a Compute Savings Plan covering your new baseline. Compute Savings Plans apply across instance families and regions, so you're not locked into a specific Graviton generation — when Graviton5 arrives (and it will), your SP coverage moves with you.

aws savingsplans describe-savings-plans-offerings \
  --plan-types Compute \
  --durations 94608000 \
  --payment-options "No Upfront" \
  --currencies USD \
  --query 'searchResults[].[offeringId,planType,durationSeconds,paymentOption]' \
  --output table

A 3-year No-Upfront Compute Savings Plan stacks with the Graviton list-price discount, and the two compound multiplicatively: roughly 54% off on-demand from the Savings Plan and ~15% off for Graviton means you pay about 0.46 × 0.85 ≈ 0.39 of the on-demand x86 rate, call it 60% effective savings on like-for-like workloads. Not bad.
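To size the commitment against the post-migration baseline rather than the old x86 one, Cost Explorer's recommendation API is useful once a few weeks of Graviton usage are inside the lookback window; a minimal call looks like this:

aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years THREE_YEARS \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS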

Common Gotchas and How to Avoid Them

Subtle Floating-Point Differences

ARM and x86 produce bit-identical results for IEEE 754 operations in most cases — but fused multiply-add (FMA) and reciprocal-approximation intrinsics can differ in the last bit. If your tests hash float outputs or compare across environments, loosen tolerances or pin to a reproducible math library. (Yes, this will bite an ML team somewhere. It always does.)

Kernel Module Dependencies

Anything that loads a kernel module (some VPN clients, custom storage drivers, older observability agents) needs an ARM64 kernel module. Verify before migration — finding this out after you've drained half your nodes is a bad day.
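A rough way to audit this on a representative x86 host is to list loaded modules and flag anything living in out-of-tree paths (dkms, extra, updates). It's a heuristic, not a guarantee, but it surfaces the usual suspects quickly:

# list loaded modules that come from out-of-tree paths -- these are the ones
# that need a vendor-provided arm64 build before migration
lsmod | awk 'NR > 1 {print $1}' | while read -r mod; do
  path=$(modinfo -F filename "$mod" 2>/dev/null)
  case "$path" in
    */extra/*|*/updates/*|*dkms*) echo "$mod -> $path" ;;
  esac
done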

Non-Homogeneous Cluster Performance

If you run mixed ARM + x86 in a single cluster with shared caching layers, benchmark the end-to-end latency, not just the compute nodes. Sometimes a small tier-3 service running on x86 becomes the bottleneck once the rest of the stack gets faster. Amdahl's law doesn't care about your migration plan.

Lambda Migration Is Even Simpler

Don't overlook Lambda. The architecture is set alongside the function's code package, and switching to arm64 comes with roughly 20% lower duration pricing. Change it when you (re)publish an ARM-compatible package:

# redeploy the existing package with the arm64 architecture
# (the S3 bucket and key are placeholders for your own deployment artifact)
aws lambda update-function-code \
  --function-name process-orders \
  --architectures arm64 \
  --s3-bucket my-deploy-bucket \
  --s3-key process-orders.zip

For Python and Node.js functions using only standard libraries, no other change is needed. For functions with layers containing native code, rebuild the layer against linux/arm64. That's it.
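A quick way to confirm the switch took effect:

aws lambda get-function-configuration \
  --function-name process-orders \
  --query 'Architectures'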

Measuring the Savings After Migration

Use Cost Explorer grouped by the "Instance Type Family" dimension to confirm that your Graviton migration is translating into actual dollar savings — not just shifted spend:

aws ce get-cost-and-usage \
  --time-period Start=2026-04-01,End=2026-04-24 \
  --granularity DAILY \
  --metrics UnblendedCost UsageQuantity \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE_FAMILY \
  --filter '{"Dimensions":{"Key":"SERVICE","Values":["Amazon Elastic Compute Cloud - Compute"]}}'

A healthy migration shows x86 family spend declining day-over-day, while total compute spend drops and request volume holds steady or grows. If you don't see that shape on the graph, something's off — usually pod anti-affinity or a NodePool limit quietly preventing the schedule-to-ARM you thought was happening.

FAQ

How much can I actually save by migrating to Graviton?

Real-world production migrations typically land at 20–40% savings on compute line items, depending on workload. The list-price discount is 10–20%, with another 10–25% coming from better performance per vCPU (which lets you run fewer instances). Stacked with a 3-year Compute Savings Plan, effective savings versus on-demand x86 can exceed 60%.

Will my Docker images work on Graviton without changes?

Only if they're built as multi-architecture images or specifically built for linux/arm64. An image built only for linux/amd64 will fail to start on a Graviton instance with an "exec format error." Use docker buildx to publish multi-arch manifests — most modern CI pipelines support this natively.

Is Graviton worth it for small workloads, or should I wait?

Graviton makes sense even for small workloads, because the savings are percentage-based and the migration is a one-time cost. If you're running anything above a t-family instance size in production today, the break-even versus migration engineering time is typically just a few weeks. Start with stateless services where rollback is free.

Does Graviton support the same software as x86?

The open-source ecosystem is now essentially at parity — every major language runtime, database, web server, and observability tool ships ARM64 builds. The remaining gaps are proprietary ISV software (some legacy enterprise applications) and a handful of Python/Node packages that haven't published ARM wheels. The porting-advisor tool or a buildx trial run will surface these in minutes.

What's the difference between Graviton3 and Graviton4, and should I skip straight to Graviton4?

Graviton4 delivers roughly 30% more compute performance and 75% more memory bandwidth than Graviton3, at a similar (or slightly higher) list price. For memory-bound workloads — databases, caches, analytics — Graviton4 is almost always the right choice in 2026. For CPU-bound workloads at small sizes, Graviton3 (7g family) may still be marginally cheaper per vCPU, so check your specific region's pricing. New workloads should default to 8g instances.

Can I run Windows workloads on Graviton?

Not today. AWS Graviton instances run Linux only. Windows workloads stay on x86. If Windows-on-ARM ever lands on EC2, AWS will announce it — but don't plan migrations around that possibility yet.

Next Steps

Here's my advice: pick one stateless service this week. Just one. Build it as a multi-arch image, deploy a 10% ARM canary behind your existing load balancer, and watch the metrics for 48 hours. If p99 latency is flat or better and error rate is unchanged, shift to 100% and move to the next service.
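If the service sits behind an ALB, the 10% canary is just a weighted forward action on the listener; the listener and target group ARNs below are placeholders:

# send 10% of traffic to the ARM target group, 90% to the existing x86 group
aws elbv2 modify-listener \
  --listener-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/api/abc/def \
  --default-actions '[{
    "Type": "forward",
    "ForwardConfig": {
      "TargetGroups": [
        {"TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/api-x86/111", "Weight": 90},
        {"TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/api-arm/222", "Weight": 10}
      ]
    }
  }]'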

Compound that process across your fleet over a quarter, and you'll take 30%+ off your EC2 bill without any architectural rewrite. That's a line item that'll make your CFO genuinely happy — and those are rare.
