Introduction: The Hidden Drain on Your Cloud Budget
Storage is the quiet budget killer in cloud computing. While everyone's obsessing over compute instance costs and GPU pricing, cloud storage spending silently balloons month after month — driven by data that grows relentlessly and almost never gets deleted. With public cloud spending projected to exceed $1 trillion in 2026, storage accounts for a bigger share than most teams realize. And here's the kicker: an estimated 45–55% of that storage spending is ripe for optimization.
Here's the uncomfortable truth: most organizations are storing the vast majority of their data in the most expensive storage tier available.
Access patterns consistently show that 60–80% of stored data is rarely or never accessed after the first 30 days, yet it sits in premium hot storage tiers racking up charges at the highest rate. Toss in orphaned snapshots, forgotten backups, incomplete multipart uploads, and version stacking, and you've got a recipe for thousands — sometimes hundreds of thousands — of dollars in completely avoidable monthly spend.
This guide walks through every major cloud storage cost optimization strategy across AWS S3, Azure Blob Storage, and Google Cloud Storage (GCS). Whether you're running a lean startup or managing petabytes in an enterprise data lake, these techniques can realistically cut your storage bills by 60% or more. We'll cover automated tiering, lifecycle policies, orphaned resource cleanup, egress cost reduction, analytics-driven optimization, and the infrastructure-as-code patterns that make it all sustainable.
Understanding Cloud Storage Pricing: Where Your Money Actually Goes
The Four Pillars of Storage Costs
Before you can optimize anything, you need to understand the cost components that show up on your cloud storage bill. Across all three major providers, storage pricing breaks down into four categories:
- Storage capacity: The per-GB-per-month cost of keeping data at rest. This is the most visible cost and the one that scales linearly with data volume.
- Operations (API requests): Every PUT, GET, LIST, and DELETE operation incurs a charge. High-frequency operations on millions of small objects can generate surprisingly large bills — I've seen teams shocked by this one.
- Data retrieval: Pulling data out of cold or archive tiers incurs retrieval fees, which can actually dwarf storage costs if you misjudge access patterns.
- Data egress (transfer out): Moving data out of a cloud provider's network — to the internet, another region, or another cloud — comes with per-GB egress charges that typically range from $0.05 to $0.12 per GB.
Many teams focus exclusively on that first pillar and completely miss the other three. A well-optimized storage strategy addresses all four simultaneously.
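To make the four pillars concrete, here's a minimal back-of-the-envelope estimator. The rates are illustrative defaults (roughly S3 Standard-shaped); your actual bill will use your provider's current pricing:

```python
def monthly_storage_cost(
    gb_stored: float,
    requests: int,
    gb_retrieved: float,
    gb_egress: float,
    rate_storage: float = 0.023,          # $/GB-month at rest (assumed)
    rate_per_1k_requests: float = 0.005,  # $/1,000 write-class requests (assumed)
    rate_retrieval: float = 0.01,         # $/GB pulled from an IA-class tier (assumed)
    rate_egress: float = 0.09,            # $/GB transferred to the internet (assumed)
) -> dict:
    """Break a storage bill into the four cost pillars."""
    costs = {
        "capacity": gb_stored * rate_storage,
        "operations": requests / 1000 * rate_per_1k_requests,
        "retrieval": gb_retrieved * rate_retrieval,
        "egress": gb_egress * rate_egress,
    }
    costs["total"] = sum(costs.values())
    return costs

# 10 TB stored, 2M requests, 500 GB retrieved, 1 TB egress in a month
bill = monthly_storage_cost(10_240, 2_000_000, 500, 1_024)
print({k: round(v, 2) for k, v in bill.items()})
```

Even this toy model makes the point: at these rates, egress and retrieval together can rival the capacity line item.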
Storage Tier Comparison Across Providers
Each provider offers a hierarchy of storage tiers optimized for different access frequencies. Here's how they compare for a typical US region:
AWS S3 Storage Classes:
- S3 Standard: ~$0.023/GB/month — frequent access, no retrieval fee
- S3 Intelligent-Tiering: ~$0.023/GB/month (auto-adjusts) — monitoring fee of $0.0025 per 1,000 objects
- S3 Standard-IA (Infrequent Access): ~$0.0125/GB/month — $0.01/GB retrieval fee
- S3 Glacier Instant Retrieval: ~$0.004/GB/month — millisecond retrieval, higher retrieval cost
- S3 Glacier Flexible Retrieval: ~$0.0036/GB/month — minutes to hours retrieval
- S3 Glacier Deep Archive: ~$0.00099/GB/month — 12–48 hour retrieval
Azure Blob Storage Tiers:
- Hot: ~$0.018/GB/month — frequent access
- Cool: ~$0.01/GB/month — 30-day minimum retention
- Cold: ~$0.0036/GB/month — 90-day minimum retention
- Archive: ~$0.00099/GB/month — 180-day minimum retention, hours to rehydrate
Google Cloud Storage Classes:
- Standard: ~$0.020/GB/month — frequent access
- Nearline: ~$0.010/GB/month — 30-day minimum retention
- Coldline: ~$0.004/GB/month — 90-day minimum retention
- Archive: ~$0.0012/GB/month — 365-day minimum retention
The math here is pretty stark. Moving 50 TB of infrequently accessed data from hot to cold storage can save $950 per month — over $11,000 per year — on storage capacity charges alone. Scale that to petabytes and the savings become genuinely transformative.
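The arithmetic behind that claim is simple enough to script. This sketch uses decimal terabytes (1 TB = 1,000 GB) and the S3 Standard and Glacier Instant Retrieval rates from the table above:

```python
def tiering_savings(tb: float, hot_rate: float, cold_rate: float) -> float:
    """Monthly capacity savings from moving data to a cheaper tier (decimal TB)."""
    return tb * 1000 * (hot_rate - cold_rate)

# 50 TB moved from S3 Standard ($0.023) to Glacier Instant Retrieval ($0.004)
monthly = tiering_savings(50, 0.023, 0.004)
print(f"${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")  # $950/month, $11,400/year
```

Note this covers capacity only; retrieval fees on the cold tier would eat into the savings if the data turns out to be accessed more often than expected.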
Strategy 1: Automated Storage Tiering
AWS S3 Intelligent-Tiering
S3 Intelligent-Tiering is arguably the single most impactful storage optimization feature available today. And the numbers back that up — since its launch, AWS reports that customers have saved over $6 billion in storage costs compared to keeping data in S3 Standard. The service automatically moves objects between access tiers based on observed access patterns, with no manual intervention needed.
Here's how the tiers work within Intelligent-Tiering:
- Frequent Access tier: Default tier for all new objects. Standard pricing applies.
- Infrequent Access tier: Objects not accessed for 30 consecutive days. 40% savings over Standard.
- Archive Instant Access tier: Objects not accessed for 90 consecutive days. 68% savings over Standard.
- Archive Access tier (opt-in): Objects not accessed for 90–180+ days. Up to 71% savings.
- Deep Archive Access tier (opt-in): Objects not accessed for 180+ days. Up to 95% savings.
The only additional cost is a monitoring fee of $0.0025 per 1,000 objects per month. Objects smaller than 128 KB are exempt from monitoring and always stay in the Frequent Access tier. And here's the real advantage: there are no retrieval charges within Intelligent-Tiering. That's a big deal compared to manually managed tiers where retrieval fees can sneak up on you fast.
To enable Intelligent-Tiering as the default for a bucket using the AWS CLI:
# Configure S3 Intelligent-Tiering archive tiers on an existing bucket
aws s3api put-bucket-intelligent-tiering-configuration \
--bucket my-data-lake-bucket \
--id "FullOptimization" \
--intelligent-tiering-configuration '{
"Id": "FullOptimization",
"Status": "Enabled",
"Tierings": [
{
"AccessTier": "ARCHIVE_ACCESS",
"Days": 90
},
{
"AccessTier": "DEEP_ARCHIVE_ACCESS",
"Days": 180
}
]
}'
# Transition existing objects from Standard to Intelligent-Tiering
# using S3 Batch Operations or a lifecycle rule:
aws s3api put-bucket-lifecycle-configuration \
--bucket my-data-lake-bucket \
--lifecycle-configuration '{
"Rules": [
{
"ID": "TransitionToIntelligentTiering",
"Status": "Enabled",
"Filter": {
"Prefix": ""
},
"Transitions": [
{
"Days": 0,
"StorageClass": "INTELLIGENT_TIERING"
}
]
}
]
}'
When to use Intelligent-Tiering: It's ideal for data with unknown or changing access patterns — data lakes, log archives, user-generated content, and application backups. The monitoring fee becomes negligible at scale and is far outweighed by the automatic savings.
When to avoid it: If you know your data gets accessed consistently within 30 days (like hot cache layers), Intelligent-Tiering just adds monitoring costs without delivering savings. Very small objects below 128 KB are a different story: they're never monitored or auto-tiered, so they stay at Frequent Access rates and see no savings at all.
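The monitoring fee is easy to sanity-check against the expected savings. A rough sketch, using the tier prices quoted earlier and assuming idle objects eventually settle at the Archive Instant Access rate:

```python
def intelligent_tiering_net(objects: int, avg_mb: float, frac_cold: float) -> float:
    """Rough net monthly benefit of Intelligent-Tiering vs. leaving data in Standard.

    Assumes cold objects reach the Archive Instant Access rate ($0.004/GB-mo
    vs. $0.023 Standard); frac_cold is the share of data that goes cold.
    """
    gb = objects * avg_mb / 1024
    monitoring = objects / 1000 * 0.0025          # $0.0025 per 1,000 objects/month
    savings = gb * frac_cold * (0.023 - 0.004)
    return savings - monitoring

# 10M objects averaging 5 MB, with 70% rarely accessed
print(round(intelligent_tiering_net(10_000_000, 5.0, 0.7), 2))
```

At this scale the $25/month monitoring fee is noise next to the savings; shrink `avg_mb` toward the 128 KB floor and the balance tips the other way.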
Google Cloud Storage Autoclass
GCS Autoclass takes a slightly different approach, using machine learning to automatically transition objects between Standard, Nearline, Coldline, and Archive classes based on observed access patterns. All new objects start in Standard, and Autoclass progressively moves them to colder tiers as access frequency declines.
Key characteristics of Autoclass:
- Management fee of $0.0025 per 1,000 objects per month
- Objects smaller than 128 KiB always remain in Standard storage
- By default, the terminal tier is Nearline — but you can configure it to use Archive for maximum savings
- All operations are charged at the Standard storage rate, regardless of the object's current tier
- No retrieval fees beyond the one-time enablement charge
Enable Autoclass on a GCS bucket with the terminal tier set to Archive:
# Enable Autoclass with Archive as the terminal storage class
gcloud storage buckets update gs://my-data-bucket \
--enable-autoclass \
--autoclass-terminal-storage-class=ARCHIVE
# Verify Autoclass configuration
gcloud storage buckets describe gs://my-data-bucket \
--format="json(autoclass)"
Azure Blob Lifecycle Management
Azure doesn't have a direct equivalent to the ML-driven auto-tiering you get with AWS and GCP, but its lifecycle management policies provide powerful rule-based automation for transitioning data between Hot, Cool, Cold, and Archive tiers.
Here's a comprehensive lifecycle policy that implements a multi-stage tiering strategy:
{
"rules": [
{
"enabled": true,
"name": "AggressiveTieringPolicy",
"type": "Lifecycle",
"definition": {
"actions": {
"baseBlob": {
"tierToCool": {
"daysAfterModificationGreaterThan": 30
},
"tierToCold": {
"daysAfterModificationGreaterThan": 90
},
"tierToArchive": {
"daysAfterModificationGreaterThan": 180
},
"delete": {
"daysAfterModificationGreaterThan": 2555
}
},
"snapshot": {
"tierToCool": {
"daysAfterCreationGreaterThan": 30
},
"delete": {
"daysAfterCreationGreaterThan": 365
}
},
"version": {
"tierToCool": {
"daysAfterCreationGreaterThan": 30
},
"delete": {
"daysAfterCreationGreaterThan": 90
}
}
},
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["data/", "logs/", "backups/"]
}
}
}
]
}
Apply this policy using the Azure CLI:
# Apply the lifecycle management policy to a storage account
az storage account management-policy create \
--account-name mystorageaccount \
--resource-group myresourcegroup \
--policy @lifecycle-policy.json
The Cold tier is particularly worth calling out. It's a relatively recent addition that fills the gap between Cool and Archive, delivering up to 64% savings compared to the Cool tier while still giving you immediate access to data. For compliance data, backups, and historical datasets that are rarely accessed but might need the occasional read, Cold is honestly the sweet spot.
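The Cool-vs-Cold decision comes down to how much you read each month. A quick comparison using the storage rates above and illustrative per-GB read charges (the retrieval rates here are assumptions; check current Azure pricing):

```python
def monthly_cost(gb_stored: float, gb_read: float,
                 storage_rate: float, read_rate: float) -> float:
    """Monthly cost of a tier: capacity plus per-GB data retrieval."""
    return gb_stored * storage_rate + gb_read * read_rate

# 10 TB of compliance data, ~200 GB read per month
cool = monthly_cost(10_000, 200, 0.0100, 0.01)  # Cool: $0.01/GB stored (assumed $0.01/GB read)
cold = monthly_cost(10_000, 200, 0.0036, 0.03)  # Cold: $0.0036/GB stored (assumed $0.03/GB read)
print(f"Cool: ${cool:.2f}  Cold: ${cold:.2f}")
```

With these rates, Cold stays cheaper until monthly reads approach roughly a third of the stored volume, which is exactly the "rarely accessed but occasionally needed" profile the tier targets.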
Strategy 2: Lifecycle Policies for Data Hygiene
Expiring Incomplete Multipart Uploads
Incomplete multipart uploads are one of the most overlooked sources of storage waste — and I'd argue they're the easiest to fix. When large file uploads to S3 get interrupted or abandoned (application crashes, network failures, bugs), the uploaded parts persist indefinitely, consuming storage at full rates. Organizations with heavy upload workloads can accumulate hundreds of gigabytes of these orphaned parts without even knowing it.
# AWS CLI: Add lifecycle rule to abort incomplete multipart uploads after 7 days
aws s3api put-bucket-lifecycle-configuration \
--bucket my-upload-bucket \
--lifecycle-configuration '{
"Rules": [
{
"ID": "AbortIncompleteMultipartUploads",
"Status": "Enabled",
"Filter": {
"Prefix": ""
},
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}
]
}'
# Check for existing incomplete multipart uploads
aws s3api list-multipart-uploads --bucket my-upload-bucket
Managing Object Versions
Versioning is essential for data protection, but without lifecycle rules it leads to version stacking — the slow, silent accumulation of old versions that nobody will ever need. A single frequently-updated object could have hundreds of versions, each billed at the full per-GB rate. It adds up faster than you'd expect.
# Terraform: S3 lifecycle rule to manage versions and multipart uploads
resource "aws_s3_bucket_lifecycle_configuration" "data_hygiene" {
bucket = aws_s3_bucket.main.id
rule {
id = "expire-old-versions"
status = "Enabled"
noncurrent_version_expiration {
noncurrent_days = 30
}
noncurrent_version_transition {
noncurrent_days = 7
storage_class = "GLACIER_IR"
}
}
rule {
id = "abort-incomplete-uploads"
status = "Enabled"
abort_incomplete_multipart_upload {
days_after_initiation = 7
}
}
rule {
id = "expire-delete-markers"
status = "Enabled"
expiration {
expired_object_delete_marker = true
}
}
}
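Before enabling a rule like this, it's worth quantifying how much version stacking is actually costing. This sketch totals noncurrent bytes per key prefix from `list_object_versions` output; the bucket name in the commented boto3 snippet is hypothetical:

```python
from collections import defaultdict

def noncurrent_bytes_by_prefix(versions, depth: int = 1) -> dict:
    """Sum sizes of noncurrent object versions, grouped by key prefix."""
    totals = defaultdict(int)
    for v in versions:
        if not v.get("IsLatest", False):
            prefix = "/".join(v["Key"].split("/")[:depth])
            totals[prefix] += v.get("Size", 0)
    return dict(totals)

# With boto3 (uncomment to run against a real bucket):
# import boto3
# paginator = boto3.client("s3").get_paginator("list_object_versions")
# versions = (v for page in paginator.paginate(Bucket="my-data-lake-bucket")
#             for v in page.get("Versions", []))

sample = [
    {"Key": "logs/a.json", "Size": 100, "IsLatest": True},
    {"Key": "logs/a.json", "Size": 90, "IsLatest": False},
    {"Key": "data/b.csv", "Size": 50, "IsLatest": False},
]
print(noncurrent_bytes_by_prefix(sample))  # {'logs': 90, 'data': 50}
```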
GCS Object Lifecycle Management
Google Cloud Storage lifecycle rules work on similar principles but with slightly different syntax. Here's a comprehensive policy covering tiering, version cleanup, and data expiration:
# lifecycle-config.json:
{
"rule": [
{
"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
},
{
"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
"condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
},
{
"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
"condition": {"age": 365, "matchesStorageClass": ["COLDLINE"]}
},
{
"action": {"type": "Delete"},
"condition": {"age": 2555}
},
{
"action": {"type": "Delete"},
"condition": {"isLive": false, "numNewerVersions": 3}
}
]
}
# Apply the lifecycle policy
gcloud storage buckets update gs://my-data-bucket \
--lifecycle-file=lifecycle-config.json
Strategy 3: Hunting and Eliminating Orphaned Resources
Orphaned EBS Snapshots and Volumes
When EC2 instances get terminated, their associated EBS volumes and snapshots often linger behind. EBS snapshots cost $0.05 per GB per month — a modest rate per snapshot, but one that accumulates aggressively across hundreds of forgotten snapshots. One real-world audit I came across found 87 unattached volumes totaling 4,300 GB, costing $344 per month and racking up $6,192 in waste over 18 months. That's real money just sitting there doing nothing.
Here's a script to identify and optionally clean up orphaned EBS resources:
#!/bin/bash
# find-orphaned-ebs.sh — Identify unattached EBS volumes and orphaned snapshots
echo "=== Unattached EBS Volumes ==="
aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query "Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime,Type:VolumeType}" \
--output table
echo ""
echo "=== Calculating waste from unattached volumes ==="
TOTAL_GB=$(aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query "sum(Volumes[*].Size)" \
--output text)
echo "Total unattached storage: ${TOTAL_GB} GB"
echo "Estimated monthly cost: \$$(echo "$TOTAL_GB * 0.08" | bc) (gp3 pricing)"
echo ""
echo "=== Orphaned Snapshots (volume no longer exists) ==="
# Get all snapshot volume IDs
SNAPSHOT_VOLS=$(aws ec2 describe-snapshots --owner-ids self \
--query "Snapshots[*].VolumeId" --output text | tr '\t' '\n' | sort -u)
# Get all existing volume IDs
EXISTING_VOLS=$(aws ec2 describe-volumes \
--query "Volumes[*].VolumeId" --output text | tr '\t' '\n' | sort -u)
# Find snapshots whose volumes no longer exist
for vol in $SNAPSHOT_VOLS; do
if ! echo "$EXISTING_VOLS" | grep -q "$vol"; then
aws ec2 describe-snapshots --owner-ids self \
--filters "Name=volume-id,Values=$vol" \
--query "Snapshots[*].{SnapshotId:SnapshotId,VolumeId:VolumeId,Size:VolumeSize,Started:StartTime}" \
--output table
fi
done
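If you'd rather do the orphan check in Python, the core of that loop is just set arithmetic, which also avoids the partial-string-match pitfall of `grep`. The commented boto3 calls mirror the CLI above and are illustrative:

```python
def find_orphaned_snapshot_volumes(snapshot_volume_ids, existing_volume_ids):
    """Volume IDs referenced by snapshots but no longer present as volumes."""
    return sorted(set(snapshot_volume_ids) - set(existing_volume_ids))

# With boto3 (illustrative; real accounts need pagination):
# import boto3
# ec2 = boto3.client("ec2")
# snaps = [s["VolumeId"] for s in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]]
# vols = [v["VolumeId"] for v in ec2.describe_volumes()["Volumes"]]

orphans = find_orphaned_snapshot_volumes(
    ["vol-1", "vol-2", "vol-3"], ["vol-2"])
print(orphans)  # ['vol-1', 'vol-3']
```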
For automated ongoing cleanup, AWS Data Lifecycle Manager (DLM) handles snapshot retention policies declaratively:
# Terraform: Automated EBS snapshot lifecycle management
resource "aws_dlm_lifecycle_policy" "snapshot_cleanup" {
description = "Manage EBS snapshot lifecycle"
execution_role_arn = aws_iam_role.dlm_lifecycle.arn
state = "ENABLED"
policy_details {
resource_types = ["VOLUME"]
schedule {
name = "daily-snapshots"
create_rule {
interval = 24
interval_unit = "HOURS"
times = ["03:00"]
}
retain_rule {
count = 14 # Keep only 14 daily snapshots
}
tags_to_add = {
SnapshotCreator = "DLM"
AutoDelete = "true"
}
copy_tags = true
}
target_tags = {
Backup = "true"
}
}
}
Azure Orphaned Managed Disks
The exact same problem shows up in Azure. Managed disks persist after VM deletion and keep incurring charges quietly. Use the Azure CLI to find and clean them up:
# Find unattached managed disks in Azure
az disk list \
--query "[?managedBy==null].{Name:name,Size:diskSizeGb,SKU:sku.name,RG:resourceGroup}" \
--output table
# Calculate total waste (GB across all unattached disks)
az disk list \
  --query "[?managedBy==null].diskSizeGb" \
  --output tsv | paste -sd+ - | bc
Strategy 4: Using Storage Analytics for Data-Driven Decisions
AWS S3 Storage Lens
S3 Storage Lens provides organization-wide visibility into storage usage, activity trends, and cost optimization opportunities. The free tier includes 28 metrics with 14 days of historical data. The advanced tier (at $0.20 per million objects monitored per month) adds roughly 35 additional metrics, 15-month history, and prefix-level aggregation, which is where the real insights come from.
Key optimization workflows with Storage Lens:
- Cold bucket detection: Use the bubble analysis feature to spot buckets with high storage but near-zero retrieval rates. These are prime candidates for tier transitions or archival.
- Incomplete multipart upload tracking: The "Incomplete multipart upload bytes greater than 7 days old" metric directly quantifies recoverable waste.
- Version accumulation: The "Noncurrent version bytes" metric reveals how much storage is being eaten up by old versions across your organization.
- Encryption compliance: Identify unencrypted objects that may represent compliance risk alongside cost optimization opportunities.
# Create an S3 Storage Lens dashboard with advanced metrics
aws s3control put-storage-lens-configuration \
--account-id 123456789012 \
--config-id "org-cost-optimization" \
--storage-lens-configuration '{
"Id": "org-cost-optimization",
"AccountLevel": {
"ActivityMetrics": {"IsEnabled": true},
"AdvancedCostOptimizationMetrics": {"IsEnabled": true},
"AdvancedDataProtectionMetrics": {"IsEnabled": true},
"DetailedStatusCodesMetrics": {"IsEnabled": true},
"BucketLevel": {
"ActivityMetrics": {"IsEnabled": true},
"AdvancedCostOptimizationMetrics": {"IsEnabled": true},
"PrefixLevel": {
"StorageMetrics": {
"IsEnabled": true,
"SelectionCriteria": {
"MaxDepth": 3,
"MinStorageBytesPercentage": 1.0
}
}
}
}
},
"IsEnabled": true,
"DataExport": {
"S3BucketDestination": {
"Format": "CSV",
"OutputSchemaVersion": "V_1",
"AccountId": "123456789012",
"Arn": "arn:aws:s3:::storage-lens-export-bucket",
"Prefix": "storage-lens/"
}
}
}'
Azure Storage Analytics and Cost Management
Azure provides storage metrics through Azure Monitor and Azure Cost Management. You can set up alerts for anomalous storage growth and tap into Azure Advisor recommendations for optimization:
# Query Azure Monitor for storage usage trends
az monitor metrics list \
--resource "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{account}" \
--metric "UsedCapacity" \
--interval PT1H \
--start-time 2026-01-01T00:00:00Z \
--end-time 2026-02-01T00:00:00Z \
--output table
# Get Azure Advisor cost recommendations for storage
az advisor recommendation list \
--filter "Category eq 'Cost'" \
--query "[?contains(shortDescription.solution, 'storage') || contains(shortDescription.solution, 'disk')]" \
--output table
Strategy 5: Reducing Data Egress Costs
The Egress Cost Problem
Data egress is one of those costs that sneaks up on teams. It can account for 10–15% of total cloud costs, and the standard rate on AWS is $0.09 per GB for the first 10 TB transferred to the internet. That means moving 10 TB per month costs $900 — just for the privilege of accessing your own data. Cross-region transfers are cheaper ($0.01–$0.02/GB) but add up quickly for globally distributed architectures.
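Internet egress is billed on a graduated schedule, so the blended rate drops as volume grows. Here's a sketch of that calculation; the tier boundaries and rates below approximate AWS's published internet egress schedule and should be verified against current pricing:

```python
# Approximate AWS internet egress tiers (assumed; verify current rates)
EGRESS_TIERS = [
    (10_240, 0.09),        # first 10 TB at $0.09/GB
    (40_960, 0.085),       # next 40 TB
    (102_400, 0.07),       # next 100 TB
    (float("inf"), 0.05),  # beyond 150 TB
]

def egress_cost(gb: float) -> float:
    """Total monthly egress charge across graduated pricing tiers."""
    total, remaining = 0.0, gb
    for tier_gb, rate in EGRESS_TIERS:
        used = min(remaining, tier_gb)
        total += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return total

print(round(egress_cost(10_240), 2))   # 10 TiB of transfer
print(round(egress_cost(51_200), 2))   # 50 TiB of transfer
```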
CDN Integration for Egress Reduction
Content Delivery Networks cache frequently accessed content at edge locations, reducing origin fetches and egress charges by 60–80%. The key is optimizing cache hit ratios through proper TTL configuration and origin shielding.
# Terraform: CloudFront distribution with S3 origin and optimized caching
resource "aws_cloudfront_distribution" "storage_cdn" {
origin {
domain_name = aws_s3_bucket.content.bucket_regional_domain_name
origin_id = "S3-content"
s3_origin_config {
origin_access_identity = aws_cloudfront_origin_access_identity.oai.cloudfront_access_identity_path
}
# Enable Origin Shield to reduce origin fetches
origin_shield {
enabled = true
origin_shield_region = "us-east-1"
}
}
enabled = true
is_ipv6_enabled = true
default_cache_behavior {
allowed_methods = ["GET", "HEAD"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "S3-content"
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
viewer_protocol_policy = "redirect-to-https"
# Aggressive caching to maximize hit ratio
min_ttl = 0
default_ttl = 86400 # 24 hours
max_ttl = 31536000 # 1 year
compress = true # Gzip/Brotli compression reduces transfer volume
}
restrictions {
geo_restriction {
restriction_type = "none"
}
}
viewer_certificate {
cloudfront_default_certificate = true
}
}
Zero-Egress Storage Alternatives
For workloads with heavy outbound data transfer, it's worth considering zero-egress or low-egress storage providers as part of your architecture:
- Cloudflare R2: S3-compatible object storage with zero egress fees. Pricing is $0.015/GB/month for storage with no charges for data retrieval or transfer. It's ideal for serving static assets, hosting public datasets, or as a CDN origin.
- Backblaze B2: Offers free egress up to 3x your stored data volume per month. At $0.006/GB/month for storage, it's one of the cheapest options for archival workloads with moderate egress needs.
Now, these aren't replacements for S3 or Azure Blob in most architectures. But they can serve as cost-effective tiers for specific workloads — particularly content delivery, public datasets, and backup storage where egress costs would otherwise dominate the bill.
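For egress-heavy workloads, the comparison is dominated by the transfer term. A simplified side-by-side using flat illustrative rates (real bills add request charges and graduated egress tiers, which this ignores):

```python
def serving_cost(gb_stored: float, gb_egress: float,
                 storage_rate: float, egress_rate: float) -> float:
    """Monthly cost to store and serve a dataset from one provider."""
    return gb_stored * storage_rate + gb_egress * egress_rate

# 5 TB of static assets, 20 TB/month served out
s3 = serving_cost(5_000, 20_000, 0.023, 0.09)  # S3-style: $0.023/GB + $0.09/GB egress
r2 = serving_cost(5_000, 20_000, 0.015, 0.0)   # R2-style: $0.015/GB, zero egress
print(f"S3-style: ${s3:,.0f}/mo  R2-style: ${r2:,.0f}/mo")
```

The more of your bill egress represents, the more a zero-egress tier changes the math.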
Data Compression Before Transfer
Compressing data before storage and transfer reduces both capacity costs and egress charges. A solid compression strategy can reduce data volume by 20–40%, with some data types (logs, JSON, CSV) achieving 80%+ compression ratios. Honestly, if you're not compressing log data before storing it, you're leaving money on the table.
# Python: Compress data before uploading to S3
import boto3
import gzip
import json
from io import BytesIO
s3_client = boto3.client("s3")
def upload_compressed(bucket, key, data):
"""Upload gzip-compressed data to S3 with appropriate metadata."""
buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
if isinstance(data, str):
gz.write(data.encode("utf-8"))
elif isinstance(data, dict) or isinstance(data, list):
gz.write(json.dumps(data).encode("utf-8"))
else:
gz.write(data)
buffer.seek(0)
s3_client.put_object(
Bucket=bucket,
Key=key,
Body=buffer.getvalue(),
ContentEncoding="gzip",
ContentType="application/json",
StorageClass="INTELLIGENT_TIERING",
)
    # Measure sizes for the report; note buffer.tell() would read 0 after seek(0)
    compressed_size = len(buffer.getvalue())
    if isinstance(data, (dict, list)):
        original_size = len(json.dumps(data).encode("utf-8"))
    elif isinstance(data, str):
        original_size = len(data.encode("utf-8"))
    else:
        original_size = len(data)
    ratio = (1 - compressed_size / original_size) * 100
print(f"Compressed {original_size:,} bytes to {compressed_size:,} bytes ({ratio:.1f}% reduction)")
# Example: Upload compressed log data
log_entries = [
{"timestamp": "2026-02-08T10:00:00Z", "level": "INFO", "message": f"Event {i}"}
for i in range(10000)
]
upload_compressed("my-logs-bucket", "logs/2026/02/08/events.json.gz", log_entries)
Strategy 6: Reserved Capacity and Committed Use Discounts
Azure Storage Reserved Capacity
Azure is unique among the major providers in offering explicit reserved capacity for storage. If your analysis shows consistent storage usage above a predictable threshold, reserved capacity can deliver substantial savings:
- Hot and Cool tiers: Up to 34% discount with a 3-year reservation
- Archive tier: Up to 17% discount with a 3-year reservation
- Available in units of 100 TB and 1 PB per month
- Covers the capacity component only — operations and retrieval are billed separately
To evaluate whether reserved capacity makes sense for your workload:
# Check current storage usage to determine reservation size
az storage account show \
--name mystorageaccount \
--resource-group myresourcegroup \
--query "{Name:name, Kind:kind, Tier:accessTier}"
# List all storage accounts and their usage for reservation planning
az storage account list \
--query "[].{Name:name,RG:resourceGroup,Kind:kind,Tier:accessTier}" \
--output table
# View reservation recommendations
az consumption reservation recommendation list \
--scope "Shared" \
--resource-type "Microsoft.Storage" \
--look-back-period "Last60Days"
The break-even point for a 1-year Azure storage reservation is typically around 7–8 months. If you're confident your data volume will remain stable or grow, the reservation pays for itself well before the term expires.
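The break-even math is straightforward: an upfront reservation recoups itself once the pay-as-you-go spend it displaces exceeds the discounted price. A sketch, with the 38% discount as an assumed figure for a 1-year term:

```python
def breakeven_months(term_months: int, discount: float) -> float:
    """Months of pay-as-you-go spend needed to recoup an upfront reservation.

    discount: fractional reservation discount vs. pay-as-you-go (e.g. 0.34).
    """
    return term_months * (1 - discount)

print(round(breakeven_months(12, 0.38), 2))  # ~7.4 months at an assumed 38% discount
```

If there's any real chance you'll migrate or delete the data before the break-even month, stay on pay-as-you-go.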
Strategy 7: Infrastructure as Code for Storage Governance
The most effective storage cost optimization isn't a one-time cleanup — it's a set of guardrails baked into your infrastructure provisioning process. Using Terraform, Pulumi, or CloudFormation, you can enforce cost-optimized defaults on every storage resource created across your organization.
# Terraform: Reusable module for cost-optimized S3 bucket
# modules/cost-optimized-bucket/main.tf
variable "bucket_name" {
type = string
}
variable "enable_versioning" {
type = bool
default = true
}
variable "noncurrent_version_days" {
type = number
default = 30
}
variable "archive_after_days" {
type = number
default = 90
}
variable "deep_archive_after_days" {
type = number
default = 180
}
resource "aws_s3_bucket" "this" {
bucket = var.bucket_name
}
resource "aws_s3_bucket_versioning" "this" {
bucket = aws_s3_bucket.this.id
versioning_configuration {
status = var.enable_versioning ? "Enabled" : "Suspended"
}
}
resource "aws_s3_bucket_intelligent_tiering_configuration" "this" {
bucket = aws_s3_bucket.this.id
name = "FullOptimization"
tiering {
access_tier = "ARCHIVE_ACCESS"
days = var.archive_after_days
}
tiering {
access_tier = "DEEP_ARCHIVE_ACCESS"
days = var.deep_archive_after_days
}
}
resource "aws_s3_bucket_lifecycle_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
id = "transition-to-intelligent-tiering"
status = "Enabled"
transition {
days = 0
storage_class = "INTELLIGENT_TIERING"
}
}
rule {
id = "cleanup-old-versions"
status = "Enabled"
noncurrent_version_expiration {
noncurrent_days = var.noncurrent_version_days
}
}
rule {
id = "abort-incomplete-uploads"
status = "Enabled"
abort_incomplete_multipart_upload {
days_after_initiation = 7
}
}
rule {
id = "remove-expired-delete-markers"
status = "Enabled"
expiration {
expired_object_delete_marker = true
}
}
}
resource "aws_s3_bucket_public_access_block" "this" {
bucket = aws_s3_bucket.this.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
output "bucket_arn" {
value = aws_s3_bucket.this.arn
}
output "bucket_name" {
value = aws_s3_bucket.this.id
}
With this module, every new bucket in your organization gets Intelligent-Tiering, version cleanup, incomplete multipart upload expiry, and public access blocking by default. Teams can override the parameters for specific use cases, but the cost-optimized defaults mean no bucket ever gets created without basic hygiene rules in place. It's a small upfront investment that pays dividends forever.
Building a Storage Cost Optimization Roadmap
Quick Wins (Week 1–2)
Start with the optimizations that deliver immediate savings with minimal risk:
- Enable incomplete multipart upload cleanup on all buckets. This is pure waste elimination with zero downside.
- Delete orphaned EBS snapshots and unattached volumes. Run the audit script above and remove resources with no associated instances.
- Enable S3 Intelligent-Tiering on data lake and backup buckets. The transition cost is $0.01 per 1,000 objects — trivial compared to ongoing savings.
- Add noncurrent version expiration rules to all versioned buckets. Even a conservative 90-day policy eliminates significant version stacking waste.
Medium-Term Wins (Month 1–3)
- Deploy S3 Storage Lens with advanced metrics across your organization. Use the insights to identify cold buckets, high-egress patterns, and version accumulation hotspots.
- Implement CDN caching for frequently accessed storage content. Target a cache hit ratio above 90% to maximize egress savings.
- Evaluate Azure Storage Reserved Capacity if your Azure storage usage is predictable. Lock in those discounts for baseline capacity.
- Adopt the Terraform module pattern for all new storage resources. Retrofit existing buckets as part of a phased migration.
Long-Term Wins (Quarter 2+)
- Implement data classification and retention policies across your organization. Not all data needs to exist forever — establishing clear retention periods and automated deletion rules is the single most impactful long-term strategy.
- Evaluate zero-egress storage for appropriate workloads. Cloudflare R2 or Backblaze B2 can dramatically reduce costs for content delivery and public dataset hosting.
- Build a FinOps dashboard tracking storage unit economics: cost per GB stored, cost per GB retrieved, and cost per GB transferred. Review these metrics monthly in cross-functional FinOps reviews.
- Implement data compression pipelines for log, analytics, and archival data. The one-time engineering investment pays ongoing dividends in both storage and egress savings.
Measuring Success: KPIs for Storage Cost Optimization
Track these metrics to measure the impact of your optimization efforts:
- Storage cost per GB: Your effective blended rate across all tiers. Target a 40–60% reduction from your pre-optimization baseline.
- Percentage of data in hot tier: Aim for less than 30% of total data volume in the most expensive tier. If you're above 50%, your tiering automation isn't aggressive enough.
- Orphaned resource count: Track unattached volumes, orphaned snapshots, and incomplete uploads. Target zero through automated lifecycle policies.
- Egress-to-storage ratio: Monitor data transfer costs relative to storage costs. A ratio above 0.3 suggests opportunities for CDN caching or architecture optimization.
- Cache hit ratio: For CDN-fronted storage, target 90%+ hit ratios to maximize egress savings.
- Storage growth rate vs. data growth rate: If storage costs are growing faster than data volume, your tiering and lifecycle policies aren't keeping pace.
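Several of these KPIs fall out of the same per-tier cost and volume data. A small sketch of the computation (the tier names and figures are made-up inputs):

```python
def storage_kpis(cost_by_tier: dict, gb_by_tier: dict, egress_cost: float,
                 hot_tiers=frozenset({"hot"})) -> dict:
    """Compute blended rate, hot-tier share, and egress ratio from per-tier data."""
    total_cost = sum(cost_by_tier.values())
    total_gb = sum(gb_by_tier.values())
    hot_gb = sum(v for k, v in gb_by_tier.items() if k in hot_tiers)
    return {
        "blended_rate_per_gb": total_cost / total_gb,
        "pct_hot": hot_gb / total_gb * 100,
        "egress_to_storage_ratio": egress_cost / total_cost,
    }

kpis = storage_kpis(
    cost_by_tier={"hot": 920, "cool": 250, "archive": 30},
    gb_by_tier={"hot": 40_000, "cool": 25_000, "archive": 35_000},
    egress_cost=300,
)
print({k: round(v, 3) for k, v in kpis.items()})
```

This example org would flag two of the targets above: 40% of data sits in the hot tier (target: under 30%), while the 0.25 egress ratio is still under the 0.3 threshold.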
Conclusion: Storage Optimization Is a Continuous Practice
Cloud storage cost optimization isn't a one-and-done project. Data grows continuously, access patterns shift, new storage tiers and features launch, and pricing changes. The organizations that sustain 60%+ savings treat storage optimization as an ongoing practice embedded in their FinOps operations — not a quarterly cleanup sprint.
The strategies in this guide — automated tiering, lifecycle policies, orphaned resource cleanup, egress reduction, analytics-driven decision-making, and infrastructure-as-code guardrails — form a comprehensive framework that works across AWS, Azure, and GCP. Start with the quick wins to demonstrate value, build organizational momentum with medium-term projects, and invest in long-term automation that prevents waste from accumulating in the first place.
Here's a final thought to drive this home. The $0.023/GB/month you're paying for hot storage may seem trivial for a few terabytes. But at 100 TB, that's $2,300 per month. At a petabyte, it's $23,000. Move 80% of that data to appropriate cold tiers and you're looking at savings of $15,000–$20,000 per month — every month, compounding as your data grows.
That's the power of systematic storage cost optimization. So, what are you waiting for?