Last quarter, a well-meaning engineer on our platform team merged a Terraform pull request that provisioned a cluster of p4d.24xlarge GPU instances for a dev experiment. The monthly bill impact? Over $94,000. Nobody caught it in code review because nobody was looking at cost — they were reviewing architecture, security, and correctness. The infrastructure was technically sound. It was just absurdly expensive for a development workload.
This is exactly the kind of problem that FinOps as Code solves. Instead of discovering cost overruns after the bill arrives, you embed cost policies directly into your infrastructure-as-code pipeline. Every Terraform pull request gets a cost estimate, and automated policies block changes that exceed budget thresholds — before a single resource is provisioned.
In this guide, I'll walk you through building a complete FinOps-as-Code pipeline using three open-source tools: Infracost for cost estimation, Open Policy Agent (OPA) for policy enforcement, and Conftest for testing policies against Terraform plans. By the end, you'll have working code that catches cost overruns in pull requests, enforces instance-size restrictions per environment, and blocks deployments that violate tagging requirements — all automatically. So, let's get into it.
What Is FinOps as Code and Why It Matters in 2026
FinOps as Code (FaC) applies the same principles that transformed infrastructure management — version control, automation, peer review, and continuous enforcement — to cloud financial management. Rather than relying on monthly billing reviews or manual cost audits, FaC embeds cost awareness directly into the software delivery lifecycle.
The 2026 State of FinOps report highlights why this matters now more than ever. Cloud spending has crossed the $1 trillion mark globally, yet organizations consistently waste 30–35% of their cloud budget on overprovisioned or idle resources. At the same time, 78% of FinOps practices now report into the CTO/CIO organization rather than finance — signaling that cost management has genuinely become an engineering discipline, not just a finance problem.
McKinsey estimates the potential value from FinOps as Code at approximately $120 billion — based on the roughly 28% of cloud spending that organizations report as waste. The organizations capturing that value? They're the ones treating cost policies like code: versioned, tested, and enforced in CI/CD.
The Shift-Left Cost Management Model
Traditional cloud cost management is reactive. You provision resources, wait for the monthly bill, analyze the damage, and then try to optimize. This cycle creates a fundamental problem: by the time you spot waste, you've already paid for it.
FinOps as Code flips this model by shifting cost decisions left — into the development workflow where engineers actually make infrastructure choices. Here's how the two models compare:
| Aspect | Traditional FinOps | FinOps as Code |
|---|---|---|
| When costs are reviewed | After deployment (monthly) | Before deployment (every PR) |
| Who reviews costs | Finance or FinOps team | Engineers during code review |
| How policies are enforced | Manual audits and tickets | Automated CI/CD gates |
| Time to detect overspend | Days to weeks | Minutes (at PR time) |
| Cost policy format | Wiki pages, spreadsheets | Version-controlled Rego files |
| Remediation approach | Reactive cleanup | Preventive blocking |
The FinOps-as-Code Toolchain
A complete FinOps-as-Code pipeline consists of three components that work together: cost estimation, policy definition, and enforcement. Here's how each tool fits into the pipeline.
Infracost: Cost Estimation from Terraform Plans
Infracost is an open-source tool that parses Terraform plan files and calculates the estimated monthly cost of provisioned infrastructure. It supports over 1,100 resource types across AWS, Azure, and Google Cloud. Importantly, it sends no cloud credentials or secrets to any external service — it only needs pricing data from the Infracost API (free for individual use).
Infracost produces a detailed JSON breakdown of current costs, projected costs after the change, and the cost difference — making it straightforward to write policies against specific cost thresholds.
Open Policy Agent (OPA): Policy Definition in Rego
OPA is a general-purpose policy engine that evaluates structured data (JSON) against policies written in the Rego language. In our pipeline, OPA evaluates the Infracost JSON output and the Terraform plan JSON against cost, governance, and tagging policies.
Rego is a declarative language specifically designed for policy evaluation. It can express complex rules concisely — from simple cost thresholds to conditional restrictions like "GPU instances are allowed in production but denied in dev environments." Honestly, once you get comfortable with Rego's syntax, writing these rules becomes surprisingly intuitive.
Conftest: Testing Policies in CI/CD
Conftest is a utility built on top of OPA that makes it easy to test structured data files against Rego policies in CI/CD pipelines. Rather than writing custom OPA evaluation scripts, you simply point Conftest at your data file and policy directory. It returns pass/fail results with clear error messages.
Step 1: Install and Configure Infracost
Start by installing Infracost on your local machine or CI runner. The tool is available via Homebrew, Chocolatey, Docker, or a direct binary download.
```bash
# macOS / Linux
brew install infracost

# Or download the binary directly
curl -fsSL https://raw.githubusercontent.com/infracost/infracost/master/scripts/install.sh | sh

# Register for a free API key (no credit card required)
infracost auth login

# Verify the installation
infracost --version
```
Once installed, generate a cost breakdown from an existing Terraform project:
```bash
# Navigate to your Terraform project directory
cd /path/to/terraform/project

# Generate a Terraform plan
terraform init
terraform plan -out=tfplan.binary

# Convert the plan to JSON (needed for OPA policies later)
terraform show -json tfplan.binary > plan.json

# Generate the Infracost cost breakdown
infracost breakdown --path . --format json --out-file infracost.json

# View a human-readable summary
infracost breakdown --path .
```
The human-readable output looks something like this:
```
Project: my-terraform-project

 Name                                                  Monthly Qty  Unit   Monthly Cost

 aws_instance.web_server
 ├─ Instance usage (Linux/UNIX, on-demand, m5.xlarge)          730  hours       $140.16
 ├─ root_block_device
 │  └─ Storage (general purpose SSD, gp3)                       50  GB            $4.00
 └─ ebs_block_device[0]
    └─ Storage (general purpose SSD, gp3)                      200  GB           $16.00

 aws_db_instance.primary
 ├─ Database instance (on-demand, db.r5.large)                 730  hours       $175.20
 └─ Storage (general purpose SSD, gp2)                         100  GB           $11.50

 OVERALL TOTAL                                                                  $346.86/mo
```
The JSON output (infracost.json) contains fields like totalMonthlyCost, diffTotalMonthlyCost, and per-resource breakdowns — these are the fields you'll write OPA policies against.
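Before moving to OPA, it's worth seeing how small that surface is. Here's a minimal Python sketch of reading those fields, with an inline stand-in for a real infracost.json (the dollar values are made up):

```python
import json

# Stand-in for json.load(open("infracost.json")); a real file has many more fields.
raw = '{"totalMonthlyCost": "8766.86", "diffTotalMonthlyCost": "8420.00"}'
infracost = json.loads(raw)

# Infracost emits cost fields as strings, so convert before comparing.
MAX_DIFF = 5000.0  # the same $5,000 threshold used in the cost policy
diff = float(infracost["diffTotalMonthlyCost"])

if diff >= MAX_DIFF:
    print(f"FAIL: monthly cost increase ${diff:.2f} exceeds ${MAX_DIFF:.2f}")
else:
    print(f"PASS: monthly cost increase ${diff:.2f} is under ${MAX_DIFF:.2f}")
```

The detail that trips people up is that the cost fields arrive as strings, which is why the Rego policies in this guide wrap them in to_number().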
Step 2: Write Cost Policies with OPA Rego
Now comes the fun part. You'll define cost guardrails as Rego policies that evaluate the Infracost JSON output. Create a policies/ directory in your repository to store these policies alongside your Terraform code.
Policy 1: Monthly Cost Diff Threshold
This policy prevents any single pull request from increasing monthly costs by more than a defined threshold. It's the most common starting point for FinOps as Code — and for good reason, since it catches the biggest mistakes immediately.
```rego
# policies/cost_threshold.rego
package infracost

import rego.v1

# Maximum allowed monthly cost increase per PR
max_diff := 5000.0

deny contains msg if {
    to_number(input.diffTotalMonthlyCost) >= max_diff
    msg := sprintf(
        "Monthly cost increase of $%.2f exceeds the $%.2f threshold. Requires FinOps team approval.",
        [to_number(input.diffTotalMonthlyCost), max_diff]
    )
}
```

Note the threshold check is a condition in the rule body: with Conftest, a `deny` rule that always produces output always fails the test, so the rule must only fire when the threshold is actually exceeded.
Policy 2: Block Expensive Instance Types in Non-Production
This one's particularly useful. It restricts GPU and high-memory instance types to production environments only, evaluating the Terraform plan JSON rather than the Infracost output.
```rego
# policies/instance_restrictions.rego
package terraform.cost_control

import rego.v1

# Expensive instance families that require production justification
expensive_prefixes := [
    "p4d", "p4de", "p5", "p3",      # GPU instances
    "x2idn", "x2iedn", "x1", "u-",  # High-memory instances
    "dl1", "dl2q", "trn1", "inf2"   # ML training/inference
]

deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_instance"
    resource.change.actions[_] == "create"
    instance_type := resource.change.after.instance_type
    prefix := expensive_prefixes[_]
    startswith(instance_type, prefix)

    # Only flag instances whose Environment tag is NOT production
    tags := object.get(resource.change.after, "tags", {})
    env := object.get(tags, "Environment", "unknown")
    env != "production"
    env != "prod"

    msg := sprintf(
        "Instance '%s' uses expensive type '%s' in '%s' environment. GPU/high-memory instances are only allowed in production.",
        [resource.address, instance_type, env]
    )
}
```
Policy 3: Enforce Mandatory Cost Allocation Tags
Without proper tagging, cost attribution is basically impossible. This policy ensures every taggable resource includes the mandatory FinOps tags before deployment.
```rego
# policies/mandatory_tags.rego
package terraform.tagging

import rego.v1

# Tags that every resource MUST have
required_tags := ["Environment", "Team", "CostCenter", "Project"]

# Resource types that support tags
taggable_types := [
    "aws_instance", "aws_db_instance", "aws_s3_bucket",
    "aws_lambda_function", "aws_ecs_service", "aws_eks_cluster",
    "aws_rds_cluster", "aws_elasticache_cluster", "aws_sqs_queue"
]

deny contains msg if {
    resource := input.resource_changes[_]
    resource.change.actions[_] == "create"

    # Only check taggable resource types
    resource.type == taggable_types[_]

    tags := object.get(resource.change.after, "tags", {})
    required_tag := required_tags[_]
    not tags[required_tag]

    msg := sprintf(
        "Resource '%s' (%s) is missing required tag '%s'. All resources must include: %s",
        [resource.address, resource.type, required_tag, concat(", ", required_tags)]
    )
}
```
Policy 4: Per-Resource Hourly Cost Limit
Sometimes you want to catch individual resources that are disproportionately expensive — even if the total diff is under your threshold. This Rego policy flags any single resource costing more than $2/hour.
```rego
# policies/per_resource_limit.rego
package infracost

import rego.v1

# Maximum allowed hourly cost for any single resource
max_hourly_cost := 2.0

deny contains msg if {
    resource := input.projects[_].breakdown.resources[_]

    # Usage-based resources can report a null monthlyCost; skip them
    resource.monthlyCost != null

    hourly := to_number(resource.monthlyCost) / 730
    hourly > max_hourly_cost

    msg := sprintf(
        "Resource '%s' costs $%.2f/hour ($%.2f/month), exceeding the $%.2f/hour limit.",
        [resource.name, hourly, to_number(resource.monthlyCost), max_hourly_cost]
    )
}
```
Step 3: Test Policies Locally with Conftest
Before wiring policies into CI/CD, validate them locally with Conftest. This step saves a lot of back-and-forth debugging once you're in the pipeline. Conftest runs Rego policies against data files and returns clear pass/fail results.
```bash
# Install Conftest
brew install conftest

# Test Infracost output against cost policies
conftest test infracost.json --policy ./policies --namespace infracost

# Test the Terraform plan against governance policies
# (Conftest matches namespaces exactly, so list each package)
conftest test plan.json --policy ./policies \
  --namespace terraform.cost_control --namespace terraform.tagging

# Run every policy namespace and output results as JSON
conftest test infracost.json plan.json --policy ./policies --all-namespaces --output json
```
When a policy violation is detected, Conftest produces output like this:
```
FAIL - infracost.json - infracost - Monthly cost increase of $8420.00 exceeds
the $5000.00 threshold. Requires FinOps team approval.

FAIL - plan.json - terraform.cost_control - Instance 'module.ml_experiment.aws_instance.gpu'
uses expensive type 'p4d.24xlarge' in 'dev' environment. GPU/high-memory instances
are only allowed in production.

2 tests, 0 passed, 0 warnings, 2 failures
```
Conftest returns a non-zero exit code on failure, which naturally integrates with CI/CD pipeline gating. Clean and simple.
Step 4: Build the GitHub Actions Pipeline
Now let's wire everything together in a GitHub Actions workflow that runs on every pull request. This workflow generates a Terraform plan, estimates costs with Infracost, evaluates OPA policies, and posts results as a PR comment.
```yaml
# .github/workflows/finops-cost-gate.yml
name: FinOps Cost Gate

on:
  pull_request:
    paths:
      - "terraform/**"
      - "policies/**"

permissions:
  contents: read
  pull-requests: write

env:
  TF_ROOT: terraform
  POLICY_DIR: policies

jobs:
  cost-policy-check:
    name: Cost Estimation & Policy Enforcement
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.x
          terraform_wrapper: false

      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Install Conftest
        run: |
          LATEST=$(curl -s https://api.github.com/repos/open-policy-agent/conftest/releases/latest | grep tag_name | cut -d '"' -f4 | sed 's/v//')
          wget -q "https://github.com/open-policy-agent/conftest/releases/download/v${LATEST}/conftest_${LATEST}_Linux_x86_64.tar.gz"
          tar xzf "conftest_${LATEST}_Linux_x86_64.tar.gz"
          sudo mv conftest /usr/local/bin/

      - name: Terraform Init & Plan
        working-directory: ${{ env.TF_ROOT }}
        run: |
          terraform init -input=false
          terraform plan -out=tfplan.binary -input=false
          terraform show -json tfplan.binary > plan.json

      - name: Generate Infracost Breakdown
        working-directory: ${{ env.TF_ROOT }}
        run: |
          infracost breakdown \
            --path . \
            --format json \
            --out-file infracost.json

      - name: Evaluate Cost Policies
        id: policy-check
        working-directory: ${{ env.TF_ROOT }}
        run: |
          # Without pipefail, the if-conditions would see tee's exit code, not conftest's
          set -o pipefail

          echo "## Policy Results" > policy-results.md

          # Run Infracost cost policies
          echo "### Cost Threshold Policies" >> policy-results.md
          if conftest test infracost.json --policy ../${{ env.POLICY_DIR }} --namespace infracost 2>&1 | tee -a policy-results.md; then
            echo "cost_passed=true" >> "$GITHUB_OUTPUT"
          else
            echo "cost_passed=false" >> "$GITHUB_OUTPUT"
          fi

          # Run Terraform governance policies (add the Azure/GCP namespaces if you adopt those policies)
          echo "### Governance Policies" >> policy-results.md
          if conftest test plan.json --policy ../${{ env.POLICY_DIR }} --namespace terraform.cost_control --namespace terraform.tagging 2>&1 | tee -a policy-results.md; then
            echo "governance_passed=true" >> "$GITHUB_OUTPUT"
          else
            echo "governance_passed=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Post Infracost Comment
        working-directory: ${{ env.TF_ROOT }}
        run: |
          infracost comment github \
            --path infracost.json \
            --repo ${{ github.repository }} \
            --pull-request ${{ github.event.pull_request.number }} \
            --github-token ${{ secrets.GITHUB_TOKEN }} \
            --behavior update

      - name: Post Policy Results Comment
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: finops-policy
          path: ${{ env.TF_ROOT }}/policy-results.md

      - name: Fail on Policy Violation
        if: steps.policy-check.outputs.cost_passed == 'false' || steps.policy-check.outputs.governance_passed == 'false'
        run: |
          echo "::error::FinOps policy check failed. Review the PR comments for details."
          exit 1
```
When a pull request triggers this workflow, engineers see two comments on their PR: one from Infracost showing the cost breakdown and diff, and another showing policy evaluation results. If any policy fails, the PR check fails and the merge is blocked until the violation is resolved. One caveat: `infracost breakdown` prices the whole project, so on its own it treats every resource as new. For a true against-the-base-branch diff, generate a baseline JSON from the base branch first and run `infracost diff --compare-to` against it before evaluating policies.
Step 5: Add Azure and GCP Policy Support
The same pattern works across all three major cloud providers. Infracost natively supports AWS, Azure, and GCP Terraform resources — you just need provider-specific Rego policies for governance rules.
Azure VM Size Restrictions
```rego
# policies/azure_vm_restrictions.rego
package terraform.azure_cost_control

import rego.v1

# Block expensive VM sizes in non-production
expensive_azure_skus := [
    "Standard_NC", "Standard_ND", "Standard_NV",  # GPU series
    "Standard_M", "Standard_L",                   # Memory/Storage optimized
    "Standard_HB", "Standard_HC"                  # HPC series
]

deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "azurerm_linux_virtual_machine"
    resource.change.actions[_] == "create"
    vm_size := resource.change.after.size
    prefix := expensive_azure_skus[_]
    startswith(vm_size, prefix)

    tags := object.get(resource.change.after, "tags", {})
    env := object.get(tags, "environment", "unknown")
    env != "production"

    msg := sprintf(
        "Azure VM '%s' uses expensive SKU '%s' in '%s' environment. GPU/HPC/memory-optimized VMs require production justification.",
        [resource.address, vm_size, env]
    )
}
```
GCP Machine Type Restrictions
```rego
# policies/gcp_machine_restrictions.rego
package terraform.gcp_cost_control

import rego.v1

# Block accelerator-optimized machines in non-production
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "google_compute_instance"
    resource.change.actions[_] == "create"

    # Check for attached GPUs
    guest_accelerators := object.get(resource.change.after, "guest_accelerator", [])
    count(guest_accelerators) > 0

    labels := object.get(resource.change.after, "labels", {})
    env := object.get(labels, "environment", "unknown")
    env != "production"

    msg := sprintf(
        "GCP instance '%s' has GPU accelerators attached in '%s' environment. GPU instances are only allowed in production.",
        [resource.address, env]
    )
}
```
Step 6: Implement the Crawl-Walk-Run Maturity Model
Rolling out FinOps as Code to an organization requires a phased approach. Jumping straight to hard-blocking policies will generate pushback from engineering teams — and honestly, that pushback is fair if teams don't yet trust the cost estimates. Instead, follow the crawl-walk-run maturity model.
Phase 1: Crawl — Visibility Only (Weeks 1–4)
Start by deploying Infracost in your CI/CD pipeline with no blocking policies. Engineers see cost estimates on every PR but are never blocked. This builds awareness and trust in the tooling.
- Install Infracost in your CI/CD pipeline
- Post cost breakdown comments on every Terraform PR
- Set up a shared dashboard to track weekly cost diffs across teams
- Gather feedback from engineers on accuracy and usefulness
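The dashboard in the third bullet doesn't need to be fancy to be useful. Here's a sketch of the aggregation behind it in Python, assuming you export one record per merged Terraform PR (the record shape and team names are hypothetical):

```python
from collections import defaultdict

# Hypothetical export: one record per merged Terraform PR.
pr_diffs = [
    {"team": "platform", "week": "2026-W06", "diff_monthly_cost": 420.0},
    {"team": "platform", "week": "2026-W06", "diff_monthly_cost": -150.0},
    {"team": "data",     "week": "2026-W06", "diff_monthly_cost": 2750.0},
    {"team": "data",     "week": "2026-W07", "diff_monthly_cost": 90.0},
]

# Sum the monthly cost impact per (team, week) pair.
weekly = defaultdict(float)
for rec in pr_diffs:
    weekly[(rec["team"], rec["week"])] += rec["diff_monthly_cost"]

for (team, week), total in sorted(weekly.items()):
    print(f"{week}  {team:<10} ${total:,.2f}/mo")
```

Negative diffs (PRs that shrink the bill) count too, which is exactly the kind of signal that builds goodwill during the crawl phase.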
Phase 2: Walk — Advisory Policies (Weeks 5–8)
Add OPA policies but configure them as warnings only. Policy violations show up in PR comments but don't block merges. This lets you iterate on policy thresholds based on real-world data before they become hard gates.
```rego
# policies/advisory_cost.rego
package infracost

import rego.v1

# Advisory policy — warns but does not block
warn contains msg if {
    to_number(input.diffTotalMonthlyCost) >= 2000.0
    msg := sprintf(
        "Advisory: This PR increases monthly costs by $%.2f. Consider reviewing resource sizing.",
        [to_number(input.diffTotalMonthlyCost)]
    )
}
```

Conftest reports `warn` results as warnings without failing the run, so PRs above the $2,000 mark get a visible nudge while merges stay unblocked.
Phase 3: Run — Enforced Policies (Week 9+)
Once policy thresholds are calibrated and teams are familiar with the cost feedback loop, convert advisory policies to hard-deny policies. At this stage, PR merges are blocked when policies fail, and exceptions require explicit approval from the FinOps team lead.
- Convert `warn` rules to `deny` rules
- Set up an exception workflow (e.g., a label like `finops-approved` that bypasses the gate)
- Track policy violation frequency and adjust thresholds quarterly
- Expand policies to cover new resource types as they enter your Terraform codebase
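The exception label from the list above can be checked in the same step that fails the build. Here's a minimal sketch of the decision logic in Python; in a real pipeline the violation list would come from Conftest's output and the labels from the GitHub event payload (both are stubbed here):

```python
def should_block(violations: list[str], labels: list[str]) -> bool:
    """Block the merge only when violations exist and no approved exception label is present."""
    if not violations:
        return False
    return "finops-approved" not in labels

# A violating PR with no exception label is blocked...
print(should_block(["cost threshold exceeded"], []))
# ...while one carrying the FinOps-approved label passes through the gate.
print(should_block(["cost threshold exceeded"], ["finops-approved"]))
```

Keeping the bypass tied to a label means every exception is visible in the PR timeline, including who applied it.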
Repository Structure for FinOps as Code
Organizing your policies alongside infrastructure code makes them easy to discover, review, and maintain. Here's a recommended repository structure:
```
infrastructure-repo/
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── providers.tf
├── policies/
│   ├── cost_threshold.rego          # Infracost cost diff limits
│   ├── per_resource_limit.rego      # Per-resource hourly cost caps
│   ├── instance_restrictions.rego   # Environment-based instance controls
│   ├── azure_vm_restrictions.rego   # Azure-specific VM policies
│   ├── gcp_machine_restrictions.rego # GCP-specific policies
│   ├── mandatory_tags.rego          # Tagging enforcement
│   └── advisory_cost.rego           # Warning-only policies
├── .github/
│   └── workflows/
│       └── finops-cost-gate.yml     # GitHub Actions pipeline
└── infracost-usage.yml              # Usage estimates for consumption-based resources
```
Storing policies in Git alongside your Terraform code means they go through the same code review process. When an engineer proposes a policy change — like raising the cost threshold from $5,000 to $10,000 — the PR conversation creates an auditable record of why the change was made and who approved it. That audit trail matters more than most people realize, especially when finance or compliance teams start asking questions down the line.
Advanced: Usage-Based Cost Estimation
Some cloud resources have costs driven by consumption rather than provisioning — think Lambda invocations, S3 requests, or data transfer. Infracost supports a usage file (conventionally named infracost-usage.yml) that lets you model expected consumption patterns for more accurate estimates.
```yaml
# infracost-usage.yml
version: 0.1

resource_usage:
  aws_lambda_function.api_handler:
    monthly_requests: 10000000   # 10M requests/month
    request_duration_ms: 250     # Average 250ms per invocation

  aws_s3_bucket.data_lake:
    standard:
      storage_gb: 5000                  # 5TB stored
      monthly_tier_1_requests: 1000000  # PUT/COPY/POST/LIST
      monthly_tier_2_requests: 5000000  # GET/SELECT

  aws_nat_gateway.main:
    monthly_data_processed_gb: 500  # 500GB NAT traffic

  aws_cloudwatch_log_group.app:
    monthly_data_ingested_gb: 100  # 100GB logs ingested
    storage_gb: 200                # 200GB retained
```
Reference this file when running Infracost:
```bash
infracost breakdown --path . --usage-file infracost-usage.yml --format json --out-file infracost.json
```
This makes your cost estimates significantly more accurate for serverless and consumption-based architectures, catching hidden costs that purely provisioning-based estimates would miss entirely.
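To see what the usage file buys you, you can reproduce the kind of arithmetic Infracost performs for the Lambda function above. The per-request and per-GB-second rates below are illustrative placeholders rather than live pricing, and the 512 MB memory size is an assumed value that would really come from the Terraform resource, not the usage file:

```python
# Illustrative Lambda on-demand rates (assumptions; check current pricing).
PRICE_PER_MILLION_REQUESTS = 0.20   # USD per 1M requests
PRICE_PER_GB_SECOND = 0.0000166667  # USD per GB-second of compute

monthly_requests = 10_000_000  # from infracost-usage.yml
duration_ms = 250              # from infracost-usage.yml
memory_gb = 0.5                # assumed 512 MB, set on the Terraform resource

request_cost = monthly_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
gb_seconds = monthly_requests * (duration_ms / 1000) * memory_gb
compute_cost = gb_seconds * PRICE_PER_GB_SECOND

print(f"Requests: ${request_cost:.2f}/mo, compute: ${compute_cost:.2f}/mo")
# Requests: $2.00/mo, compute: $20.83/mo
```

Without the usage file, both numbers would show up as $0 in the estimate, even though the function clearly has a real monthly cost.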
Measuring Success: Key Metrics to Track
Deploying FinOps as Code isn't a one-time project — it's an ongoing practice. Track these metrics to measure the impact and keep iterating on your policies:
- Policy violation rate: What percentage of PRs trigger policy failures? If it's above 30%, your thresholds may be too aggressive. If it's near zero, they're probably too loose.
- Cost avoidance: The cumulative monthly cost of resources that were blocked or resized before deployment. This is your most concrete ROI metric.
- Time to remediation: How long does it take engineers to fix a policy violation after it's flagged? This should decrease as teams internalize cost awareness.
- Tag coverage: What percentage of deployed resources have all mandatory tags? Target 95%+ within three months of enforcing tagging policies.
- False positive rate: How often do legitimate deployments get blocked? High false positives erode trust in the system fast. Refine policies when this exceeds 10%.
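Most of these numbers fall straight out of PR history. Here's a sketch of the first and last metrics in Python, over a hypothetical export of PR records:

```python
# Hypothetical PR records exported from your CI system.
prs = [
    {"violated": True,  "false_positive": False},
    {"violated": True,  "false_positive": True},
    {"violated": False, "false_positive": False},
    {"violated": False, "false_positive": False},
    {"violated": True,  "false_positive": False},
]

violations = [p for p in prs if p["violated"]]
violation_rate = len(violations) / len(prs)
false_positive_rate = sum(p["false_positive"] for p in violations) / len(violations)

print(f"Policy violation rate: {violation_rate:.0%}")       # 3 of 5 PRs
print(f"False positive rate:   {false_positive_rate:.0%}")  # 1 of 3 violations
```

The "false_positive" flag has to come from humans, e.g. a label applied when a blocked PR turns out to be legitimate, which is one more reason the exception workflow matters.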
Frequently Asked Questions
What's the difference between FinOps as Code and traditional FinOps?
Traditional FinOps relies on dashboards, monthly reviews, and manual cost optimization efforts after resources are deployed. FinOps as Code automates these practices by embedding cost policies directly into your CI/CD pipeline using tools like Terraform, OPA, and Infracost. Instead of catching cost overruns after the bill arrives, FinOps as Code blocks them before any resources are provisioned — shifting cost management from reactive cleanup to proactive prevention.
Does Infracost send my cloud credentials or Terraform state to external servers?
No. Infracost parses your Terraform project locally to determine resource types and quantities. It only sends resource type and pricing lookup queries to the Infracost API — no cloud credentials, secrets, or Terraform state data leaves your environment. The API key is free and is used solely for fetching real-time cloud pricing information from AWS, Azure, and GCP.
Can I use FinOps as Code with tools other than Terraform?
Yes, though the ecosystem is most mature for Terraform. Infracost has added support for OpenTofu (the open-source Terraform fork) and is building support for Pulumi, AWS CloudFormation/CDK, and Azure ARM/Bicep. OPA and Conftest work with any JSON or YAML input, so you can write Rego policies against CloudFormation templates or Kubernetes manifests just as easily as Terraform plans.
How do I handle exceptions when a legitimate deployment gets blocked?
Build an exception workflow into your pipeline. A common approach is to require a specific PR label (like finops-approved) that bypasses the policy gate. Only designated approvers — typically FinOps leads or engineering managers — should have permission to apply this label. The exception gets recorded in the PR history, creating an audit trail. You can also use OPA's "soft mandatory" enforcement level, which flags violations but allows authorized overrides.
What's a good starting threshold for cost diff policies?
Start with an advisory-only policy at a threshold slightly above your typical PR cost impact. Analyze two to four weeks of Infracost data to understand your baseline — for most teams, the median PR increases monthly costs by $50–$500. Set your initial warning threshold at the 90th percentile of historical cost diffs (often $2,000–$5,000), and your hard-deny threshold at a level that would clearly indicate a provisioning mistake or unapproved architecture change (often $5,000–$15,000). Adjust quarterly based on violation rates and false positive feedback.
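If you keep a history of per-PR cost diffs (the Infracost JSON from each merged PR is enough), the percentile math is a few lines of Python. The diff values here are invented for illustration:

```python
import math

# Hypothetical monthly cost diffs (USD) from four weeks of merged PRs.
historical_diffs = [45, 80, 120, 150, 210, 300, 420, 600, 1800, 4200]

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct percent of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# 90th percentile suggests the advisory (warn) threshold...
print(percentile(historical_diffs, 90))  # 1800
# ...while the median shows what a typical PR actually costs.
print(percentile(historical_diffs, 50))  # 210
```

With a sample like this, an advisory threshold around $2,000 and a hard-deny threshold several multiples higher would match the guidance above.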