Kubecost Setup Guide: Kubernetes Cost Visibility and Optimization in 2026

Step-by-step Kubecost 2.x install on EKS, AKS, and GKE: Helm chart, cloud billing reconciliation, namespace allocation, network egress, alerts, multi-cluster federation, and Kubecost vs OpenCost.


Updated: May 25, 2026

Kubecost is an open-core Kubernetes cost monitoring tool that joins Prometheus metrics with cloud billing data (AWS CUR, Azure Cost Management, GCP Billing Export) to allocate spend to namespaces, deployments, labels, and pods in near real time. To set up Kubecost in 2026, install the official Helm chart, connect your cloud provider's billing export, configure persistent storage for the Prometheus and cost-model components, and optionally join multiple clusters through the Kubecost Federated ETL bucket. This guide walks through every step on EKS, AKS, and GKE, plus the production hardening most quickstart docs gloss over.

  • Kubecost 2.x runs as a Helm release; the free tier covers one cluster and 15 days of metrics retention.
  • Accurate cost requires connecting cloud billing (AWS CUR via Athena, Azure CM exports, or GCP BigQuery billing).
  • OpenCost is the CNCF-incubating upstream engine; Kubecost Enterprise adds multi-cluster, SSO, RBAC, and unlimited retention.
  • Network egress allocation is off by default. Enabling it deploys a DaemonSet that uses conntrack to attribute pod traffic by destination.
  • Expect 250–500 MiB of Prometheus storage per node per day; right-size scrape intervals to control overhead.
  • Pair Kubecost with Karpenter and a VPA to turn visibility into actual savings of 40–70% on a typical cluster.

What is Kubecost?

Kubecost is a monitoring platform that answers the question "what does this Kubernetes workload actually cost?" It scrapes kube-state-metrics and the kubelet for resource usage, looks up the cloud provider's price for each node, persistent volume, and load balancer, and then divides the bill across namespaces, deployments, pods, labels, and annotations. The result is a UI and Prometheus-compatible API that show cost per namespace per hour, idle waste, and right-sizing recommendations.

Kubecost comes in two editions in 2026, with OpenCost as a third, fully open-source option. The Free tier covers a single cluster with 15 days of metric retention and no SSO. Kubecost Enterprise adds unlimited retention, multi-cluster federation, RBAC, SAML/OIDC, and a hosted control plane. OpenCost (the open-source engine Kubecost donated to the CNCF, now an Incubating project) is fully free, has no UI of its own, and exposes a Prometheus exporter you can wire into Grafana. Most teams start with the free Kubecost UI on one cluster, then graduate to Enterprise, or to self-hosted OpenCost plus Grafana, as their fleet grows.

Under the hood, Kubecost consists of a Go binary called the cost-model, a bundled Prometheus, a Grafana dashboard, and, optionally, a Network Costs DaemonSet. The cost-model joins Prometheus time series with pricing data cached on disk and refreshed nightly from each cloud's public price list, then overlays your actual invoice once cloud billing is connected.

Kubecost vs OpenCost: what is the difference?

OpenCost is the upstream, vendor-neutral cost allocation engine donated by Kubecost to the CNCF in 2022, and currently in Incubation. Kubecost is the commercial product that bundles OpenCost with a UI, multi-cluster ETL, savings recommendations, governance features, and support. If you want raw cost-per-namespace metrics into Grafana and nothing else, OpenCost alone is enough. If you want a turnkey UI, alerts, RI/SP coverage analytics, and audit logs, install Kubecost.

| Capability | OpenCost (free) | Kubecost Free | Kubecost Enterprise |
|---|---|---|---|
| Cost allocation engine | Yes | Yes (same engine) | Yes |
| Web UI / dashboards | No (Grafana only) | Yes | Yes |
| Metric retention | Unlimited (your Prometheus) | 15 days | Unlimited |
| Multi-cluster federation | Manual | No | Yes |
| RBAC + SSO (SAML/OIDC) | No | No | Yes |
| Savings recommendations | Basic | Yes | Yes (RI/SP coverage) |
| Cloud billing reconciliation | Yes | Yes | Yes |
| Support | Community | Community | Enterprise SLA |

For more on the broader allocation problem, see our companion piece on Kubernetes chargeback and showback pipelines, which covers exporting the same data to a finance warehouse.

Prerequisites for installation

Before you run helm install, confirm a few things. A managed Kubernetes 1.27+ cluster (EKS, AKS, or GKE) is the simplest target; Kubecost 2.x dropped support for older releases such as 1.24 in early 2026, so upgrade anything below 1.27 first. You'll need cluster-admin rights to install CRDs, plus a StorageClass that supports ReadWriteOnce with room for the persistent volumes. The install command below requests 32 GiB for the cost-model and 64 GiB for the bundled Prometheus (the largest disk consumer), so plan for roughly 96 GiB.

If you already run Prometheus, you can point Kubecost at it instead of running a second copy: set global.prometheus.enabled=false and set global.prometheus.fqdn to your existing endpoint. Honestly, this is what I'd recommend on any cluster that already has an observability stack; running two Prometheus servers just doubles your scrape load.
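As a minimal sketch, the override can live in a values file instead of --set flags. The Prometheus address below is a hypothetical in-cluster service name and the file name is arbitrary; in the cost-analyzer chart both keys sit under global.prometheus. Your existing Prometheus also has to scrape the metrics Kubecost relies on, so check the chart documentation on using an existing Prometheus for your version.

```shell
# Write a values file that disables the bundled Prometheus and points
# Kubecost at an existing one. The fqdn is a placeholder address.
cat > values-external-prometheus.yaml <<'EOF'
global:
  prometheus:
    enabled: false   # skip deploying the bundled Prometheus server
    fqdn: http://prometheus-server.monitoring.svc.cluster.local:80
EOF

# Apply it to the release (needs cluster access, so left commented out):
# helm upgrade kubecost kubecost/cost-analyzer \
#   --namespace kubecost --reuse-values \
#   -f values-external-prometheus.yaml
```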

You also need a Kubecost product token from the Kubecost website (free tier is fine, no card required). The token unlocks the UI but does not phone home with metric data. Finally, decide on cloud billing access up front. Without it, Kubecost falls back to on-demand public list prices and will overstate cost for any node covered by a Savings Plan, Reserved Instance, Spot pool, or Committed Use Discount. The fix is connecting your billing export, which is covered in the next section.

How do I install Kubecost on Kubernetes?

The official path is the cost-analyzer Helm chart. The following commands install Kubecost 2.5 into a dedicated namespace with persistence enabled, a 32 GiB cost-model volume, and a 64 GiB Prometheus volume. Replace YOUR_KUBECOST_TOKEN with the token from your free signup.

# Add the repo (verified 2026)
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

# Create namespace
kubectl create namespace kubecost

# Install with sane production defaults
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --version "2.5.*" \
  --set kubecostToken="YOUR_KUBECOST_TOKEN" \
  --set global.prometheus.enabled=true \
  --set persistentVolume.enabled=true \
  --set persistentVolume.size=32Gi \
  --set prometheus.server.persistentVolume.size=64Gi \
  --set prometheus.server.retention=15d \
  --set networkCosts.enabled=false \
  --wait

Wait two to three minutes for the cost-model to populate its first metrics, then port-forward the UI:

kubectl port-forward --namespace kubecost \
  deployment/kubecost-cost-analyzer 9090:9090

# Open http://localhost:9090

For a long-lived install, replace the port-forward with an Ingress and gate it behind your identity provider. The free tier has no built-in authentication, so exposing it directly to the internet hands your cost and workload data to anyone who finds the URL. A typical pattern is an ALB Ingress on EKS with OIDC authentication via Cognito, or an oauth2-proxy sidecar.

Connect cloud billing data (AWS, Azure, GCP)

Out of the box, Kubecost prices each node at on-demand list price. To get accurate numbers, point it at the cloud provider's billing export. The setup differs per cloud.

AWS: Cost and Usage Report via Athena

Enable a daily Cost and Usage Report (CUR) in S3, register an Athena database against it, and grant the Kubecost service account access through IRSA. At minimum, the IAM policy needs athena:StartQueryExecution plus athena:GetQueryExecution and athena:GetQueryResults to read query output, s3:GetObject on the CUR bucket, write access to the Athena query-results bucket, and read access to the Glue Data Catalog (glue:GetDatabase, glue:GetTable, and glue:GetPartitions). Set kubecostProductConfigs.athenaProjectID (your AWS account ID), athenaBucketName (the Athena query-results bucket), athenaRegion, athenaDatabase, and athenaTable in your Helm values. Once reconciliation runs (every 24 hours by default), Kubecost retroactively replaces list-price estimates with actual invoiced amounts, including Savings Plan and Reserved Instance discounts.
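The same settings can be kept in a values file, sketched below. Every value is a placeholder (account ID, bucket, database, table, and role ARN), and the serviceAccount.annotations path used for the IRSA role is an assumption about the chart layout, so confirm it against the values.yaml of your chart version.

```shell
# Placeholder values for the CUR/Athena integration described above.
cat > values-aws-cur.yaml <<'EOF'
kubecostProductConfigs:
  athenaProjectID: "123456789012"                        # AWS account ID (placeholder)
  athenaBucketName: "s3://example-athena-query-results"  # Athena query-results bucket
  athenaRegion: "us-east-1"
  athenaDatabase: "athenacurcfn_example"                 # Athena database over the CUR
  athenaTable: "example_cur"                             # CUR table name
serviceAccount:
  annotations:
    # IRSA: IAM role carrying the Athena, S3, and Glue permissions (placeholder ARN)
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/example-kubecost-cur"
EOF

# Apply with (needs cluster access):
# helm upgrade kubecost kubecost/cost-analyzer -n kubecost --reuse-values -f values-aws-cur.yaml
```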

Azure: Cost Management exports to a Storage Account

Create a daily Cost Management export of "Actual cost" to a Blob container, then configure Kubecost with azureSubscriptionID, azureStorageAccount, azureStorageAccessKey, and the container name. Azure exports are CSV; Kubecost ingests them on a 6-hour cadence.

GCP: BigQuery billing export

Enable detailed billing export to BigQuery, create a service account with bigquery.dataViewer and bigquery.jobUser, then set gcpServiceAccountKeyJson and the BigQuery table reference. For Workload Identity on GKE, bind the Kubernetes service account to the Google service account with the iam.gke.io/gcp-service-account annotation instead of embedding the key in a Secret.

Until reconciliation completes, treat the UI's numbers as directional. If your fleet is mostly on commitments, list-price-only Kubecost can overstate cost by 30–60%. I learned this the hard way on an EKS fleet heavily covered by 3-year Savings Plans; the first week of Kubecost numbers looked like a five-alarm fire until CUR ingestion caught up. The same principle applies to non-Kubernetes spend; our guide to cloud commitment discounts across AWS, Azure, and GCP walks through how those discounts are applied at the invoice level.
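To see where a figure like 30–60% comes from, the arithmetic below is a simplified model, not Kubecost's reconciliation logic. It assumes a single blended commitment discount applied to a fixed share of list-price node spend; the 80% coverage and 40% discount in the example are illustrative, not quoted rates.

```shell
# overstatement COVERAGE DISCOUNT
#   COVERAGE: share of list-price node spend covered by commitments (0-1)
#   DISCOUNT: blended commitment discount versus on-demand (0-1)
# Prints how far a list-price-only estimate exceeds the real invoice, in percent.
overstatement() {
  awk -v f="$1" -v d="$2" 'BEGIN {
    invoice = 1 - f * d                  # invoice as a fraction of list price
    printf "%.1f\n", (1 / invoice - 1) * 100
  }'
}

overstatement 0.80 0.40   # prints 47.1
```

In that example the invoice is 32% below list price, yet the list-price view overstates it by about 47%, because the overstatement is measured against the smaller, discounted number.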

Namespace and label cost allocation

Kubecost's killer feature is allocating shared node cost across workloads using a weighted blend of CPU and memory requests, CPU and memory usage, GPU requests, persistent volume claims, and (if enabled) network egress. The default weighting is "max of request and usage," which prevents a noisy neighbour from getting a discount by under-requesting. You can change the strategy per cluster in the UI under Settings → Allocation Properties.
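The request-versus-usage rule is easiest to see with a single resource. The function below is a simplified, CPU-only illustration rather than Kubecost's implementation: it assumes a flat, hypothetical price per core-hour, whereas Kubecost derives per-resource prices from each node's cost.

```shell
# cpu_cost REQUEST_CORES USAGE_CORES HOURS PRICE_PER_CORE_HOUR
# Charges a pod for the larger of its CPU request and its measured usage,
# so requesting less than you use does not lower the bill.
cpu_cost() {
  awk -v req="$1" -v use="$2" -v hrs="$3" -v price="$4" 'BEGIN {
    r = req + 0; u = use + 0
    alloc = (r > u) ? r : u
    printf "%.2f\n", alloc * hrs * price
  }'
}

# Under-requester: asks for 0.5 cores, uses 1.5, for 720 hours at $0.03/core-hour
cpu_cost 0.5 1.5 720 0.03   # 32.40, billed on usage
# Over-requester: asks for 2 cores, uses 0.25
cpu_cost 2 0.25 720 0.03    # 43.20, billed on the request
```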

Once allocation is running, the most valuable view is Allocation → Aggregated by Namespace, last 30 days. This single screen shows which teams or applications drive your bill, and it pairs nicely with Kubernetes labels for cross-cutting reports like "cost by team" or "cost by environment." To enable label-based reports, set kubecostProductConfigs.labelMappingConfigs.enabled=true and define which label keys map to which business dimensions; for example, team_label: "owner" maps the owner label to teams, and environment_label: "env" maps the env label to environments.
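In Helm values, that example mapping might look like the sketch below. The owner and env label keys are just the examples from the paragraph above; use whatever keys your workloads actually carry.

```shell
# Map existing pod label keys onto Kubecost's team and environment dimensions.
cat > values-labels.yaml <<'EOF'
kubecostProductConfigs:
  labelMappingConfigs:
    enabled: true
    team_label: "owner"        # pods labelled owner=<team> roll up by team
    environment_label: "env"   # pods labelled env=<stage> roll up by environment
EOF

# helm upgrade kubecost kubecost/cost-analyzer -n kubecost --reuse-values -f values-labels.yaml
```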

Idle cost (the difference between what you pay for nodes and what is allocated to workloads) shows up as a phantom namespace called __idle__. Watching this number trend over time is the single best leading indicator of right-sizing opportunity; pairing Kubecost with Karpenter on EKS typically cuts idle from 35% to under 10% within a quarter.
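The idle share is simple arithmetic once you have node cost and per-workload allocated cost for the same window. The sketch below uses made-up dollar figures and assumes the allocations are already in dollars; it mirrors the definition above rather than Kubecost's internal calculation.

```shell
# idle_share NODE_COST ALLOCATED_COST...
# Idle = node cost minus everything allocated to workloads; prints the idle
# dollars and idle as a percentage of node cost.
idle_share() (
  node_cost=$1
  shift
  printf '%s\n' "$@" | awk -v node="$node_cost" '
    { allocated += $1 }
    END {
      idle = node - allocated
      printf "idle=%.2f share=%.1f%%\n", idle, idle / node * 100
    }'
)

idle_share 10000 4200 1500 800   # idle=3500.00 share=35.0%
```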

Enabling network egress allocation

Network egress is the silent killer of cloud bills, and by default Kubecost does not allocate it to workloads. The charges land on your invoice as data-transfer line items, but per-pod attribution requires a separate DaemonSet that uses conntrack (or eBPF on Linux 5.10+) to attribute outbound bytes to the originating pod. Enable it with:

helm upgrade kubecost kubecost/cost-analyzer \
  --namespace kubecost --reuse-values \
  --set networkCosts.enabled=true \
  --set networkCosts.config.services.amazon-web-services=true \
  --set networkCosts.config.services.azure-cloud-services=true \
  --set networkCosts.config.services.google-cloud-services=true

The DaemonSet runs privileged (it needs NET_ADMIN to read conntrack), so review your Pod Security Admission policy before enabling. CPU overhead is roughly 50–100m per node at moderate traffic levels. Once it has been running for a few hours, the UI will split egress into cross-zone, cross-region, and internet categories, each one priced from the cloud's egress table. Inter-AZ transfer is by far the most common surprise on EKS and AKS; the official Kubecost network costs configuration docs have the full list of supported endpoints.
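Once bytes are split by category, the dollar figure is a per-GB multiplication. The rates below are hypothetical placeholders chosen for round numbers, not any provider's current price list, and real egress tables are tiered by volume and destination.

```shell
# egress_cost CROSS_ZONE_GB CROSS_REGION_GB INTERNET_GB
# Prices each egress category from a per-GB table and prints the breakdown.
egress_cost() {
  awk -v z="$1" -v r="$2" -v i="$3" 'BEGIN {
    zone_rate = 0.01; region_rate = 0.02; internet_rate = 0.09   # USD/GB, hypothetical
    zc = z * zone_rate; rc = r * region_rate; ic = i * internet_rate
    printf "zone=%.2f region=%.2f internet=%.2f total=%.2f\n", zc, rc, ic, zc + rc + ic
  }'
}

egress_cost 5000 1000 200   # zone=50.00 region=20.00 internet=18.00 total=88.00
```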

Budget alerts and Slack notifications

Visibility without alerts becomes dashboard wallpaper. Kubecost supports four alert types: budget (namespace exceeds X dollars), efficiency (CPU efficiency drops below Y%), spend change (anomaly detection vs a 14-day baseline), and recurring (weekly digest). Configure them under Settings → Notifications, or declaratively in Helm values:

notifications:
  alertConfigs:
    globalSlackWebhookUrl: "https://hooks.slack.com/services/REPLACE/ME"  # Slack destination
    alerts:
      - type: budget
        threshold: 500            # dollars
        window: 7d
        aggregation: namespace
        filter: payments
        ownerContact:             # email recipients for this alert
          - finops@example.com
      - type: spendChange
        relativeThreshold: 0.20   # alert on a 20% spike
        window: 1d
        baselineWindow: 14d
        aggregation: namespace

Wire Slack with notifications.alertConfigs.globalSlackWebhookUrl, or use the generic webhook for PagerDuty and Opsgenie. Treat the spend-change alert like a synthetic SLO. A 20% one-day swing in a stable namespace is almost always a runaway batch job, a leaked LoadBalancer, or a forgotten benchmark, and catching it within hours can save a four-figure invoice. (I hit this exact failure shipping a load test: an unbounded retry loop pushed $1,800 of NAT egress in a weekend before the alert fired.)
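The spend-change rule reduces to comparing one day against a baseline. The sketch below assumes the baseline is the simple mean of the prior days' cost and uses the 0.20 relativeThreshold from the alert example; Kubecost's own anomaly logic may weigh the window differently, and the daily figures are invented.

```shell
# spend_change TODAY_COST BASELINE_DAY_COST...
# Flags today's namespace cost when it exceeds the baseline mean by more
# than 20% (relativeThreshold 0.20).
spend_change() (
  today=$1
  shift
  printf '%s\n' "$@" | awk -v today="$today" -v threshold=0.20 '
    NF { sum += $1; n++ }
    END {
      if (n == 0 || sum == 0) { print "no baseline"; exit }
      mean = sum / n
      change = (today - mean) / mean
      printf "%s change=%.1f%%\n", (change > threshold ? "ALERT" : "ok"), change * 100
    }'
)

spend_change 130 95 100 105 98 102   # ALERT change=30.0%
spend_change 110 95 100 105 98 102   # ok change=10.0%
```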

Multi-cluster federation

For fleets of more than one cluster, the free tier is no longer enough. Kubecost Enterprise's Federated ETL writes a per-cluster Parquet extract to a shared S3, Azure Blob, or GCS bucket every hour, and a "primary" Kubecost reads them back to provide a global view. The pattern decouples clusters from a central database and survives single-cluster outages.

If you prefer the open-source path, OpenCost combined with Thanos or Mimir achieves the same outcome. Each cluster runs OpenCost, exports metrics to a long-term Prometheus store, and Grafana queries the federation endpoint. You lose the polished UI but keep full control of the data plane. The upstream OpenCost documentation includes a worked example using Thanos sidecar.

Production hardening and pitfalls

A handful of operational gotchas catch most first-time installers. First, the bundled Prometheus has a 14-day default retention; if you want longer history without Enterprise, configure remote-write to a long-term store. Second, the cost-model assumes node labels match the cloud provider's pricing API. Custom node taxonomies or self-managed nodes need kubecostProductConfigs.customPricing set, or every node will show up as zero cost. Third, GPU pricing requires nvidiaDcgmExporterEnabled=true and the DCGM exporter installed; otherwise GPU nodes are allocated as plain CPU.

Resource overhead is modest but non-zero. Budget 1 vCPU and 2 GiB RAM for the cost-model, plus 0.5 vCPU and 1 GiB per 100 nodes for the bundled Prometheus. On clusters above 500 nodes, raise prometheus.server.global.scrape_interval from 60s to 120s and drop unused targets via relabel rules. Finally, never expose the UI to the public internet on the free tier; there is no authentication. Front it with an Ingress that requires OIDC, IP allowlists, or a VPN.
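The rules of thumb in this guide (1 vCPU and 2 GiB for the cost-model, 0.5 vCPU and 1 GiB per 100 nodes for Prometheus, and 250–500 MiB of Prometheus disk per node per day) combine into a quick sizing calculator. It is a back-of-the-envelope sketch that assumes overhead scales linearly with node count; measure your own cluster before committing to numbers.

```shell
# kubecost_budget NODES RETENTION_DAYS
# Linear sizing from this guide's rules of thumb; prints total vCPU, RAM (GiB),
# and the Prometheus disk range (GiB) for the chosen retention.
kubecost_budget() {
  awk -v nodes="$1" -v days="$2" 'BEGIN {
    per100 = nodes / 100
    cpu = 1 + 0.5 * per100                 # cost-model + Prometheus vCPU
    ram = 2 + 1 * per100                   # cost-model + Prometheus GiB
    disk_lo = nodes * days * 250 / 1024    # MiB -> GiB at 250 MiB/node/day
    disk_hi = nodes * days * 500 / 1024    # MiB -> GiB at 500 MiB/node/day
    printf "cpu=%.1f ram_gib=%.1f disk_gib=%.0f-%.0f\n", cpu, ram, disk_lo, disk_hi
  }'
}

kubecost_budget 200 15   # cpu=2.0 ram_gib=4.0 disk_gib=732-1465
```

By the same rules of thumb, the 64 GiB Prometheus volume in the install command holds 15 days of data for only about 9 to 17 nodes, so size the volume from your node count and retention.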

For deeper right-sizing follow-up, pair Kubecost's recommendations with the techniques in our Kubernetes cost optimization guide. Kubecost identifies waste, but a VPA, HPA, or Karpenter still has to act on the recommendation. The kubecost/cost-model GitHub repository tracks open issues and release notes if you want to stay current on breaking changes.

Frequently Asked Questions

Is Kubecost free?

Yes, the Kubecost Free tier is free for a single cluster with 15 days of metric retention and no SSO. Multi-cluster federation, unlimited retention, RBAC, and SAML/OIDC require Kubecost Enterprise. The underlying OpenCost engine is fully open source and free in all configurations.

How does Kubecost calculate cost?

Kubecost multiplies each pod's allocated CPU, memory, and GPU (by default the larger of request and usage), plus its storage and network, by the price of the underlying node, volume, or egress path. Prices come from the cloud provider's public list initially, then get reconciled against your actual invoice (AWS CUR, Azure Cost Management, or GCP BigQuery billing export) once cloud billing is connected, typically within 24 hours.

Does Kubecost support EKS, AKS, and GKE?

Yes. Kubecost runs on any conformant Kubernetes 1.27 or newer, including EKS, AKS, GKE, OpenShift, Rancher, and self-managed clusters. Each cloud has a dedicated billing integration (CUR for AWS, Cost Management exports for Azure, BigQuery exports for GCP) for accurate, post-discount cost.

What is the difference between Kubecost and OpenCost?

OpenCost is the CNCF Incubating, vendor-neutral allocation engine that powers Kubecost. It exposes Prometheus metrics but has no UI. Kubecost is the commercial product that wraps OpenCost with a UI, alerts, savings recommendations, multi-cluster federation, RBAC, and enterprise support.

How much overhead does Kubecost add to a cluster?

Budget roughly 1 vCPU and 2 GiB of memory for the cost-model, plus 0.5 vCPU and 1 GiB per 100 nodes for the bundled Prometheus. Disk usage is the main constraint, about 250–500 MiB per node per day at default scrape intervals. On clusters above 500 nodes, raise the scrape interval to 120s to keep overhead under 1% of fleet capacity.

About the Author: Diego Saavedra

Diego spent five years at CloudHealth (then VMware Tanzu) as a solutions engineer working with mid-market AWS customers, mostly in the $200k–$2M/month spend range. He left in 2024 to consult independently and has since helped seven companies, including a Series C fintech, a media streaming startup, and assorted SaaS shops, restructure their RI and Savings Plan portfolios. His best-documented win was a $310k/month reduction at a video-processing company by moving Lambda-heavy workloads to Fargate Spot. He holds the AWS Solutions Architect Professional, AWS DevOps Engineer Professional, and FinOps Practitioner certifications. Before CloudHealth he was a DevOps engineer at MercadoLibre in Buenos Aires for three years. Diego writes about Lambda cost patterns, NAT Gateway and data-transfer charges (the silent killers), and how to negotiate an Enterprise Discount Program renewal without getting steamrolled. Based in Buenos Aires, eight years in.