Cloud Cost Optimization & FinOps 2026 — Kubecost, OpenCost, Vantage, Cloudability, Spot.io, CAST AI, Karpenter Deep Dive
- Author: Youngju Kim (@fjvbn20031)
Prologue — "Why is the bill up again?" is still a 2026 question
It is 2026 and the same conversation still repeats on the first Tuesday of every month. Finance asks, "AWS is up 18% versus last month, why?" Engineering says, "Traffic grew." The CFO comes back, "So every 1% of traffic costs us 18% more?" Then someone opens the Kubecost dashboard, sees that one namespace is expensive, and cannot explain why.
This is the core of cloud cost in 2026: the data is overwhelming, but the decisions are scarce. AWS Cost Explorer, GCP Recommender, Azure Cost Management, Kubecost, OpenCost, Vantage, Cloudability (Apptio), Spot.io (NetApp), CAST AI, PerfectScale, Densify: there are too many tools, each slicing the bill differently. That is why the FOCUS 1.0 specification, published by the FinOps Foundation in 2024, has become the industry standard in 2026. AWS, GCP, Azure, Oracle, and Snowflake now export bills with the same schema, and the FinOps Foundation's home in the Linux Foundation keeps governance stable.
Then a new category exploded: LLM and GPU cost. Companies that audited their OpenAI, Anthropic, and Cohere bills throughout 2025 realized, "this is bigger than our EC2 line item." So FinOps in 2026 is no longer just about buying RIs and deleting unused EBS volumes; per-token economics and GPU utilization are now core KPIs.
This article maps the territory: the FinOps Foundation framework, the positioning of tools like Kubecost, OpenCost, and Vantage, Karpenter vs Cluster Autoscaler, the automation layer of Spot.io and CAST AI, and finally LLM inference cost.
1. FinOps Lifecycle — Inform, Optimize, Operate
The FinOps Foundation's official framework is a three-phase lifecycle. The 2026 version (FinOps Framework v2) cleaned up domains and capabilities.
- Inform: Make it visible who is using what and why. Tagging, allocation, billing decomposition, unit economics.
- Optimize: Actually cut based on what you can see. RI/SP purchases, removing unused resources, right-sizing, spot adoption.
- Operate: Make optimization a daily practice, not a one-off event. Policy, guardrails, cost SLOs, departmental chargeback.
The classic failure mode is starting with Optimize: buying Reserved Instances before knowing what runs where. RI utilization then drops into the 60% range, and the waste on unused RIs exceeds the savings. The order is always Inform → Optimize → Operate.
The six domains: Understand Usage and Cost, Quantify Business Value, Manage the FinOps Practice, Optimize Usage and Cost, Manage Anomalies, Forecast. New capabilities added in 2026 are Sustainability (carbon alongside cost) and Onboarding Workloads (the process by which new workloads enter the FinOps system).
2. FOCUS 1.0 — A Common Language for Cloud Bills
FOCUS (FinOps Open Cost and Usage Specification) 1.0 went GA in June 2024, and v1.2 is on the 2026 roadmap. The point is to force every cloud bill into the same column schema.
| Column | Meaning |
|---|---|
| BilledCost | Amount invoiced, after discounts; upfront purchases appear when paid, not amortized |
| EffectiveCost | Amortized cost, with upfront and recurring commitment purchases spread over the usage they cover |
| ListCost | Cost at public list prices, before any discounts |
| ContractedCost | Cost at negotiated (contracted) unit prices |
| ServiceCategory | Compute, Storage, Networking, AI and Machine Learning, ... |
| ChargeCategory | Usage, Purchase, Tax, Credit, Adjustment |
| BillingPeriodStart/End | Billing period (End is an exclusive bound) |
| ResourceId, ResourceType | Resource identifiers |
| Tags | Key-value pairs, normalized |
| SkuId, SkuPriceId | SKU identifiers |
What it means in 2026: AWS CUR (Cost and Usage Report), GCP BigQuery Billing Export, and Azure Cost Management Export can all emit FOCUS 1.0, so a single SQL query can now cover all three clouds. SaaS vendors like Vantage, Cloudability, Anodot, and Finout no longer have to maintain per-provider parsers.
```sql
-- FOCUS 1.0 unified query: top-10 services by spend across clouds, May 2026.
-- BillingPeriodEnd is an exclusive bound in FOCUS (May ends at 2026-06-01),
-- so filter on BillingPeriodStart only.
SELECT
  ServiceCategory,
  ServiceName,
  ProviderName,
  SUM(EffectiveCost) AS cost_usd
FROM focus_billing
WHERE BillingPeriodStart >= '2026-05-01'
  AND BillingPeriodStart < '2026-06-01'
GROUP BY 1, 2, 3
ORDER BY cost_usd DESC
LIMIT 10;
```
3. Kubecost — Slicing Kubernetes Cost Down to Namespace and Pod
Kubecost was acquired by IBM in 2024, yet the OSS edition continues to ship. The product allocates Kubernetes cost by cluster, namespace, deployment, and pod.
Installation is straightforward.
```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="$(echo helm-install@kubecost.com | base64)"
```
You get:
- cost-analyzer (UI + API)
- kube-state-metrics, prometheus, node-exporter (metric collection)
- Network Costs DaemonSet (optional; cost of traffic inside and out of the cluster)
Example Allocation API call:
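```bash
# Namespace cost for the last 7 days (kubecost.local stands in for your Kubecost endpoint).
# The response's .data is a list of allocation sets; with accumulate=true it has one entry.
curl -sG "http://kubecost.local/model/allocation" \
  --data-urlencode "window=7d" \
  --data-urlencode "aggregate=namespace" \
  --data-urlencode "accumulate=true" \
  | jq '.data[0] | to_entries | map({name: .key, cost: .value.totalCost}) | sort_by(.cost) | reverse'
```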
Core concepts:
- Allocation: how much each workload consumed (cpuCost, ramCost, gpuCost, pvCost, networkCost).
- Asset: actually billed resources (nodes, disks, load balancers).
- Cloud Integration: pulls the cloud bill and corrects allocation with real per-node prices. Without it, Kubecost uses on-demand list prices, which differ from actual invoices.
A frequently missed detail is how idle cost (node capacity that no pod has requested) and shared cost (kube-system, monitoring) are distributed. Kubecost ships four policies: evenly, proportional, weighted, and none.
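To make the two most common policies concrete, here is a minimal Python sketch that spreads idle cost over namespaces either evenly or in proportion to each namespace's allocated cost. The namespace names and dollar amounts are hypothetical, and this shows the arithmetic only; it is not Kubecost's implementation.

```python
def distribute_idle(allocated: dict[str, float], idle_cost: float,
                    policy: str) -> dict[str, float]:
    """Each namespace's cost after adding its share of idle cost.

    policy="even": every namespace gets the same slice of idle cost.
    policy="proportional": slices are weighted by allocated cost.
    """
    if not allocated:
        raise ValueError("no namespaces to charge")
    if policy == "even":
        share = idle_cost / len(allocated)
        return {ns: cost + share for ns, cost in allocated.items()}
    if policy == "proportional":
        total = sum(allocated.values())
        if total == 0:
            raise ValueError("proportional split needs non-zero allocated cost")
        return {ns: cost + idle_cost * cost / total for ns, cost in allocated.items()}
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical day: three namespaces and $40 of idle node cost.
allocated = {"checkout": 60.0, "search": 30.0, "batch": 10.0}
print(distribute_idle(allocated, 40.0, "even"))
print(distribute_idle(allocated, 40.0, "proportional"))
```

With these numbers the even split more than doubles the small batch namespace's bill (from $10 to about $23), while the proportional split keeps every namespace's share of the total unchanged. That is why the policy should be chosen and communicated before chargeback starts.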
4. OpenCost — The OSS Core of Kubecost, a CNCF Sandbox Project
OpenCost is the cost allocation engine Kubecost donated to the CNCF. It entered the CNCF Sandbox in 2022, was promoted to Incubating in 2024, and by 2026 the kubectl cost plugin is stable.
The relationship:
- OpenCost = algorithms + metrics + API.
- Kubecost = OpenCost + UI + multi-cluster + reporting + alerts + SaaS.
It operates as a Prometheus exporter, so you can build Grafana dashboards directly.
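```promql
# Hourly cost per namespace in USD: cores used x $/core-hour + GiB used x $/GiB-hour.
# Assumes kube-prometheus recording rules and cAdvisor series that carry a node label.
sum by (namespace) (
  node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
  * on(node) group_left
  node_cpu_hourly_cost
)
+ sum by (namespace) (
  container_memory_working_set_bytes{container!=""} / 1024 / 1024 / 1024
  * on(node) group_left
  node_ram_hourly_cost
)
```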
The significance is vendor-neutral cost standardization. OpenCost holds the AWS, GCP, and Azure rate cards. Commercial tools like Kubecost, Vantage, and CAST AI follow OpenCost metrics under the hood.
5. Vantage — A Unified UI for Multi-Cloud Cost
Vantage launched in 2020 as a SaaS. Its strengths are UI/UX and fast integrations. AWS, GCP, Azure, Snowflake, Datadog, MongoDB Atlas, Databricks, and Fastly bills all land in a single pane.
Core capabilities:
- Cost Reports: intuitive slice-and-dice. Add dimensions (account, service, region, tag) and view time series.
- Active Resources: only resources that actually generate charges. Useful for cleaning up dangling EBS volumes.
- Anomalies: detection of unusual cost spikes, with Slack/PagerDuty alerts.
- Autopilot: automatic AWS Compute Savings Plans purchasing and management (optional, fee-based).
- Network Flow Reports: data transfer cost analysis. Who is sending how much to where.
Big 2026 additions: FOCUS 1.0 native views and an AI Cost dashboard that unifies OpenAI, Anthropic, and Bedrock.
Pricing is usage-based, roughly 0.5-2% of managed spend.
6. Cloudability (Apptio/IBM) — The Enterprise FinOps Veteran
Cloudability is the oldest cloud cost SaaS, launched in 2011. Apptio acquired it in 2019, and IBM acquired Apptio in 2023. Today it is an IBM product.
What stands out:
- TBM (Technology Business Management) integration. The IBM/Apptio methodology of mapping IT assets to business value units.
- Containers (Cloudability Containers): Kubecost-like container allocation. Supports EKS, AKS, GKE.
- Rightsizing recommendations: ML-driven EC2/RDS downsizing.
- Reserved Instance Planner: RI/SP simulation. Compare 1-year vs 3-year and partial vs all-upfront.
- True Cost: direct cost plus indirect (engineering labor, licensing).
When to choose Cloudability: enterprises with $1M+ annual cloud spend and multiple BUs or cost centers. Strong chargeback, showback, and budget workflows. Startups and mid-sized firms get more value from Vantage.
7. Spot.io (NetApp) — The Original Spot Automation Platform
Spot.io was acquired by NetApp in 2020. Two flagship products:
- Elastigroup: automatic spot/on-demand mix management for EC2, GCE, and Azure VMs. ML-based interruption prediction migrates workloads.
- Ocean: Kubernetes node management. Replaces Cluster Autoscaler.
How Ocean works:
- Analyze pod requests (CPU, memory, GPU).
- Spin up the most cost-effective instance type for those pods.
- Maximize spot utilization to a configurable target percentage.
- When an interruption is imminent, proactively migrate workloads (Spot Interruption Predictor).
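As a rough illustration of the second step, the sketch below picks the cheapest instance type whose vCPU and memory cover the summed requests of pending pods. The catalog names, sizes, and hourly prices are hypothetical placeholders, and real provisioners (Ocean, Karpenter, CAST AI) also weigh spot availability, interruption rates, and packing across several nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    mem_gib: float
    price_per_hour: float  # USD, hypothetical


@dataclass(frozen=True)
class PodRequest:
    cpu: float      # vCPU requested
    mem_gib: float  # GiB requested


def cheapest_fit(pods: list[PodRequest], catalog: list[InstanceType]) -> InstanceType | None:
    """Cheapest single instance type whose capacity covers the summed requests.

    Ignores kubelet/system reservations and DaemonSet overhead for brevity.
    """
    need_cpu = sum(p.cpu for p in pods)
    need_mem = sum(p.mem_gib for p in pods)
    fits = [t for t in catalog if t.vcpu >= need_cpu and t.mem_gib >= need_mem]
    return min(fits, key=lambda t: t.price_per_hour, default=None)


# Hypothetical catalog; names and prices are placeholders, not real SKUs.
CATALOG = [
    InstanceType("compute-2x4", 2, 4, 0.085),
    InstanceType("general-2x8", 2, 8, 0.096),
    InstanceType("general-4x16", 4, 16, 0.192),
    InstanceType("memory-4x32", 4, 32, 0.252),
]

pending = [PodRequest(1.0, 3.0), PodRequest(0.5, 2.0)]
print(cheapest_fit(pending, CATALOG))
```

The wider the catalog, the tighter the fit, which is one reason these tools push for instance-type diversity.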
Differentiators: automatic headroom management, rightsizing applied directly (Ocean Rebalance), and VNGs (Virtual Node Groups) for workload separation.
With Karpenter rising, Ocean's share has plateaued, but it remains compelling for teams who want multi-cloud plus non-Kubernetes automation in one place.
8. CAST AI — Aggressive Rebalancing and Commitments
CAST AI launched in 2019 and grew quickly. Its differentiators are aggressive automation and cloud abstraction.
Core features:
- Autoscaler: like Karpenter, it directly observes pod demand and provisions nodes. Replaces Cluster Autoscaler.
- Rebalancer: periodically analyzes the cluster and migrates workloads to cheaper instances without downtime.
- Spot Automation: automatic spot/on-demand mix with built-in interruption handling.
- Commitments Engine: RI/SP purchase recommendations and automatic management.
- Multi-cloud: same UI across AWS, GCP, and Azure.
Example config (after connecting the cluster, via Helm):
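```yaml
# castai-cluster-controller values.yaml (illustrative)
# Helm does not expand shell variables inside values files: replace the
# $CAST_AI_* placeholders with real values or pass them with --set.
apiKey:
  key: $CAST_AI_API_KEY
clusterID: $CAST_AI_CLUSTER_ID
autoscaling:
  enabled: true
  unschedulablePods:
    enabled: true
  nodeDownscaler:
    enabled: true
    emptyNodes:
      delaySeconds: 60
  spotInstances:
    enabled: true
    spotDiversityEnabled: true
    spotInterruptionPredictionsEnabled: true
```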
The 2026 differentiator is the AI Workload Optimizer — automated measurement and rightsizing for GPU workloads, leveraging NVIDIA DCGM metrics.
9. Karpenter vs Cluster Autoscaler — Who Wins in 2026
Cluster Autoscaler (CA) is the standard Kubernetes node autoscaler. It scales pre-defined node groups; on AWS these are Auto Scaling Groups (ASGs). The drawback is that each ASG's instance types are fixed, so diverse workloads require multiple ASGs ranked with the priority expander.
Karpenter, an AWS project launched in 2021, hit v1.0 GA in 2024. By 2026 it is the default autoscaler for EKS, with the Azure (AKS Karpenter Provider) port and a GCP port available.
The differences are structural:
| Dimension | Cluster Autoscaler | Karpenter |
|---|---|---|
| Mechanism | Scales pre-built node groups (ASGs on AWS) | Provisions nodes directly from pod demand |
| Instance choice | Fixed per node group | Broad (diversity = availability + cost) |
| Spot handling | ASG mixed instances policy | Native |
| Configuration | Node groups + command-line flags | NodePool + EC2NodeClass CRDs |
| Node startup | ~3-4 minutes | ~40 seconds |
| Consolidation | Scale-down of underutilized nodes only | Built in (remove nodes or replace them with cheaper ones) |
NodePool example:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```
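The decisive setting is consolidationPolicy: WhenEmptyOrUnderutilized. With it, Karpenter removes empty nodes and replaces underutilized ones with fewer or cheaper nodes. CA can scale down underutilized nodes, but it cannot swap a node for a cheaper instance type outside its node group.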
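At its core, consolidation asks whether every pod on an underutilized node can be re-placed into the spare capacity of the other nodes. A minimal sketch of that check using first-fit-decreasing on CPU and memory requests; the sizes are hypothetical, and Karpenter's real logic additionally simulates scheduling constraints (affinity, topology spread, PodDisruptionBudgets) and considers replacing the node with a cheaper one.

```python
def can_drain(node_pods: list[tuple[float, float]],
              free: list[list[float]]) -> bool:
    """True if every (cpu, mem) pod fits into the other nodes' free capacity.

    free holds [free_cpu, free_mem] per remaining node; it is copied, not
    mutated. Pods are placed largest-first into the first node that fits.
    """
    spare = [list(f) for f in free]
    for cpu, mem in sorted(node_pods, reverse=True):
        for slot in spare:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            return False  # this pod fits nowhere, so the node must stay
    return True


# Hypothetical: two remaining nodes with spare [cpu, GiB], and the pods on the candidate node.
spare_capacity = [[1.0, 2.0], [0.5, 4.0]]
pods_on_candidate = [(0.5, 1.0), (0.5, 2.0)]
print(can_drain(pods_on_candidate, spare_capacity))  # True
```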
The 2026 picks: EKS = Karpenter, GKE = GKE Autopilot or Karpenter GCP, AKS = Karpenter Provider, want more automation = CAST AI or Ocean.
10. Spot Instances — Designing Around Interruption
Spot instances are typically discounted 70-90% from on-demand. The catch is the reclaim notice: 2 minutes on AWS, 30 seconds on GCP (preemption), and 30 seconds on Azure.
Design principles:
- Diversify instance types. Do not cluster on one. AWS standard today is "spot allocation strategy: price-capacity-optimized".
- Graceful shutdown. PreStop hooks, terminationGracePeriodSeconds, drain in-flight requests.
- Checkpointing. Persist long-running job state to S3 periodically.
- AZ diversification. If one AZ runs dry on spot, fall back to another.
- Handle stateful workloads with care. RDS and OpenSearch belong on RIs; Kafka and Cassandra can run on spot only if interruption is handled (replication plus graceful drain).
- Interruption handlers. AWS Node Termination Handler (NTH), the spot preemption handler on GKE.
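On the application side, graceful shutdown and checkpointing come down to catching SIGTERM (the kubelet sends it to containers when a pod is deleted, for example while a reclaimed spot node is drained), stopping at a safe point, saving state, and exiting before terminationGracePeriodSeconds expires. A minimal Python sketch; run_job, the state layout, and the checkpoint interval are hypothetical.

```python
import json
import os
import signal
import threading
from collections.abc import Callable

stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only record the request; the work loop decides where it is safe to stop.
    stop_requested.set()


def save_checkpoint_file(path: str, state: dict) -> None:
    # Write a temp file, then rename it over the old one, so a kill mid-write
    # never leaves a half-written checkpoint. A real job would upload to S3.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_job(items: list[int], state: dict, save: Callable[[dict], None],
            every: int = 100) -> dict:
    """Process items from state["next_index"], checkpointing with save(state).

    Stops early, after a final checkpoint, once SIGTERM has been received.
    """
    for i in range(state["next_index"], len(items)):
        if stop_requested.is_set():
            break
        state["total"] += items[i]           # stand-in for real work
        state["next_index"] = i + 1
        if state["next_index"] % every == 0:
            save(state)
    save(state)                               # final checkpoint: finished or interrupted
    return state


signal.signal(signal.SIGTERM, handle_sigterm)
```

In a container you would pass functools.partial(save_checkpoint_file, path), load the last checkpoint on startup, and keep the time between checks of stop_requested well inside the 2-minute (AWS) or 30-second (GCP, Azure) notice.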
NTH deployment:
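```bash
helm repo add eks https://aws.github.io/eks-charts
# Queue Processor mode: NTH reads spot interruption, rebalance, and ASG lifecycle
# events from an SQS queue fed by EventBridge rules (create those first).
# queueURL only takes effect together with enableSqsTerminationDraining=true.
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=$NTH_SQS_QUEUE_URL
```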
2026 stats: a well-designed spot cluster cuts cost by 60-80%. A poorly designed one breaks SLAs through interruption. The usual split is batch, CI, dev, stateless APIs on spot; databases, session stores, payments on on-demand.
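The 60-80% range follows from simple blending. If a share s of node-hours runs on spot at discount d, and interruptions add overhead r (re-run work, extra headroom) on the spot portion, the bill relative to all on-demand is roughly (1 - s) + s(1 - d)(1 + r). A Python sketch with illustrative inputs; the overhead term is an assumption you would estimate from your own interruption data.

```python
def blended_savings(spot_share: float, spot_discount: float, overhead: float = 0.0) -> float:
    """Fractional saving versus running every node-hour on demand.

    spot_share: fraction of node-hours on spot; spot_discount: e.g. 0.75 for 75% off;
    overhead: extra spot node-hours caused by interruptions, as a fraction.
    """
    if not (0 <= spot_share <= 1 and 0 <= spot_discount < 1 and overhead >= 0):
        raise ValueError("input out of range")
    relative_cost = (1 - spot_share) + spot_share * (1 - spot_discount) * (1 + overhead)
    return 1 - relative_cost


print(f"{blended_savings(0.80, 0.75, overhead=0.10):.0%}")  # 58%
```

Eighty percent spot at a 75% discount saves 60% before overhead and 58% after it; the top of the range needs both more spot and deeper discounts (95% spot at 85% off saves about 81%).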
11. Reserved Instance vs Savings Plans vs Spot — Three-Way Commitment Comparison
AWS offers three discount tracks. The 2026 summary:
| Option | Discount | Commitment | Flexibility | Transfer |
|---|---|---|---|---|
| Standard RI | Up to 72% | Specific family, AZ, OS | Low | Yes (RI Marketplace) |
| Convertible RI | Up to 54% | Family swappable | Medium | No |
| Compute Savings Plan | Up to 66% | Hourly commit ($/hr) | High (all EC2, Lambda, Fargate) | No |
| EC2 Instance Savings Plan | Up to 72% | Family + region | Low | No |
| Spot | Up to 90% | None | Unlimited (but reclaimed) | N/A |
Practical recommendations for 2026:
- Baseline = Compute Savings Plan. The most flexibility.
- Long-stable workloads (DB, control plane) = Standard RI 1-yr or 3-yr.
- Burst, dev, CI, batch = Spot.
- GPU workloads = Compute Savings Plans or EC2 Capacity Blocks for ML.
GCP equivalents are Committed Use Discounts (CUD) and Sustained Use Discounts; Azure has Reservations and Savings Plans. FOCUS 1.0's CommitmentDiscountType column unifies these.
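One common way to size the Compute Savings Plan baseline: committing one more dollar per hour of on-demand-equivalent usage costs (1 - d) every hour but only saves money in hours where usage actually reaches that level, so it pays off when that level is exceeded in more than (1 - d) of hours. The sketch below brute-forces this on an hourly usage series. The single flat discount d is a simplifying assumption (real Savings Plan rates differ by instance family, region, and term), and the usage numbers are made up.

```python
def plan_cost(usage: list[float], commit: float, discount: float) -> float:
    """Total cost when `commit` (on-demand-equivalent $/hr) is covered by the plan.

    The plan is paid every hour whether or not it is used; usage above the
    commitment is billed at on-demand rates.
    """
    return sum(commit * (1 - discount) + max(u - commit, 0.0) for u in usage)


def best_commitment(usage: list[float], discount: float) -> float:
    # Total cost is piecewise linear in `commit`, so the optimum lies at zero
    # or at one of the observed usage levels.
    candidates = [0.0, *sorted(set(usage))]
    return min(candidates, key=lambda c: plan_cost(usage, c, discount))


# Hypothetical day: 16 busy hours at $100/hr on-demand-equivalent, 8 quiet hours at $40/hr.
usage = [100.0] * 16 + [40.0] * 8
for d in (0.30, 0.40):
    c = best_commitment(usage, d)
    print(f"discount {d:.0%}: commit ${c:.0f}/hr on-demand-equivalent "
          f"(${c * (1 - d):.0f}/hr SP commitment)")
```

The $100/hr level is reached in 16 of 24 hours (about 67%), which clears the 60% bar at a 40% discount but not the 70% bar at a 30% discount. Savings Plans are purchased in post-discount dollars, hence the second figure.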
12. Kubernetes Resource Right-Sizing — Myths About requests and limits
A common misconception: "generous requests and limits are safer." In reality the reverse holds. The scheduler reserves capacity by requests, so oversized requests hurt bin packing and inflate cost, and CPU limits cause throttling, which destroys latency.
The 2026 guidance:
- Do not set CPU limits (Tim Hockin's long-standing view). Throttling kills latency. Bin pack with requests.
- Do set memory limits. They protect against leaks. request = limit is safest.
- Base requests on P95-P99 actual usage. Derive from 90 days of Prometheus data.
- Run VPA in `Off` mode for recommendations, never `Auto`: `Auto` applies changes by evicting and restarting pods.
PerfectScale, Densify, and CAST AI right-sizing recommendations consume this data automatically.
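A minimal sketch of the P95 rule for CPU requests: take usage samples (for example, 5-minute CPU rates per container from 90 days of Prometheus data), compute a nearest-rank percentile, and add headroom. The 15% headroom and the helper names are assumptions to tune, not a standard.

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    if not samples or not 0 < q <= 100:
        raise ValueError("need samples and 0 < q <= 100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def recommend_cpu_request_m(usage_cores: list[float], q: float = 95.0,
                            headroom: float = 0.15) -> int:
    """CPU request in millicores: q-th percentile of usage plus headroom, rounded up."""
    return math.ceil(percentile(usage_cores, q) * (1 + headroom) * 1000)


# Hypothetical samples in cores (e.g. 5-minute rate() values exported from Prometheus).
samples = [i / 100 for i in range(1, 101)]      # 0.01 .. 1.00 cores
print(f"{recommend_cpu_request_m(samples)}m")
```

Memory requests follow the same pattern, with the limit set equal to the request per the guidance above.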
VPA recommendation extraction:
```bash
kubectl get vpa my-app -o jsonpath='{.status.recommendation}'
# {"containerRecommendations":[{"containerName":"app",
#   "lowerBound":{"cpu":"50m","memory":"100Mi"},
#   "target":{"cpu":"100m","memory":"200Mi"},
#   "upperBound":{"cpu":"200m","memory":"400Mi"}}]}
```
13. LLM and GPU Cost — Per-Token Economics
The fastest-growing cost category in 2026 is LLM inference. The choice is between self-hosted GPUs (EKS on H100, GCP a3-highgpu, Azure ND H100 v5) and APIs (OpenAI, Anthropic, Bedrock, Vertex AI).
API pricing (representative as of May 2026):
| Model | input $/1M tok | output $/1M tok |
|---|---|---|
| GPT-5 (OpenAI) | $5.00 | $15.00 |
| Claude Opus 4.7 | $15.00 | $75.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Gemini 2.5 Pro | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
Self-hosting cost (H100 SXM):
| Cloud | Instance | $/hr (on-demand) | $/hr (1-yr RI/SP) |
|---|---|---|---|
| AWS p5.48xlarge | 8xH100 | $98.32 | ~$70 |
| GCP a3-highgpu-8g | 8xH100 | $88.50 | ~$60 |
| Azure ND H100 v5 | 8xH100 | $91.00 | ~$65 |
Per-token economics formula:
```text
$/1M tok (self-hosted) = (hourly cost / tokens per hour) * 1M
tokens per hour        = throughput tok/s * 3600
```
You measure throughput with vLLM, SGLang, or TensorRT-LLM to find break-even. Rough rule: under 10M tok/day → API; 10M to 1B tok/day → hybrid; over 1B tok/day → consider self-hosting.
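Plugging numbers into the formula above: the sketch computes self-hosted $/1M tokens from an hourly instance price and a measured aggregate throughput, plus the daily volume at which an always-on node costs the same as an API. The 10,000 tok/s throughput and the 75% input-token share are assumptions for illustration; measure your own with vLLM, SGLang, or TensorRT-LLM.

```python
def self_hosted_cost_per_million(hourly_cost: float, throughput_tok_s: float) -> float:
    """$/1M tokens for a node that is fully busy at the measured throughput."""
    tokens_per_hour = throughput_tok_s * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


def blended_api_price(input_price: float, output_price: float, input_share: float) -> float:
    """Average API $/1M tokens for a traffic mix with the given input-token share."""
    return input_share * input_price + (1 - input_share) * output_price


def break_even_tokens_per_day(hourly_cost: float, api_price_per_million: float) -> float:
    """Daily volume at which an always-on node costs the same as the API.

    Assumes the node is billed 24 h/day and has enough throughput for that volume.
    """
    return hourly_cost * 24 / api_price_per_million * 1_000_000


# Assumptions: p5.48xlarge at ~$70/hr (1-yr, table above), 10,000 tok/s aggregate
# throughput, and Claude Sonnet 4.6 list prices with 75% of tokens being input.
node_hourly = 70.0
api_price = blended_api_price(3.00, 15.00, input_share=0.75)   # 6.0
print(f"self-hosted at full load: ${self_hosted_cost_per_million(node_hourly, 10_000):.2f}/1M tok")
print(f"break-even volume: {break_even_tokens_per_day(node_hourly, api_price) / 1e6:.0f}M tok/day")
```

Under these assumptions a fully loaded node costs about $1.94 per 1M tokens, but at 280M tokens/day (roughly a third of its 864M/day capacity) it only matches the API price, which lands inside the hybrid band of the rule of thumb.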
Cost visibility tools:
- Helicone, Langfuse, LangSmith track cost per API call.
- OpenAI Usage Dashboard segments token use per project and API key.
- Kubecost GPU Allocation breaks down self-hosted GPU usage per namespace.
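Per-call tracking tools essentially do this bookkeeping for every request. A minimal Python sketch of the same idea: price each call from its token counts using the representative rates in the table above, then roll cost up by project. The model-id strings, project names, and token counts are hypothetical, and real bills include details this ignores (cached-input discounts, batch pricing, and so on).

```python
from collections import defaultdict

# USD per 1M tokens (input, output), from the representative price table above.
# The model-id strings are illustrative labels, not necessarily official API names.
PRICES = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_by_project(calls: list[dict]) -> dict[str, float]:
    """Sum per-call cost by the `project` field of each logged call."""
    totals: dict[str, float] = defaultdict(float)
    for c in calls:
        totals[c["project"]] += call_cost(c["model"], c["input_tokens"], c["output_tokens"])
    return dict(totals)


calls = [  # hypothetical request log
    {"project": "support-bot", "model": "claude-sonnet-4.6", "input_tokens": 2_000, "output_tokens": 500},
    {"project": "support-bot", "model": "gpt-4o-mini", "input_tokens": 10_000, "output_tokens": 1_000},
    {"project": "search", "model": "gpt-4o-mini", "input_tokens": 1_000_000, "output_tokens": 0},
]
print(cost_by_project(calls))
```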
14. S3 Storage Classes — Save 60% with a Lifecycle Policy
S3 cost has two halves: storage, plus requests and transfer. Without a storage class policy, everything piles up in Standard.
S3 storage classes in 2026 (USD/GB-month, representative):
| Class | Price | Access pattern |
|---|---|---|
| Standard | $0.023 | Frequent |
| Standard-IA | $0.0125 | Less than monthly |
| One Zone-IA | $0.01 | Non-critical, recoverable |
| Intelligent-Tiering | $0.023 + monitoring fee | Unknown pattern |
| Glacier Instant Retrieval | $0.004 | Quarterly |
| Glacier Flexible Retrieval | $0.0036 | Annual, minutes-hours to restore |
| Glacier Deep Archive | $0.00099 | 7-10 year retention |
Lifecycle policy example:
```json
{
  "Rules": [
    {
      "ID": "logs-lifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    },
    {
      "ID": "incomplete-mpu-cleanup",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```
Two common leaks: (1) incomplete multipart uploads — failed upload residue is billed; the rule above deletes them after 7 days. (2) versioning on without an expiration policy — old versions accumulate forever.
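A back-of-the-envelope check for the logs rule above: in steady state, with constant daily ingest and 2,555-day retention, each storage class holds as many days of data as the rule keeps there. The sketch uses the representative prices from the table and ignores transition request fees and retrieval charges, so treat the result as an optimistic estimate; the 100 GB/day ingest is hypothetical.

```python
# USD per GB-month, representative prices from the table above.
PRICE = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER_IR": 0.004, "DEEP_ARCHIVE": 0.00099}

# (storage class, first day, last day exclusive) for the logs-lifecycle rule.
TIERS = [("STANDARD", 0, 30), ("STANDARD_IA", 30, 90),
         ("GLACIER_IR", 90, 365), ("DEEP_ARCHIVE", 365, 2555)]


def steady_state_monthly_cost(gb_per_day: float, tiers) -> float:
    # In steady state a tier spanning N days of object age holds N days of ingest.
    return sum(gb_per_day * (end - start) * PRICE[cls] for cls, start, end in tiers)


ingest = 100.0  # GB/day, hypothetical
with_rule = steady_state_monthly_cost(ingest, TIERS)
all_standard = steady_state_monthly_cost(ingest, [("STANDARD", 0, 2555)])
print(f"lifecycle ${with_rule:,.0f}/mo vs all-Standard ${all_standard:,.0f}/mo "
      f"({1 - with_rule / all_standard:.0%} saved)")
```

With long retention most bytes end up in Deep Archive, so the saving here is large; buckets with short retention or frequent reads of old data save much less.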
15. Data Egress — The Number One Hidden Cost
Egress is invisible until the bill arrives, and a bad design can make it more expensive than compute.
Representative rates (2026):
| Path | $/GB |
|---|---|
| AWS → internet | $0.09 (first 10TB) |
| AWS inter-region | $0.02 |
| AWS inter-AZ | $0.01 in each direction (effectively $0.02 per GB crossing) |
| GCP → internet | $0.12 |
| GCP inter-region | $0.01-0.08 |
| Azure → internet | $0.087 |
| CloudFront → internet | $0.085 (NA/EU) |
Design principles:
- Co-locate in the same AZ. If your RDS, EKS, and cache live in different AZs, inter-AZ traffic accumulates.
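- VPC Endpoints (PrivateLink). Gateway Endpoints for S3 and DynamoDB are free. SNS and SQS use Interface Endpoints at about $0.01 per hour per AZ, plus a per-GB processing fee. Either way, the traffic skips NAT Gateway data processing charges.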
- Use a CDN aggressively. CloudFront, Cloudflare, and Fastly all discount origin egress.
- Be aware of cross-region replication cost. S3 CRR replicates with both transfer and storage charges.
- Monitor with VPC Flow Logs. Cost Explorer shows data transfer spend by usage type but not which workloads generated it; Flow Logs reveal source and destination IPs.
Kubecost's Network Costs Daemonset measures this from inside the cluster.
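A rough way to size the inter-AZ line item for a chatty service pair: every byte that crosses an AZ boundary is billed on both sides, so the effective rate is twice the per-direction price. The sketch uses the $0.01/GB rate from the table; the request rate, payload sizes, and cross-AZ fraction are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600
INTER_AZ_USD_PER_GB_EACH_WAY = 0.01   # AWS rate from the table above


def monthly_inter_az_cost(requests_per_s: float, request_kb: float, response_kb: float,
                          cross_az_fraction: float) -> float:
    """Monthly USD for request/response traffic that crosses AZ boundaries.

    Each cross-AZ byte is billed on both sides (egress from one AZ, ingress
    to the other), hence the factor of 2. Uses decimal units (1 GB = 10^6 KB).
    """
    kb_per_month = requests_per_s * (request_kb + response_kb) * SECONDS_PER_MONTH
    gb_cross_az = kb_per_month / 1_000_000 * cross_az_fraction
    return gb_cross_az * INTER_AZ_USD_PER_GB_EACH_WAY * 2


# Hypothetical: 2,000 req/s, 2 KB requests, 20 KB responses, pods spread over
# three AZs with AZ-unaware routing, so about 2/3 of calls cross an AZ.
print(f"${monthly_inter_az_cost(2000, 2, 20, 2 / 3):,.0f}/month")
```

Because roughly two thirds of calls cross an AZ with three AZs and random routing, topology-aware routing in Kubernetes is a direct cost lever.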
16. RDS Cost — The Math of RIs and the Read Replica Trap
RDS cost is instance + storage + IOPS + backup + transfer. Commitments cut 50-70%.
RI math example, db.r6g.4xlarge in us-east-1:
- On-demand: 35,750
- 1-yr No Upfront RI: 7,446
- 3-yr All Upfront RI: 11,300 (upfront), break-even ~17 months
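The break-even point of an upfront commitment is the upfront payment divided by the monthly amount it saves versus on-demand. A generic Python sketch; the example figures are hypothetical round numbers, not the db.r6g.4xlarge prices above.

```python
def break_even_months(upfront: float, monthly_on_demand: float,
                      monthly_ri_recurring: float = 0.0) -> float:
    """Months until the RI's upfront payment is recovered by lower monthly charges."""
    monthly_saving = monthly_on_demand - monthly_ri_recurring
    if monthly_saving <= 0:
        raise ValueError("the reservation never pays back")
    return upfront / monthly_saving


def term_savings(upfront: float, monthly_on_demand: float,
                 monthly_ri_recurring: float, term_months: int) -> float:
    """Total saving over the full term if the instance runs the whole time."""
    return term_months * monthly_on_demand - (upfront + term_months * monthly_ri_recurring)


# Hypothetical 3-yr All Upfront: $36,000 upfront vs $2,000/month on-demand.
print(break_even_months(36_000, 2_000))          # 18.0
print(term_savings(36_000, 2_000, 0.0, 36))      # 36000.0
```

Break-even only helps if the workload outlives it: a database retired in month 12 of a plan that breaks even in month 18 loses money, which is why 3-yr All Upfront is reserved for stable workloads.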
Recommendations:
- Always commit production DBs. Leave on-demand RDS to stage and dev.
- 3-yr All Upfront is the deepest discount, but only commit when workloads are stable.
- Read replicas also accept RIs. Commit both writer and reader.
- Aurora I/O-Optimized vs Standard: I/O-heavy workloads (AWS suggests when I/O exceeds about 25% of Aurora spend) should move to I/O-Optimized, which eliminates per-request I/O charges.
- Reserved Capacity (Aurora Serverless v2 ACU) — 2026 brought 1-year commitments to ACUs.
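- Use Blue/Green Deployments for DB upgrades. Engine versions past end of standard support are billed extra under RDS Extended Support, so upgrading on time is itself a cost lever.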
17. Cost Governance — OPA and Kyverno Guardrails
To move from "see cost" to "prevent cost", you need policy. The 2026 standards are OPA (Open Policy Agent) and Kyverno.
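Kubernetes Gatekeeper example: reject pods in prod and stage that lack CPU/memory requests or a memory limit (CPU limits are deliberately not required, per section 12). It relies on the K8sRequiredResources ConstraintTemplate from the Gatekeeper policy library being installed.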
```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: must-have-requests-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["prod", "stage"]
  parameters:
    requests: ["cpu", "memory"]
    limits: ["memory"]
```
Terraform-level policy — block expensive instances (OPA Conftest):
```rego
package terraform.cost

import rego.v1

blocked_types := {"p5.48xlarge", "p4d.24xlarge"}

# Deny GPU-heavy instance types unless a non-empty CostApprover tag is present.
deny contains msg if {
	some name
	resource := input.resource.aws_instance[name]
	resource.instance_type in blocked_types
	not has_approval_tag(resource)
	msg := sprintf("Instance %s requires CostApprover tag", [name])
}

has_approval_tag(r) if {
	r.tags.CostApprover != ""
}
```
Org-level guardrails:
- Enforce tagging: CostCenter, Project, Environment, Owner.
- Budget alerts: AWS Budgets, GCP Budgets, Azure Cost Alerts.
- Anomaly detection: AWS Cost Anomaly Detection, Vantage Anomalies.
- Auto-shutdown for dev during nights and weekends.
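Managed detectors use more elaborate models, but a trailing-window z-score on daily spend already catches many leaks and is easy to run on FOCUS data. A minimal sketch; the 14-day window and 3-sigma threshold are assumptions to tune per account, and strong weekly seasonality would call for something smarter.

```python
import statistics


def spend_anomalies(daily_spend: list[float], window: int = 14,
                    threshold: float = 3.0) -> list[int]:
    """Indices of days whose spend exceeds the trailing-window mean by more than
    `threshold` population standard deviations."""
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any increase at all is unusual.
            if daily_spend[i] > mean:
                flagged.append(i)
        elif (daily_spend[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged


# Hypothetical: spend alternates $1,000/$1,020, then a $1,500 spike on day 20.
spend = [1000.0 if d % 2 == 0 else 1020.0 for d in range(20)] + [1500.0, 1010.0]
print(spend_anomalies(spend))  # [20]
```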
18. Unit Economics — $/request, $/user, $/feature
The ultimate FinOps metric is unit economics. Absolute spend means little; "how much to serve one user" means everything.
Common units:
- $/request: API spend divided by call count.
- $/user: monthly infrastructure cost per MAU.
- $/transaction: per critical business transaction such as payment or order.
- $/feature: cost to operate a feature (requires multi-service attribution).
- $/1M tokens: for LLM workloads.
Dashboard design:
```text
[trend]  monthly $/MAU                 ── is unit economics improving
[decomp] $/request per service         ── where is inefficiency
[alert]  unit economics +20% spike     ── catch regressions
```
Without this, you cannot tell whether cost is simply tracking traffic. With it, you can see "traffic doubles, cost grows 1.3x", which is true economies of scale.
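A minimal sketch of the dashboard logic above: compute monthly cost per unit (here $/MAU) and flag any month where it rises more than 20% over the previous month. The monthly figures are hypothetical.

```python
def unit_cost(cost: float, units: float) -> float:
    if units <= 0:
        raise ValueError("units must be positive")
    return cost / units


def unit_cost_regressions(months: list[tuple[str, float, float]],
                          max_increase: float = 0.20) -> list[str]:
    """Labels of months whose cost per unit rose more than max_increase vs the prior month.

    months holds (label, infra_cost_usd, units) tuples in chronological order.
    """
    flagged = []
    for (_, prev_cost, prev_units), (label, cost, units) in zip(months, months[1:]):
        if unit_cost(cost, units) > unit_cost(prev_cost, prev_units) * (1 + max_increase):
            flagged.append(label)
    return flagged


# Hypothetical: April traffic +40% with cost +25% (healthy); May cost jumps with flat traffic.
history = [
    ("2026-03", 120_000, 1_000_000),
    ("2026-04", 150_000, 1_400_000),
    ("2026-05", 200_000, 1_450_000),
]
print(unit_cost_regressions(history))  # ['2026-05']
```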
19. Korean Case Studies — Woowa Brothers, LINE, Kakao
FinOps practice at Korean tech leaders:
- Woowa Brothers (Baemin): a dedicated FinOps team since 2023. They run an internal Kubecost fork plus a unit economics dashboard (infrastructure cost per order). At a 2024 conference they reported cutting cost per order by 30%.
- LINE Plus: enormous global messaging traffic, so they run their own PoPs alongside cloud. Since 2025 they built an in-house token tracker for LLM inference cost, borrowing patterns from Helicone and Langfuse.
- Kakao: multi-cloud (AWS + GCP + Naver Cloud). Adopted FOCUS 1.0 to unify billing. They query everything from one BigQuery dataset.
- Coupang: AWS heavy. After adopting a Reserved Instance and Savings Plan management SaaS, RI utilization sits at 95%+.
- Toss: early Karpenter adopters (2022), with 80%+ spot ratio. Payments stay on-demand, everything else on spot.
Common threads: (1) FinOps is treated as an engineering KPI, not a finance report. (2) Unit economics dashboards. (3) Visibility before automation.
20. Japanese Case Studies — NTT Docomo, Rakuten, LINE
FinOps in Japan is rising quickly.
- NTT Docomo: enormous AWS use for telco backends. At a 2024 FinOps Foundation Japan Chapter event they shared, "enforcing tagging and chargeback reduced unused RIs by 40%."
- Rakuten: migrating from long-standing data centers to GCP and AWS multi-cloud. They combine Cloudability (Apptio) with internal tools.
- LINE Tokyo: separate from Korea's LINE Plus, the Tokyo LINE operation is now part of LY Corporation (formed in 2023 when LINE and Yahoo Japan merged) within the SoftBank Group. Heavy users of Spot.io Ocean.
- CyberAgent: an ad platform with massive BigQuery use. They use Vantage to visualize BigQuery slot cost and schedule workloads.
- Mercari: GKE heavy. Strong use of GKE Autopilot, with Kubecost driving namespace-level chargeback.
A cultural difference: Japanese firms lean toward chargeback and budget governance, while Korean firms are quicker to automation and right-sizing. Both have FOCUS 1.0 as the 2026 baseline.
21. Open Source vs SaaS FinOps Tools — Selection Criteria
| Item | Open source (OpenCost + Grafana, Komiser) | SaaS (Kubecost SaaS, Vantage, Cloudability) |
|---|---|---|
| Initial cost | 0 | $1k-50k/mo |
| Operating cost | Engineering time | License |
| Setup time | 1-2 weeks | 1-2 days |
| Multi-cloud | Build it yourself | Built in |
| Chargeback workflow | Build it yourself | Built in |
| Alerts and anomaly | Build it yourself | Built-in ML |
| Data security | Your infra | External SaaS (optional) |
Recommendations:
- Annual cloud spend under $500k: start with open source (OpenCost + Grafana + cloud-native tools such as Cost Explorer, Recommender, and Cost Management).
- 500k-3M: a SaaS like Vantage; weigh engineering time against license cost.
- Over $3M: enterprise SaaS (Cloudability, Anodot, Apptio) plus automation (CAST AI, Spot.io).
A trap: tools do not cut cost. They only make it visible. People make the cuts. The FinOps Foundation always emphasizes "tools < culture."
22. Anti-patterns — Seven Reasons FinOps Fails
Common failure modes in the field:
- Optional tagging. If CostCenter is optional, 50% misses it. You must enforce.
- Only finance looks. If engineers do not see the dashboard, no decisions follow.
- Buying RIs first. RIs purchased without knowing workload patterns rarely pay back.
- Right-sizing only. Savings on the order of 60% come from architecture (spot, caching, S3 storage classes), not from right-sizing alone.
- Ignoring anomaly alerts. Too many alerts → people mute them → real leaks slip by.
- No unit economics. Looking only at absolute cost makes any traffic increase look bad.
- Treating GPU/LLM as separate. AI is 30-50% of infra cost in 2026. Without dedicated management, it explodes.
23. A 90-Day FinOps Adoption Roadmap
A staged plan for teams new to FinOps.
Days 1-30 (Inform)
- Pipe CUR/Billing Export into BigQuery, Snowflake, or S3.
- Normalize to FOCUS 1.0 columns.
- Define key dimensions (Account, Service, Environment, Team).
- Publish a tagging policy and measure untagged share.
- Install Kubecost/OpenCost for Kubernetes visibility.
- Start a weekly cost review meeting.
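Untagged share is most useful when measured by spend rather than by resource count, since one untagged GPU node matters more than a hundred untagged snapshots. A minimal Python sketch over FOCUS-shaped rows with EffectiveCost and a Tags mapping; the required keys mirror the CostCenter, Project, Environment, and Owner tags listed under org-level guardrails, and the rows are hypothetical.

```python
REQUIRED_TAGS = ("CostCenter", "Project", "Environment", "Owner")


def untagged_share(rows: list[dict], required: tuple[str, ...] = REQUIRED_TAGS) -> float:
    """Share of EffectiveCost on rows missing any required tag (or with an empty value)."""
    total = sum(r["EffectiveCost"] for r in rows)
    if total == 0:
        return 0.0
    missing = sum(
        r["EffectiveCost"]
        for r in rows
        if any(not (r.get("Tags") or {}).get(k) for k in required)
    )
    return missing / total


rows = [  # hypothetical FOCUS-shaped rows
    {"EffectiveCost": 70.0, "Tags": {"CostCenter": "cc-1", "Project": "web",
                                     "Environment": "prod", "Owner": "team-a"}},
    {"EffectiveCost": 20.0, "Tags": {"CostCenter": "cc-2", "Project": "etl",
                                     "Environment": "prod"}},
    {"EffectiveCost": 10.0, "Tags": None},
]
print(f"untagged share of spend: {untagged_share(rows):.0%}")  # 30%
```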
Days 31-60 (Optimize)
- Identify the top 10 cost workloads.
- Review right-sizing recommendations (VPA, CAST AI, PerfectScale).
- Delete unused resources (orphan EBS, idle ELBs, old snapshots).
- Apply S3 lifecycle policies.
- Make your first RI/SP purchase, conservatively, 1-year.
- Auto-shutdown dev and CI at night.
Days 61-90 (Operate)
- Cost alerts (anomaly detection, unit economics regression).
- Policy enforcement via OPA or Kyverno.
- Send chargeback/showback per BU.
- Define KPIs ($/request, RI utilization target).
- Appoint a FinOps champion per team.
- Roll cost into quarterly OKRs.
24. 2027 Outlook — AI Workloads and Sustainability
Trends to expect over the next 1-2 years.
- AI cost becomes the #1 category. By 2027 many companies will spend more on AI than on general compute.
- Per-token SLOs. "Cost per user in tokens" SLOs become as common as response time SLOs.
- FOCUS 1.x → 2.0. Version 1.1 standardizes commitment models; 2.0 folds in carbon metrics.
- Sustainability integration. Cost = $ + CO2. AWS Customer Carbon Footprint, GCP Carbon Footprint, and Azure Emissions Impact get FOCUS-merged.
- Autonomous FinOps. Automation tools like CAST AI and Spot.io deepen — rebalancing without human approval.
- AI-agent driven FinOps analysis. "Why did spend go up last week?" answered by an LLM running data analysis.
- Edge and CDN cost rises. As edge compute and edge AI grow, edge becomes its own cost category.
To close: FinOps in 2026 is no longer "finance writes a cost report." It is a guardrail for the engineering org, and unit economics is a company OKR. The tools (OpenCost, Vantage, CAST AI) are good enough. What remains scarce is culture and execution.
References
- FinOps Foundation Framework — https://www.finops.org/framework/
- FOCUS Specification — https://focus.finops.org/
- Kubecost — https://www.kubecost.com/
- OpenCost CNCF Project — https://www.opencost.io/
- Vantage — https://www.vantage.sh/
- Cloudability (Apptio/IBM) — https://www.apptio.com/products/cloudability/
- Spot.io by NetApp — https://spot.io/
- CAST AI — https://cast.ai/
- Karpenter — https://karpenter.sh/
- Kubernetes Cluster Autoscaler — https://github.com/kubernetes/autoscaler
- AWS Cost Explorer — https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- AWS Savings Plans — https://aws.amazon.com/savingsplans/
- GCP Recommender — https://cloud.google.com/recommender
- Azure Cost Management — https://azure.microsoft.com/en-us/products/cost-management/
- PerfectScale — https://www.perfectscale.io/
- Densify — https://www.densify.com/
- AWS Node Termination Handler — https://github.com/aws/aws-node-termination-handler
- Open Policy Agent (Gatekeeper) — https://open-policy-agent.github.io/gatekeeper/
- Kyverno — https://kyverno.io/
- vLLM — https://docs.vllm.ai/
- Helicone — https://www.helicone.ai/
- Langfuse — https://langfuse.com/