Look at the Q3 results of most tech firms and a common pattern emerges.
Revenue grows steadily. Cloud costs don’t. They spike.
The majority of companies today are doing the basics right. They turn off unused servers. They use Spot Instances. They rightsize workloads. Big companies even have FinOps teams.
And yet the bill keeps climbing past forecasts.
I recently examined the infrastructure of a fast-growing fintech scale-up. On paper, everything was disciplined. In practice, their cloud spend ran 30% above their internal cost model.
- The problem wasn’t computing.
- It wasn’t storage.
- It was something much harder to see.
The real issue was architecture-level cost hidden in pricing footnotes, opaque billing SKUs, and best-practice designs that quietly charge rent.
This article breaks down those hidden costs, explains why they occur, and shows how engineering teams can eliminate them before they harden into fixed line items.
Why Is Cross-AZ Traffic So Expensive?
Every reliability guide tells you to deploy across multiple Availability Zones.
That advice is correct: high availability saves businesses when infrastructure fails. What most teams miss is that high availability is not free.
In AWS, inter-Availability Zone data transfer within the same region is billed at approximately $0.01/GB in each direction.
That sounds small until you consider real workloads. Take a replicated database or a Kafka cluster:
- An application server in Zone A talks to a database replica in Zone B.
- You pay when data leaves Zone A.
- You pay again when it enters Zone B.
Multiply that by constant replication, internal APIs, and service-to-service chatter, and Regional Data Transfer quickly becomes one of the largest and least understood line items on the bill.
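As a rough sketch, the replication example above works out like this. The $0.01/GB-each-way rate is the typical AWS figure for intra-region cross-AZ transfer; the 500 GB/day volume is an illustrative assumption, and actual rates vary by region:

```python
# Back-of-envelope cross-AZ transfer cost model. Assumes the typical
# $0.01/GB charge in each direction; verify your region's pricing.
COST_PER_GB_EACH_WAY = 0.01

def cross_az_monthly_cost(gb_per_day: float) -> float:
    """Monthly cost of traffic that crosses an AZ boundary once.

    Billed twice: once leaving the source AZ, once entering the
    destination AZ.
    """
    return gb_per_day * 30 * COST_PER_GB_EACH_WAY * 2

# A database replicating 500 GB/day across zones:
print(f"${cross_az_monthly_cost(500):,.2f}/month")  # $300.00/month
```

Three hundred dollars a month for a single replication stream, before counting internal API chatter, is why this line item sneaks up on teams.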
How to Reduce Cross-AZ Costs
- Keep traffic zone-local wherever possible by pairing app servers with replicas in the same zone.
- Configure load balancers and service meshes to avoid unnecessary cross-zone hops.
- Compress and batch data when cross-AZ movement is unavoidable.
It is a classic example of a cost that stays invisible until you look at traffic paths rather than instance counts.
Is Your NAT Gateway Overcharging You?
Private subnets are a security best practice: backend servers without public IPs are harder to attack, easier to manage, and generally safer.
Those servers still need outbound access, though. Software updates, container pulls, and third-party API calls don't disappear just because your network is private.
That traffic flows through a NAT Gateway, and NAT Gateways bill in two ways:
- A fixed hourly fee
- A per-GB data processing fee
The real problem appears when teams route internal service traffic through the NAT without realizing it.
A common example is S3 access.
Private EC2 instances frequently download data from S3 via the NAT, which means:
- Traffic exits your VPC
- Passes through the NAT
- Comes back into AWS
- You pay a NAT processing fee on every byte
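The two billing dimensions add up fast. As a sketch, using the roughly $0.045/hour and $0.045/GB-processed rates typical of us-east-1 (illustrative figures; check your region and the current price sheet):

```python
# Illustrative NAT Gateway cost model. The hourly and per-GB rates
# below are typical us-east-1 figures, used here as assumptions.
HOURLY_RATE = 0.045       # fixed charge per NAT Gateway hour
PER_GB_PROCESSED = 0.045  # data processing charge per GB

def nat_monthly_cost(gb_processed: float, hours: int = 730) -> float:
    """Monthly NAT Gateway cost: fixed hourly fee plus per-GB processing."""
    return hours * HOURLY_RATE + gb_processed * PER_GB_PROCESSED

# Pulling ~2 TB/month from S3 through the NAT instead of a
# (free) S3 Gateway Endpoint:
print(f"${nat_monthly_cost(2048):,.2f}/month")  # $125.01/month
```

Note that the data-processing component scales linearly with traffic, so the S3-through-NAT pattern above gets worse exactly as your workload grows.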
How to Eliminate NAT Gateway Waste
- Use VPC Endpoints for AWS services instead of routing through the NAT.
- S3 Gateway Endpoints are free and bypass the NAT entirely; DynamoDB has a free Gateway Endpoint too.
- For services like Kinesis, Interface Endpoints (PrivateLink) carry a small fee but are typically far cheaper than NAT processing.
This single architectural change can save thousands of dollars a month in data-intensive environments.
This is also where teams struggle most: these charges show up under generic labels such as Data Processing or Regional Transfer. Tools like Costimizer surface these patterns automatically by matching network paths to billing SKUs, instead of leaving teams to reverse-engineer invoices.
“Network and data transfer costs often hide behind fragmented billing categories, making architectural inefficiencies hard to detect.”
Are Costs Being Secretly Driven by Logs?
Observability is essential. You cannot run what you do not see.
Logging is one of the fastest ways to inflate a cloud bill without breaking anything.
Cloud logging fees are based on ingestion and storage, not usefulness.
A system can be perfectly healthy while its logging bill quietly explodes.
A common scenario:
- A debug flag is left on by mistake.
- Every API request captures its full payload.
- Performance stays stable.
- Costs skyrocket.
CloudWatch Logs ingestion alone is priced around $0.50 per GB, before you add third-party tools such as Datadog or Splunk.
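One practical mitigation is sampling at the application layer: ship every error, but only a fraction of happy-path logs. A minimal sketch (the 5% sample rate and the log tuples are illustrative assumptions, not a specific library's API):

```python
import random

def should_ship(level: str, sample_rate: float = 0.05) -> bool:
    """Ship every error/warning log; sample the rest at sample_rate.

    A 0.05 rate ships roughly 1 in 20 non-error lines, cutting
    ingestion volume ~95% for happy-path traffic while preserving
    full error fidelity.
    """
    if level in ("ERROR", "WARN"):
        return True
    return random.random() < sample_rate

random.seed(42)  # deterministic for the demo
logs = [("INFO", "200 OK")] * 1000 + [("ERROR", "upstream timeout")] * 3
shipped = [(lvl, msg) for lvl, msg in logs if should_ship(lvl)]
# All 3 errors survive; only ~5% of the 1000 INFO lines are shipped.
```

At $0.50/GB ingested, dropping 95% of success-path volume translates almost directly into a 95% cut on that portion of the bill.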
How to Manage Observability Spend
- Sample success logs aggressively and keep error logs in full.
- Don't enable VPC Flow Logs everywhere; scope them to where you need them.
- Filter logs locally with agents before shipping them upstream.
Logs should be signal you can act on, not noise you pay to store.
Are S3 Requests Costing More Than Your Storage?
Storage pricing is deceptively simple. Access pricing is not.
On paper, moving data to a cheaper storage class such as S3 Standard-IA looks smart: the per-GB storage rate drops by nearly half.
But request pricing roughly doubles, and Standard-IA adds a per-GB retrieval fee.
This becomes a problem when applications produce millions of small files. In those cases:
- Storage savings are negligible
- Request costs dominate
- You end up paying more for "cheaper" storage
How to Escape the S3 Request Trap
- Combine small files into larger objects.
- Use columnar formats such as Parquet, or batch ingestion, to cut object counts.
- Always calculate request costs plus transition costs before lifecycle moves.
- If the math doesn't work, don't transition blindly.
Are You Paying for Zombie Resources?
This is the most common, and yet one of the costliest, problems.
An EC2 instance's root volume is usually deleted when the instance is terminated.
Additional data volumes are not.
- They detach.
- They sit idle.
- They keep billing indefinitely.
The same goes for snapshots with overly generous retention policies.
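Finding these is mostly a filtering exercise over volume metadata. A minimal sketch of the logic, operating on records in the shape a describe-volumes style API call returns (the field names, the 7-day idle threshold, and the ~$0.08/GB-month gp3 rate are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

GP3_RATE = 0.08  # assumed $/GB-month for gp3; varies by region and type

def zombie_volume_cost(volumes, max_idle_days=7):
    """Total the monthly cost of EBS volumes that are unattached
    ('available') and older than max_idle_days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_idle_days)
    zombies = [v for v in volumes
               if v["State"] == "available" and v["CreateTime"] < cutoff]
    return sum(v["SizeGiB"] * GP3_RATE for v in zombies), zombies

volumes = [
    {"State": "in-use",    "SizeGiB": 100,
     "CreateTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"State": "available", "SizeGiB": 500,
     "CreateTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]
cost, zombies = zombie_volume_cost(volumes)  # 500 GiB ≈ $40/month, forever
```

Run on a schedule, a check like this turns "volumes that sit idle" from an invisible line item into an actionable report.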
How to Kill Orphaned Storage
- Automatically clean up unattached volumes older than a few days.
- Review snapshot retention aggressively.
- Keep short-term snapshots short-lived, and reserve long retention for true long-term backups.
Zombie resources rarely appear on dashboards, but they quietly consume budgets month after month.
Conclusion
Cloud costs are not a billing problem. They are an architecture problem. Treat the cloud like a fixed data center and you will never pay less. Treat it as a utility where every byte, request, and hop has a price, and efficiency becomes achievable. Different cloud cost optimization tools can at least help you contain the damage.
Author Bio
Saim is a DevOps Engineer at Costimizer, an AI-driven cloud cost optimization platform. Costimizer helps engineering and FinOps teams uncover hidden architectural costs, automate governance, and regain control over AWS, Azure & GCP spending. When he’s not analyzing billing data, he writes about the intersection of cloud engineering and financial discipline.