For most cloud environments, you will want to track a few key CPM metrics.
Average Compute/Storage Costs
You can control costs more easily by tracking the total cost of your cloud-based computing resources, including virtual machines and serverless functions. Computing costs that rise without a matching rise in demand often point to an over-provisioned environment, which wastes money until it's fixed.
Make sure to keep an eye on your cloud storage costs as well. These costs include databases, object storage, and block storage. Storage costs that rise without a corresponding rise in storage needs might indicate a problem, such as inefficient tiering or poor data lifecycle management.
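One way to spot cost growth that demand does not explain is to compare the two growth rates directly. The following is a minimal Python sketch, not a definitive implementation. The function names, the sample figures, and the 10% tolerance are all assumptions chosen for illustration.

```python
def growth(previous: float, current: float) -> float:
    """Fractional change from previous to current (0.10 means +10%)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


def cost_outpaces_demand(prev_cost: float, curr_cost: float,
                         prev_demand: float, curr_demand: float,
                         tolerance: float = 0.10) -> bool:
    """True when cost grew more than `tolerance` faster than demand.

    `tolerance` is an illustrative default, not a recommended value.
    """
    cost_growth = growth(prev_cost, curr_cost)
    demand_growth = growth(prev_demand, curr_demand)
    return cost_growth - demand_growth > tolerance


# Hypothetical month-over-month figures: cost in dollars, demand in requests.
print(cost_outpaces_demand(1000, 1300, 1_000_000, 1_050_000))
```

The same check works for storage if you substitute stored gigabytes for requests as the demand measure.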
Error Rate
An error rate metric tells you how often requests fail and which types of errors occur most often. This cloud performance management metric indicates the overall health of your application and the cloud environment.
Your application might be causing the errors, but the cloud environment itself might also be at fault. For example, a cloud service might be unavailable (an issue your cloud provider should resolve for you), or an access credential for one of your cloud services might be misconfigured. Either can provoke issues in your environment.
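As a rough illustration, error rate and error types can be computed from a list of request outcomes. This sketch assumes each outcome is either None (success) or a short string naming the error. The error labels and sample data are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def error_rate(outcomes: Sequence[Optional[str]]) -> float:
    """Fraction of requests that failed; 0.0 when there are no requests."""
    if not outcomes:
        return 0.0
    failures = sum(1 for outcome in outcomes if outcome is not None)
    return failures / len(outcomes)


def errors_by_type(outcomes: Sequence[Optional[str]]) -> Counter:
    """Count how often each error label occurs, ignoring successes."""
    return Counter(outcome for outcome in outcomes if outcome is not None)


# Hypothetical outcomes: None means success, a string names the error.
outcomes = [None, None, "service_unavailable", None, "access_denied",
            None, None, "service_unavailable", None, None]

print(error_rate(outcomes))
print(errors_by_type(outcomes).most_common())
```

Splitting the count by type helps separate application bugs from environment problems such as an unavailable service or a bad credential.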
Server Availability
Another way to ensure efficient cloud performance management is to track the number of servers that are up and running as a percentage of the number you have deployed.
If a server goes down, cloud orchestration and automation tools can redistribute workloads automatically. However, these tools can only do so while healthy servers remain. Serious issues can arise if the number of available servers drops below 90% of the total deployed.
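The availability percentage and the 90% check described above can be sketched in a few lines of Python. The function names and server counts are assumptions for illustration. Only the 90% threshold comes from the text.

```python
def availability_percent(running: int, deployed: int) -> float:
    """Running servers as a percentage of deployed servers."""
    if deployed <= 0:
        raise ValueError("deployed must be positive")
    return 100.0 * running / deployed


def below_threshold(running: int, deployed: int,
                    threshold: float = 90.0) -> bool:
    """True when availability has dropped below the threshold percentage."""
    return availability_percent(running, deployed) < threshold


# Hypothetical fleet of 50 deployed servers.
print(availability_percent(45, 50))  # exactly at the 90% line
print(below_threshold(44, 50))       # 88% is below the line
```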
Requests Per Minute
By tracking how many requests a cloud application receives per minute, you can detect when request volume differs from historical averages. This metric makes it easier to predict when to increase your cloud capacity.
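One simple way to detect a departure from historical averages is to measure how many standard deviations the current value sits from the historical mean. This is a sketch of that one technique, not the only approach. The function name, the sample history, and the three-sigma cutoff are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_history(history: Sequence[float], current: float,
                          max_sigma: float = 3.0) -> bool:
    """True when `current` is more than `max_sigma` standard deviations
    away from the mean of `history` (requests per minute)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change counts as a deviation.
        return current != average
    return abs(current - average) / spread > max_sigma


# Hypothetical requests-per-minute samples from earlier periods.
history = [100, 110, 90, 105, 95]

print(deviates_from_history(history, 102))
print(deviates_from_history(history, 160))
```

Traffic with strong daily or weekly cycles usually needs a history taken from comparable time windows, which this sketch does not attempt.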
Acknowledgment Time
Acknowledgment time measures how long it takes for your cloud-based app to acknowledge that it has received a request. When you track acknowledgment time, you can find out if your load balancers forward requests quickly enough.
You might also discover you have a problem with underprovisioning if acknowledgment times are slow. Make sure to monitor and compare time-to-acknowledge metrics for each cloud region or individual cloud rather than analyzing them in aggregate. Doing so can help you pinpoint latency issues specific to a particular region or cloud.
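To illustrate why per-region figures matter, the sketch below groups acknowledgment times by region and compares each region's median with the aggregate median. The region names and timings are hypothetical, and using the median is an assumption. A percentile may suit your data better.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_by_region(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Median acknowledgment time in milliseconds for each region."""
    grouped = defaultdict(list)
    for region, ack_ms in samples:
        grouped[region].append(ack_ms)
    return {region: median(values) for region, values in grouped.items()}


# Hypothetical (region, acknowledgment time in ms) samples.
samples = [
    ("us-east", 12.0), ("us-east", 14.0), ("us-east", 13.0),
    ("eu-west", 95.0), ("eu-west", 110.0), ("eu-west", 15.0),
]

overall = median(ack_ms for _, ack_ms in samples)
print(overall)                    # the aggregate looks healthy
print(median_by_region(samples))  # one region is far slower
```

In this made-up data, the aggregate median hides the slow region entirely, which is the risk of analyzing regions together.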
Response Duration
Response duration measures how long it takes a cloud application to finish responding to a request. You can tell if your app has enough cloud resources by referring to this metric. A slow response time may also indicate a bug or communication issue within the app. You should also track response duration by region and cloud if you want the best visibility into latency.
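If your monitoring tooling does not already report response duration, you can measure it at the application level. The sketch below wraps a request handler with a timer. The handler, its payload, and the simulated 10 ms of work are hypothetical stand-ins for real application code.

```python
import time
from typing import Any, Callable, Tuple


def timed(handler: Callable[..., Any]) -> Callable[..., Tuple[Any, float]]:
    """Wrap `handler` so it returns (result, elapsed_seconds)."""
    def wrapper(*args: Any, **kwargs: Any) -> Tuple[Any, float]:
        start = time.perf_counter()
        result = handler(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed
def handle_request(payload: str) -> str:
    """Hypothetical handler that simulates a small amount of work."""
    time.sleep(0.01)
    return payload.upper()


result, elapsed = handle_request("ping")
print(result, f"{elapsed * 1000:.1f} ms")
```

Recording the region and cloud alongside each elapsed value lets you break the durations down the same way as acknowledgment times.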