Monitoring your cloud resources is the first step. The next step is to figure out what you can do with that information. Ideally, you want to run your cloud resources as efficiently as possible, keeping costs under control while providing a seamless experience for your end users.
Your organization’s needs will determine your priorities, but in general, here are nine best practices you should adopt when it comes to cloud monitoring.
What activity needs to be monitored? Not everything that can be measured needs to be reported. You’re going to want to carefully determine the metrics that matter to your organization’s goals as well as the bottom line. Take some time to review exactly what your monitoring solution can track and consider what’s going to be useful to you.
Your cloud-based resources are part of your overall networking infrastructure, and they should be managed that way. Your cloud monitoring solution should allow you to see everything (cloud and physical resources) in context so you can quickly drill down to issues and isolate the cause of problems that span technology silos.
It’s hard to overstate how critical this is. Organizations have their own physical networking infrastructures in addition to cloud services to monitor. They need solutions that can report data from different sources on a single platform, which allows for calculating uniform metrics and results in a comprehensive view of performance.
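To make "uniform metrics" concrete, here is a minimal Python sketch that converts records from two sources into one common shape before averaging them. The record formats, field names (`latency_s`, `rtt_ms`) and source labels are assumptions made up for illustration, not the output of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    source: str        # hypothetical label: "cloud" or "on_prem"
    resource: str
    latency_ms: float  # every source is normalized to milliseconds


def from_cloud(record: dict) -> Sample:
    # Assumed cloud export format: latency reported in seconds.
    return Sample("cloud", record["name"], record["latency_s"] * 1000.0)


def from_on_prem(record: dict) -> Sample:
    # Assumed on-premises poller format: round-trip time already in ms.
    return Sample("on_prem", record["host"], float(record["rtt_ms"]))


def mean_latency_ms(samples: list) -> float:
    """Average latency across all sources, in one unit."""
    if not samples:
        raise ValueError("no samples to average")
    return sum(s.latency_ms for s in samples) / len(samples)


samples = [
    from_cloud({"name": "vm-1", "latency_s": 0.25}),
    from_on_prem({"host": "router-1", "rtt_ms": 50}),
]
print(mean_latency_ms(samples))
```

Once every source lands in the same structure and unit, the same calculation can run over cloud and physical resources alike.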
Every cloud provider will include monitoring tools, but those tools may not integrate with your existing monitoring solution. Juggling too many management tools tends to slow IT response to networking issues and hurt IT productivity. Having one tool that reports on the ENTIRE networking environment makes troubleshooting faster and easier, and it eliminates finger-pointing.
This is where most traditional IT teams can get caught flat-footed. The ability to scale is a key feature of cloud services, but increased use can trigger increased costs. Robust monitoring solutions should track how much of your organization’s networking activity is on the cloud and how much it costs.
Idle resources aren’t a big deal when it comes to on-premises networking equipment like servers and routers, but most cloud resources cost money even if they’re not being used, and MORE money if they are. A monitoring solution that alerts IT when cloud resources exceed budget or usage limits can save your organization a fortune.
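A budget alert of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the resource names, the dollar figures and the 80% warning ratio are all hypothetical, and a real solution would pull spend from your provider's billing data.

```python
def budget_alerts(spend_by_resource, budgets, warn_ratio=0.8):
    """Return (resource, level) pairs for resources near or over budget.

    warn_ratio is an assumed default: warn at 80% of the budget.
    """
    alerts = []
    for resource, spend in spend_by_resource.items():
        budget = budgets.get(resource)
        if budget is None or budget <= 0:
            continue  # no usable budget defined, nothing to compare against
        ratio = spend / budget
        if ratio >= 1.0:
            alerts.append((resource, "over_budget"))
        elif ratio >= warn_ratio:
            alerts.append((resource, "warning"))
    return alerts


# Hypothetical month-to-date spend and monthly budgets, in dollars.
spend = {"db": 90.0, "web": 120.0, "cache": 10.0}
budgets = {"db": 100.0, "web": 100.0, "cache": 100.0}
print(budget_alerts(spend, budgets))
```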
Most monitoring tools provided by cloud service providers only maintain data for a limited time (usually 30-60 days). That’s not nearly enough for long-term trend analysis. Your monitoring tool should retain that data long enough to show trends over several months at least. Network activity in January is likely to be very different from network activity in July, but that’s impossible to analyze within a 30-60 day window. Understanding long-term network trends can make it easier to run your network more efficiently, saving both time and money.
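One common way to keep history affordable is to roll detailed data up into monthly summaries before the raw data expires. The Python sketch below shows that idea only; the sample values are invented, and how you export data from your provider is left out because it varies by tool.

```python
from collections import defaultdict
from datetime import date


def monthly_averages(daily):
    """Roll (date, value) pairs up into {(year, month): mean value}."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    # Sort by (year, month) so the result reads in calendar order.
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical daily traffic figures (for example, GB transferred per day).
daily_traffic = [
    (date(2023, 1, 1), 100.0),
    (date(2023, 1, 2), 200.0),
    (date(2023, 7, 1), 40.0),
]
print(monthly_averages(daily_traffic))
```

With summaries like these stored outside the provider's retention window, January and July can be compared side by side.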
Alerting IT staff is a good start, but IT teams need to be able to proactively handle issues in the cloud. If activity exceeds or falls below defined thresholds, the right solution should be able to automatically add or subtract servers to maintain efficiency and performance. The same thing goes for performance issues. Not only does this make IT teams much more productive, it makes them look good by resolving issues before they impact end-users.
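The threshold logic described above can be sketched as a small Python function. The thresholds, the one-server step and the server limits are assumed values for illustration; actually adding or removing servers would go through your cloud provider's own scaling mechanism, which is not shown here.

```python
def desired_servers(current, cpu_percent, low=30.0, high=75.0,
                    min_servers=1, max_servers=10):
    """Return the server count to aim for, given average CPU load.

    All thresholds and limits are hypothetical defaults.
    """
    if cpu_percent > high:
        target = current + 1   # activity above threshold: add a server
    elif cpu_percent < low:
        target = current - 1   # activity below threshold: remove a server
    else:
        target = current       # within the normal band: leave as is
    # Never scale below the floor or above the ceiling.
    return max(min_servers, min(max_servers, target))


print(desired_servers(3, 90.0))
print(desired_servers(3, 10.0))
```

The floor and ceiling matter as much as the thresholds: the floor keeps the service available, and the ceiling keeps a traffic spike from turning into a runaway bill.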
Organizations need to know what users experience when using their cloud-based applications. Monitor metrics like response times and frequency of use to get a complete performance picture.
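Averages can hide what the slowest users see, so response times are often summarized with percentiles. Here is a minimal Python sketch using the nearest-rank method; the sample timings are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for one application.
response_ms = [120, 95, 110, 300, 105, 98, 102, 2500, 101, 99]
print(percentile(response_ms, 50))
print(percentile(response_ms, 95))
```

In this made-up sample the median looks healthy while the 95th percentile exposes a single very slow request, which is exactly the kind of user experience an average would blur.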
Regardless of whether or not you have a NOC, network status and performance should be something that can be seen at a glance by anyone. Your monitoring solution should support customizable dashboards that provide instant visibility into what’s up, what’s down, what’s seeing heavy usage, what’s idle, etc. Not only does this make it easier to troubleshoot, it allows IT teams to see issues develop and resolve them proactively before they impact end-users.
Test your tools to see what happens when there is an outage or a data breach, and evaluate how the alerting and automated response systems behave when their thresholds are met.