Lately, we’ve talked a lot about monitoring your network. We’ve gone over passive and active monitors, SNMP, and sFlow vs NetFlow, but there’s one important part of monitoring that we’ve sort of glossed over: alerting.
Monitoring your network is all well and good, but it doesn’t help to have someone watching what’s happening on your network if they aren’t telling you when things are going wrong. A good alert system empowers you to proactively respond
to problems before they impact users. Conversely, a poorly configured one can be a nuisance, causing business disruptions, burnout and, worst of all, “alert fatigue” that conditions users to ignore important alerts. Anyone who’s
woken up to a storm of alerts from their monitoring system, only to log on and find their systems humming along without issue, can attest to that.
All in all, it’s safe to say that alerting is one of the most important pieces of the network monitoring puzzle, so why is it so commonly misconfigured? In this blog post, we’ll go over the do’s and don’ts of alerting,
and discuss how properly configured alerting can save time, money, effort, and even your sanity.
The Bad: False Positives, Alert Storms, and Alert Fatigue
Before we look at good alerting, let's examine the issues that a poorly configured system can cause: false positives, alert storms, and, worst of all, alert fatigue.
False positives are what happens when you get an alert telling you that something is wrong, only to log on and find that everything is A-OK. Trust me, there’s nothing worse than a false positive in the middle of the night. False positives can result from improperly configured thresholds, polling periods, or action policies. For example, if your monitoring tool polls active monitors every 60 seconds, but an action policy emails you immediately after something goes down, even a single missed poll can trigger an alert, and you may end up with more alerts than you wanted.
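One common way to cut down on this kind of false positive is to wait for several failed polls in a row before alerting. The snippet below is a minimal sketch of that idea, not any particular product's implementation: the `DownDetector` class and the `failures_required` value of 3 are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DownDetector:
    """Fire an alert only after several consecutive failed polls (illustrative sketch)."""

    failures_required: int = 3   # hypothetical value; tune per device
    consecutive_failures: int = 0
    alerted: bool = False

    def record_poll(self, is_up: bool) -> bool:
        """Record one poll result and return True only when a new alert should fire."""
        if is_up:
            # A successful poll resets the counter and re-arms the alert.
            self.consecutive_failures = 0
            self.alerted = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_required and not self.alerted:
            self.alerted = True  # alert once per outage, not once per poll
            return True
        return False


detector = DownDetector(failures_required=3)
# One isolated failure (a blip), then a real outage of four failed polls.
polls = [True, False, True, False, False, False, False]
alerts = [detector.record_poll(is_up) for is_up in polls]
print(alerts)  # [False, False, False, False, False, True, False]
```

With a 60-second polling period, this sketch would hold the alert for roughly three minutes, trading a little detection speed for far fewer middle-of-the-night blips.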
An alert storm occurs when one device goes down and its entire hierarchy of dependent devices sends out alerts as well, telling you that they’ve lost connection. Of course, you already know this, but now you have dozens of alerts flooding your
inbox. Dependency mapping can help ensure that this doesn’t happen.
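To make the idea concrete, here is a minimal sketch of dependency-aware suppression. It assumes each device has at most one upstream parent, stored in a hypothetical `parent_of` mapping; the device names and the `root_cause_alerts` function are invented for this example, and real monitoring tools model dependencies in their own ways.

```python
def root_cause_alerts(down: set[str], parent_of: dict[str, str]) -> set[str]:
    """Return only the down devices that have no down device upstream of them."""
    roots = set()
    for device in down:
        seen = {device}                      # guards against accidental loops in the map
        ancestor = parent_of.get(device)
        upstream_down = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                upstream_down = True         # an upstream outage already explains this one
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not upstream_down:
            roots.add(device)
    return roots


# Hypothetical topology: gateway -> switches -> servers.
parent_of = {
    "switch-a": "gateway",
    "switch-b": "gateway",
    "server-1": "switch-a",
    "server-2": "switch-a",
}

storm = {"gateway", "switch-a", "switch-b", "server-1", "server-2"}
print(sorted(root_cause_alerts(storm, parent_of)))                     # ['gateway']
print(sorted(root_cause_alerts({"switch-a", "server-1"}, parent_of)))  # ['switch-a']
```

Five devices appear down in the first case, but only the gateway produces an alert.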
Both of these problems can erode user trust in your alerting, and lead to our final problem child: alert fatigue.
Alert fatigue occurs when someone is exposed to a large number of frequent alarms (alerts) and consequently becomes desensitized to them. In other words, all of those alerts simply become background noise, and the alerts that actually
matter fade into that din as well. In the worst cases, employees may even set up email filters for alerts, which is a major faux pas. Alerts that end up in the spam folder won't do any good.
Five Key Qualities of Well-Configured Alerts
So now that we know what bad alerting looks like, let’s look at the opposite. A well-configured network monitoring system should be able to keep your team on top of what's going on in your network so they can act before users are negatively affected.
So, what does that look like? Alerts should have these five properties:
Actionable: On-call techs don't need to be bothered with low-priority or informational alerts. It’s important to be selective when setting alerts up so that you don’t overwhelm your personnel with useless alerts. For example,
you might not be interested in informational events happening on your Windows systems or your domain controller, so you may only set up critical alerts on those systems.
Trustworthy: False positives and an excess of low-priority alerts erode trust in the system, which can lead to important alerts being ignored.
Dependency-aware: You should not get an alert for every dependent device that goes down. If a gateway device goes down, that’s the only alert you need; you don’t need an alert from every single connected device beyond it
telling you that it has lost its connection.
Escalatable: You should be able to send alert notifications in a predefined order of hierarchy, which avoids multiple alerts that could overwhelm personnel while still escalating alerts to the proper personnel. That way your sysadmins
won’t be alerted for routine issues that a tech could handle unless the tech is unresponsive.
Alarming: Alerts should be able to reach your techs, wherever they are! That doesn’t necessarily mean a blaring klaxon and flashing red lights, but you need options for alerting. Whether via email, SMS, or Slack, you need
to know when something is going on.
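The "Escalatable" property is easier to picture with a small example. The sketch below assumes a simple time-based policy in which each contact is notified once an alert has been open for a set number of minutes, and an acknowledgment stops any later steps. The `EscalationStep` type, the contact names, and the 15- and 30-minute delays are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationStep:
    contact: str        # hypothetical contact name
    after_minutes: int  # how long the alert must be open before this contact is notified


def contacts_to_notify(
    steps: list[EscalationStep],
    minutes_open: int,
    acknowledged_at: Optional[int] = None,
) -> list[str]:
    """Return the contacts notified so far, in escalation order."""
    notified = []
    for step in sorted(steps, key=lambda s: s.after_minutes):
        if step.after_minutes > minutes_open:
            break  # this step is not due yet
        if acknowledged_at is not None and step.after_minutes >= acknowledged_at:
            break  # someone acknowledged the alert, so escalation stops here
        notified.append(step.contact)
    return notified


policy = [
    EscalationStep("on-call tech", 0),
    EscalationStep("sysadmin", 15),
    EscalationStep("IT manager", 30),
]

print(contacts_to_notify(policy, minutes_open=20))                     # ['on-call tech', 'sysadmin']
print(contacts_to_notify(policy, minutes_open=20, acknowledged_at=5))  # ['on-call tech']
```

In the second call the tech acknowledges the alert after five minutes, so the sysadmin is never paged.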
How WhatsUp Gold Can Help Stop Alert Storms and Prevent Alert Fatigue
Setting up efficient, actionable alerting is a lot easier with a powerful network monitoring tool like WhatsUp Gold (WUG).
WUG comes with several out-of-the-box features that will help you easily set up actionable alerts for your network devices, including:
Alert Escalation: Notification policies in the WUG Alert Center can be configured to escalate alerts based on the criticality of the network components – the alerts can move up from automatic trouble ticket generation to sending
out alerts to pre-designated administrators.
Alert Acknowledgment: The first responder’s acknowledgment is considered an indication the issue is being addressed, and no further alerts are sent, unless triggered by the notification policy or as log messages after the issue
has been resolved. This ensures that issues that are not fixed within the timeframe are addressed appropriately. Likewise, information about the action taken can be added to the acknowledgment process, thereby providing problem resolution data that
can be used in the event the issue recurs.
Dependency-Aware Alerting: WhatsUp Gold automatically applies dependency rules to discovered layer 2 and layer 3 devices to prevent alert storms. These settings can also be configured manually.
Alert Thresholds: Each monitored aspect of your network and applications can be configured to generate an action at certain intervals or thresholds. Critical devices or applications could be set to a lower threshold than other devices.
Want to Learn More About Alerting?
Want to know more about setting up alerting in WhatsUp Gold? Check out our on-demand webinar How to Be an On-Call Sysadmin Without Going Crazy. In
this webinar, we’ll explore effective alerting techniques and technologies that let IT teams be “online” when needed, without subjecting them to incessant alert storms and false positives.
In this webinar you'll learn how to:
- Set up alerts that can reach your techs, wherever they are
- Minimize the risk of alert fatigue and keep your alerts out of the spam folder
- Reduce false positives to get fewer, more intelligent, actionable and trustworthy alerts
- Prevent multiple alerts from overwhelming personnel with alert escalation and alert acknowledgment