Have you ever received alerts from WhatsUp Gold in the middle of the night saying a service has gone 'down', only to log in and find that everything is apparently good and happy? Then, just as you're about to log off, WhatsUp Gold labels the device as 'up' again?
As a former sysadmin, I had to deal with that on just one occasion. That is when I learned about timeout/retry values on active monitors within WhatsUp Gold network monitoring software.
The default timeout/retry values might need some adjustment in your environment. For example, the 'Ping' active monitor (one of the top culprits) has a default timeout of 1 second with 1 retry. For some networks that could be a little too strict. Here are a few scenarios to help you manage this better.
How To Identify False Positives
WhatsUp Gold polls active monitors every 60 seconds by default. Let's say you have your action policy set to e-mail you immediately after something goes down. That means if the system you ping drops 2 packets when the polling command is sent, it is going to be labeled down and you will end up getting an e-mail. The device will continue to be labeled down until a successful polling cycle occurs, which could be the next polling cycle or even later.
To deal with this, set the active monitor's timeout and retry values higher. What I typically use is a timeout of 8 seconds with 2 retries. Under that same scenario, the system drops a few ICMP requests but remains labeled 'Up', because it responded to the subsequent ones due to the higher timeout and retry values.
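To make the scenario concrete, here is a minimal Python sketch of how a single poll might be evaluated. It is not WhatsUp Gold's actual code: the `poll_state` function, the idea that a poll makes one initial attempt plus the configured retries, and the sample reply times are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def poll_state(reply_times: Sequence[Optional[float]],
               timeout_s: float, retries: int) -> str:
    """Return 'Up' if any attempt gets a reply within the timeout, else 'Down'.

    reply_times[i] is the reply time in seconds for attempt i,
    or None if that packet was lost.
    """
    # Assumption: one initial attempt plus `retries` further attempts.
    attempts = 1 + retries
    for reply in reply_times[:attempts]:
        if reply is not None and reply <= timeout_s:
            return "Up"
    return "Down"


# Hypothetical poll: two lost packets, then a slow but valid reply.
replies = [None, None, 2.5]

default_result = poll_state(replies, timeout_s=1, retries=1)  # default Ping settings
tuned_result = poll_state(replies, timeout_s=8, retries=2)    # the values I typically use
print(default_result, tuned_result)
```

With the default settings, the sketch gives up after the two lost packets and reports 'Down'. With the higher values it gets a third attempt, and the slow reply still falls inside the timeout.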
What is important to note is that every monitor within WhatsUp Gold (excluding WMI-based monitors) has an adjustable value for timeout and retries. Now, don't go crazy and adjust them all if you don't have to! Simply adjust the offending active monitor.
Adjust Your Timeouts
To verify, when a monitor goes down, refer to the 'Device Status' page and click on the 'General' tab. There you will see the 'State Change Log' for that device. If the monitor message shows 'Timeout' as the problem, then you're good to go ahead and adjust the timeout for that monitor.
Note that adjusting the value is done within the active monitor library and thus applies that timeout/retry to *ALL* devices that have that monitor applied. From experience, the monitors that need to be adjusted most frequently are ping, interface, and power supply. Adjust them to an 8 second timeout with 2 retries, as recommended above. If you still see the issue, increase the values a bit more.
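Before raising the values further, it can help to check how long a single poll could take in the worst case compared with the 60 second polling interval. The Python sketch below is only a rough estimate: it assumes the attempts run back to back and that each one waits out the full timeout, which may not match how WhatsUp Gold schedules retries internally. The function name is made up for this example.

```python
def worst_case_poll_seconds(timeout_s: float, retries: int) -> float:
    # Assumption: one initial attempt plus `retries` more,
    # each waiting the full timeout before giving up.
    return timeout_s * (1 + retries)


POLL_INTERVAL_S = 60  # default polling interval mentioned earlier in this post

for timeout_s, retries in [(1, 1), (8, 2)]:
    worst = worst_case_poll_seconds(timeout_s, retries)
    fits = worst < POLL_INTERVAL_S
    print(f"timeout={timeout_s}s retries={retries}: "
          f"up to {worst}s per poll, fits in interval: {fits}")
```

Under those assumptions, 8 seconds with 2 retries works out to 8 x 3 = 24 seconds, which still leaves room inside a 60 second polling cycle.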
This should address some of the false positives you may be experiencing. Drop me a comment to let me know if adjusting your timeouts has worked for you.
[This post about managing false positives originally appeared on my WUG.ninja blog a few weeks ago. I have several other items posted there that will help you do things with WhatsUp Gold you may not have known were possible.]