“Sorry we are currently experiencing server issues. We hope to be back up and recovered shortly.” -- @wa_status
When is the worst time for a network crash? A. When you have 465 million users on your network. B. Days after you’ve been acquired for $19 billion. C. When the social media world is ready to pounce on a high-profile failure. D. All of the above.
Unfortunately for WhatsApp, the answer is “D,” which is exactly the scenario the company faced when its servers went down unexpectedly for several hours the weekend before last (2/22). The outage was serious enough that Jan Koum, WhatsApp’s founder, came out over the weekend and offered the following apology: “We are sorry for the downtime; it has been our biggest and longest outage in years.”
Now, this post isn’t intended to take a shot at WhatsApp, but rather to serve as a warning to other companies about what can happen and the damage a prolonged network outage can cause. Let’s be honest: the first thing most of us in the IT field do when we hear of a major outage is scan the names involved and then give thanks that it wasn’t us. The reality is that when you are dealing with networks and technology, issues arise and downtime happens. The key is to sort out “WhatsUp” and to limit both how often outages occur and how long it takes to get back online when they do.
For sophisticated architectures such as the ones needed to keep applications like WhatsApp running, it can take hours to locate something as obscure as a downed router, and all the while the network hiccup continues to wreak havoc. Visibility is the key to keeping networks running and to getting them back online in a matter of minutes, not hours.
So while WhatsApp bears the brunt of a high-visibility network failure, this is a good time for a little self-evaluation: do you have the proper mechanisms in place either to help avoid a catastrophic shutdown or to recover quickly enough that it barely registers as a blip on the radar? If you aren’t sure, or if the answer is no, this would be the perfect time to start asking questions and gaining greater visibility into your networks. At the end of the day, you want to be able to tell folks WhatsUp when they tell you WhatsDown.
(This blog post was first published by Wired Innovation Insights on March 3, 2014.)