The Delta Air Lines power outage that grounded thousands of flights across the country was attributed to the airline's legacy systems. What could Delta have done to prevent it? A network monitoring tool could have pinpointed the source of the problem, possibly before the outage happened, or at least within a few minutes rather than the reported 30. It would have allowed Delta's IT team to be nimbler while they remediated the issue.
Just a month ago, Southwest Airlines dealt with a similar scenario, in which a broken network router led to 2,300 canceled flights and cost Southwest tens of millions of dollars.
It Gets Even Worse
Technical failures affecting major airlines are yet another wake-up call to businesses. As we march further into the 21st century, companies need to start replacing their aging IT infrastructure, and fast.
This isn’t just about good IT hygiene; it’s about nation-state-sponsored hacking into national infrastructure.
As if legacy systems weren’t bad enough, recent reports claim that Chinese hackers are selling vulnerabilities in the infrastructure of major airlines on the dark web. There is no proof that the information being sold on the dark web and the outages that occurred Monday are linked, but it wouldn’t be surprising. Legacy systems are an issue not only in the airline industry, but in the healthcare and finance sectors as well. When these systems go down, the economic toll can be serious.
It’s not just that old equipment is more likely to break down. These systems haven’t been patched or updated in years. Vulnerabilities known to cybercriminals go unfixed, leaving these systems insecure. Any patch work needs to be done in-house, which means hiring extremely specialized (and expensive) engineers or even bringing back former developers of the legacy systems. But since many businesses have been operating this way for years, they generally won’t make the transition to new infrastructure until their hand is forced. That transition is a time and money sink, to say the least, which is why it hasn’t been happening.
Network Monitoring Could Have Prevented These Big IT Fails
Let’s take a look back at the issue that Southwest ran into last month. Apparently, Southwest’s outage came down to a single network router serving as the access point for hundreds of critical applications. A backup process was in place in case of a breakdown, but when this router “partially” failed, the backup was not triggered because the failure was not complete, and that set off a domino effect.
Southwest Airlines did not go into further detail, but a network monitoring solution could have been set up with a threshold that triggered an alert on the partial failure of the device. IT would have seen this alert and been able to understand quickly where the outage was happening, possibly even before it occurred.
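To make the idea of alerting on a partial failure concrete, here is a minimal sketch in Python. It is not how Southwest's systems or any particular monitoring product work; the RouterSample fields, the threshold values, and the function names are all assumptions made up for illustration. The point it shows is that a device can answer polls and still be unhealthy, so the check needs a "degraded" state in addition to "down".

```python
from dataclasses import dataclass


@dataclass
class RouterSample:
    """One polling result for a device. All fields are hypothetical."""
    reachable: bool          # did the device answer the poll at all?
    packet_loss_pct: float   # percent of probe packets lost
    latency_ms: float        # round-trip time of the probe


# Illustrative thresholds only, not recommendations for a real network.
LOSS_WARN_PCT = 2.0
LATENCY_WARN_MS = 150.0


def classify(sample: RouterSample) -> str:
    """Return 'down', 'degraded', or 'ok' for a single sample."""
    if not sample.reachable:
        return "down"
    # The device still answers, but it is losing packets or responding
    # slowly. This is the "partial" failure a simple up/down check misses.
    if (sample.packet_loss_pct >= LOSS_WARN_PCT
            or sample.latency_ms >= LATENCY_WARN_MS):
        return "degraded"
    return "ok"


def should_alert(samples: list[RouterSample], consecutive: int = 3) -> bool:
    """Alert when the last `consecutive` samples are all non-ok.

    Requiring several bad samples in a row avoids paging IT for a
    single noisy measurement.
    """
    recent = samples[-consecutive:]
    return (len(recent) == consecutive
            and all(classify(s) != "ok" for s in recent))
```

A failover that only reacts to "down" would stay idle for a router stuck in the degraded state, while a check like should_alert would still flag it after a few polling intervals.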
Another fix would have been to avoid running so many critical applications through the same access point on the network. That way, if one part of the IT stack failed, the damage would have been limited to the applications connected to that router.
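Finding those concentrations of risk can start with a simple inventory check. The sketch below, again in Python, assumes you already have a mapping from each application to the routers it can reach the network through; that mapping, the function name, and the example names in the usage note are hypothetical. It lists every router that is the only path for one or more applications.

```python
from collections import defaultdict


def sole_access_points(app_paths: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each router to the applications that have no other path.

    `app_paths` is a hypothetical inventory: application name -> set of
    routers that application can use. Applications with two or more
    routers are treated as redundant and left out of the result.
    """
    exposed = defaultdict(list)
    for app, routers in app_paths.items():
        if len(routers) == 1:
            (only_router,) = routers
            exposed[only_router].append(app)
    # Sort the application names so the report is stable between runs.
    return {router: sorted(apps) for router, apps in exposed.items()}
```

For example, calling it with {"booking": {"r1"}, "crew": {"r1"}, "email": {"r1", "r2"}} reports that r1 is the sole access point for booking and crew, while email is left out because it has a second path. A router with a long list next to it is a candidate for adding redundancy or for the strictest monitoring.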
As for Delta, strict alerting thresholds should have been set for any legacy systems in its IT infrastructure. Legacy hardware, or in Delta’s case a power switch, should be under constant scrutiny by the IT team, since legacy systems themselves are a serious risk to business operations.
Legacy software and hardware in business-critical systems should be a concern for most businesses in this day and age. Not monitoring these systems properly is a critical fault of the IT teams at Delta and Southwest. We don’t have all the details as to what happened, but from what we can gather, they have two options: either replace the legacy systems, or monitor them like a hawk.