I was travelling to a user meeting last week, and going through Logan airport in Boston I saw very long lines at some Delta counters. This was on Wednesday, three full days after the IT system outage that grounded almost 500 flights on Sunday morning, and the airline was still feeling the damage from that outage. Earlier this year, Southwest had to cancel 2,300 flights after one router in one of its data centers failed. That’s thousands of stranded passengers from a single incident: a lot of angry customers, a lot of bad publicity and a huge operational burden to get back to normal.
I thought this was a good reminder never to consider risk in a vacuum, especially risk to your IT assets. A recurring conversation I have with customers is about the separation of IT Risk, Security and Vulnerability Management from Enterprise GRC (governance, risk and compliance). You can argue that the processes are different, the technologies are different and the people using them are different, and you’d be right. An Operational Risk Manager and an IT Security Analyst do not do the same job, but they pursue the same goal.
IT resources in an organization are there to support a business process and deliver a business outcome. A risk to an IT asset, say a router in an airline data center, is a risk that could derail the operations of the whole company for an entire day. I’d say that qualifies as a major risk. And yet, the only way you can assess the router’s risk correctly is by going beyond the IT resource itself and assessing the business process it supports, the criticality of the asset to the process and the criticality of the process to operations. The router in itself is not critical; it’s a fairly simple IT asset, easy to replace and reasonably well monitored. It’s only critical because its failure would ground thousands of flights.
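To make that chain of reasoning concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a standard GRC scoring model: the 0-to-1 scales, the choice to multiply the factors, the class and function names, and all of the numbers. It only shows how the same router can look minor on its own and major once the process it supports is taken into account.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    # Hypothetical scale: 0.0 = negligible, 1.0 = operations stop without it.
    criticality_to_operations: float


@dataclass
class ITAsset:
    name: str
    failure_likelihood: float      # assumed estimate, 0.0 to 1.0
    standalone_impact: float       # cost of the failure ignoring what it supports
    process: BusinessProcess       # the business process this asset supports
    criticality_to_process: float  # how much the process depends on the asset


def standalone_risk(asset: ITAsset) -> float:
    """Risk of the asset assessed in a vacuum."""
    return asset.failure_likelihood * asset.standalone_impact


def contextual_risk(asset: ITAsset) -> float:
    """Risk of the asset once its downstream business impact is included."""
    business_impact = (
        asset.criticality_to_process * asset.process.criticality_to_operations
    )
    # The impact is whichever is worse: the asset's own or the business one.
    return asset.failure_likelihood * max(asset.standalone_impact, business_impact)


# Made-up numbers for a data center router that supports flight operations.
flight_ops = BusinessProcess("flight operations", criticality_to_operations=1.0)
router = ITAsset(
    name="data center router",
    failure_likelihood=0.1,
    standalone_impact=0.1,  # cheap and easy to replace
    process=flight_ops,
    criticality_to_process=0.9,
)

print(f"standalone risk: {standalone_risk(router):.2f}")
print(f"contextual risk: {contextual_risk(router):.2f}")
```

With these made-up inputs the contextual score comes out several times higher than the standalone one, even though nothing about the router itself has changed.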
When considering recovery plans and controls, you need to have plans and controls for the asset AND the affected processes. Otherwise it would be like slipping on a patch of ice and breaking your leg, then only working on removing the ice. You should probably get your leg fixed at some point. Context matters, and downstream dependencies matter. How can you have a board-level discussion when considering only the IT side? It won’t mean anything to the board that routers have a medium-high risk of failing. On the other hand, if you tell them that a router failure could result in 2,300 canceled flights, it will be much easier to get their attention.