This week’s failure on the NYSE is a classic systems integration and IT investment failure.
Business owners can learn from this failure in many ways.
I’m not here to rail on the NYSE and its outage. With a system as complex, regulated, and visible as the NYSE’s, we’ll be hearing in-depth analysis of the exact causes for some time.
I’m here to say that we can all learn a few things from it while the failure is fresh in our minds.
Because many of the symptoms and conditions that likely occurred are happening in your business right now.
The NYSE outage is clearly at the extreme end of things. In the past they’ve only gone down for major world events (e.g. the assassination of Abraham Lincoln, the 9/11 terror attacks) – so this is really their first internally caused failure.
But these failures aren’t that uncommon – especially where budgets aren’t as fat as the NYSE’s. The NYSE has done well to date, but hit a wall this time. Now imagine you own a business with only a shoestring budget – can you learn from the NYSE outage?
You certainly can. You can learn to identify issues early and be more realistic.
So – keep your ears open for the following quotes and use them to gauge where your software team is.
Before the NYSE Outage
I am betting someone said at least a few of the following things before the system upgrade (hint: you may hear this at your company too):
- “I think we’ve thought of everything, but we can’t know for certain.” – I am hoping someone on the software/engineering team was mature enough to admit this before the failure. If not, the following was likely said:
- “We’ve tested everything. We’re totally ready.” – there are NO systems on the planet that can claim this. They can claim very high reliability, but no system is perfect.
- “We’ve done our best but we’re under a deadline that we can’t move.” – With the SIP timestamp test (article) being imposed, engineers would have been under immense pressure.
- “We haven’t been keeping up. This change is brutal.” – mature organizations know they walk a knife-edge between maintaining the status quo and taking on new technology and approaches. Slower organizations don’t keep up enough, and when they fall behind it gets harder and harder to respond.
- “I just don’t know.” – Systems are complex. A system the size of the NYSE’s is an extremely complex beast with so many moving parts that it is inconceivable that anyone can fully understand it. Hearing this is a good thing – it means there are people admitting there is inherent risk.
Overheard During the NYSE Outage
And once things started to fail, I can pretty much guarantee that all of the following were said, likely repeatedly (hint: your people probably say these all the time). Many of these indicate pathological problems that need to be addressed for your team to perform well – living in the land of denial will not help you grow your business.
- “Well, this couldn’t cause that.” – Someone who is focused solely on one aspect of the system just doesn’t get the ramifications of change.
- “But that component was supposed to be ready.” – Blaming others for a system-wide upgrade isn’t a healthy sign. Mistakes will happen – how will you handle them?
- “How else could we do it?” – Once the systems start to fail people fall back on “the plan” – yes, the plan that failed. It’s a comfort thing and natural.
- “What do you mean things aren’t working? We tested it.” – Here’s someone who is in total denial that there is a real world out there that is much more complex than any test harness or simulation.
- “That’s weird.” – I had to add this one. It’s a guarantee. Developers and engineers default to this under pretty much every circumstance. This one is normal.
If you own the business and you’re hearing things like I’ve outlined here, be aware that you have normal problems. Problems that can be solved. Identifying them is the start.