Disaster Recovery: Beyond the Technical Issues
About 20 years ago, I was at a meeting where a film was shown about what happened when a company went through an unexpected disaster. The employees were at the company Christmas party when a fire started in a closet at the other end of the building. This triggered the fire alarm and the sprinkler system. Everyone evacuated the building, and they weren't allowed back in for over six weeks because the fire had released a toxic gas. By the time they could get back into the building, the water damage to desktop computers and paperwork was mostly beyond remediation. The IT team was able to move to an off-site data center and begin getting the systems up and running, but they had never practiced their DR (Disaster Recovery) plan and ran into many difficulties. The film was very believable; it was easy to see how devastating such an incident could be.
The most interesting thing about the film is that, while the technology has changed considerably in the last 20 years, many of the non-technical issues it raised are still relevant. For example, as management was desperately trying to hold the situation together, an employee was quoted in the media saying how bad conditions were and that he wasn't sure the company was going to recover. The company didn't have a policy in place for who was allowed to talk to the media. In today's world, social media makes this issue even more relevant.
Another problem the company faced was communication. While their hurdles revolved around outdated contact information and the inability to get back into the building to retrieve paper files (like a copy of their DR plan), communication is still a critical issue in disaster situations today.
For example, companies must communicate with employees during the disruption. But if critical systems (such as email) are down, this can be difficult. A widely circulated toll-free number can help ensure that employees obtain information from a single authorized source. The firm in the film used this strategy, and it would still be a valid option today.
How many of us rely entirely on directories, both in our corporate environment and on our phones, to know how to reach others? How would you communicate with people if you couldn't access that directory information? How many email addresses and phone numbers do you have memorized? Key employees need to have alternate means of communication set up before the loss of critical systems impairs their ability to communicate. In some cases, cell phones may not be a viable option, because the networks are often clogged during widespread disasters.
Furthermore, collaboration tools (voice, IM, screen sharing, video) may be unavailable or running at reduced capacity, making it harder to communicate at the very time when communication is most critical.
And what about communication with decision makers, who will almost certainly be bombarded with questions and surrounded by chaos? What is the most efficient way to bring issues to them for resolution when needed?
For some interesting insight on lessons learned about communications during 9/11, see this article from Harvard Business Review.
Another problem the company in the film experienced was that they didn't correctly prioritize which IT systems should be brought up first. They brought up the payroll system first, and the system that took customer orders was lower on the list. They realized in hindsight that, since they ran payroll every two weeks, they could have waited up to two weeks to restore the payroll system. But they needed to take customer orders immediately to avoid losing their customers to the competition. While their business people had been asked to participate in creating the DR plan, many didn't show up, and others didn't give the plan the thought it deserved. I believe this attitude is still common today. While the IT staff at many companies take DR seriously, their business counterparts often don't want to deal with "gloom and doom scenarios."
Still another non-technical issue to consider is how much your company relies on key people. What would happen if staff members who are critical to the recovery process don't come to work? As the people of New Orleans discovered during Hurricane Katrina, even law enforcement officers may not show up. What if key people are injured, or must deal with personal or family issues that take precedence? I have participated in a few DR exercises, but I have never seen this contingency addressed. It would be fascinating to see what would happen if someone tapped a key resource on the shoulder and said, "In this scenario, you are no longer able to continue working." How would the rest of the team respond?
While advances in technology can greatly improve a company's ability to respond to a disaster scenario, it may be the non-technical issues that create the biggest challenges. Does your DR plan address these types of issues?