Disaster Recovery Planning
John M. Haddad
Your business uses Information Technology to process information quickly and effectively every day. Your employees use email, phones, and smartphone apps and services to communicate quickly and conveniently. Whether in-house or off-premises, you use servers to process information and to store your massive amounts of data. The big question you need to be prepared for is, “What do you do when your systems stop working, or when a natural disaster strikes?”
Every company or organization should have a detailed Disaster Recovery Plan (DRP) to follow in case of such an event. The plan has to be tailored to each business, because business types, geographic locations, and the amount of documents and data stored in-house all vary. There is no guarantee that human error, hardware failure, natural disasters, and other unforeseen factors will never affect your business, so it’s better to be safe than sorry.
No amount of money or planning can stop some IT disasters from happening. But a good disaster recovery plan can reduce your downtime from a week or a day to hours or even minutes. In addition to identifying mission-critical applications and any infrastructure they rely on, you should also identify the data these applications and tasks need to have access to.
This can include recent email, customer databases, and any documents, spreadsheets, presentations and other “unstructured” files used by project/product management, development, sales, manufacturing, etc. Your company has accumulated a substantial amount of data over time, but only some – often a small fraction – of this data has to be made available again quickly.
What Causes IT Disasters
The cause of an IT disaster may be small and specific. A power supply, CPU, network interface card, RAM, fan, or other component on an individual server may fail. A brief power fluctuation may scramble data or disrupt a program’s activity.
An entire data center going down is rare, but can happen. Weather may take down external power or network service. Any resulting fire, flood, or building damage may bring down your entire computer room or data center.
So how do you ensure that you have a good Disaster Recovery Plan in place? Disaster Recovery Planning involves four key steps.
Disaster Recovery Planning
Step #1 – Business Impact Analysis
A Business Impact Analysis (BIA) defines which capabilities your company can’t operate without. This is the first step in creating a working disaster recovery plan. A BIA must involve top-level non-IT management, to identify and agree on the list of tasks that are considered essential, and IT management, to map these tasks to the applications, along with the associated infrastructure and other services needed to run and use those applications.
How much IT downtime is acceptable for mission-critical applications and data depends on many factors (notably cost) and will vary from one company to the next. In general, though, acceptable downtime today is minutes to hours, compared with days to a week or more a decade ago.
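A BIA result can be recorded as a simple table of essential tasks, the applications they depend on, and the downtime each can tolerate. The sketch below is one possible way to do that; the task names, application names, and downtime figures are hypothetical examples, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class EssentialTask:
    """One row of a BIA: a business task and what it depends on."""
    name: str
    applications: list[str]
    max_downtime_hours: float  # agreed with non-IT management


def recovery_order(tasks):
    """Return tasks sorted so the least downtime-tolerant come first."""
    return sorted(tasks, key=lambda task: task.max_downtime_hours)


# Hypothetical entries for illustration only.
tasks = [
    EssentialTask("Customer invoicing", ["billing-db", "erp"], 24.0),
    EssentialTask("Internal and customer email", ["mail-server"], 1.0),
    EssentialTask("Order intake", ["web-store", "billing-db"], 4.0),
]

for task in recovery_order(tasks):
    apps = ", ".join(task.applications)
    print(f"{task.max_downtime_hours:>5.1f} h  {task.name}: {apps}")
```

Sorting by tolerated downtime gives a first draft of the order in which applications should be restored.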
Step #2 – Risk Assessment
The second step to a complete DR plan for your organization includes mapping the two types of IT infrastructure:
- IT infrastructure you control, whether located in your offices or in co-location facilities.
- IT infrastructure you don’t control – like web and cloud services or web sites running in a hosting center.
Once the IT infrastructure has been mapped, look for single points of failure, like a server with only one network card. These are your first places to consider “fortifying” with redundancy.
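Once the infrastructure is written down as an inventory, finding single points of failure can be partly automated. The following is a minimal sketch, assuming a hand-maintained inventory that records how many of each component a device has; the device names and counts are made up for illustration.

```python
# Hypothetical inventory: device -> component -> how many are installed.
inventory = {
    "file-server": {"network_cards": 1, "power_supplies": 2},
    "mail-server": {"network_cards": 2, "power_supplies": 1},
    "firewall": {"network_cards": 2, "power_supplies": 2},
}


def single_points_of_failure(inventory):
    """List (device, component) pairs that have no redundant spare."""
    findings = []
    for device, components in inventory.items():
        for component, count in components.items():
            if count < 2:
                findings.append((device, component))
    return findings


for device, component in single_points_of_failure(inventory):
    print(f"{device}: only one of {component}")
```

A check like this only covers what is in the inventory. A device that exists only once, such as a lone firewall, has to be listed and reviewed separately.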
Step #3 – Risk Management
To lower the risk of a data disaster occurring, fortify yourself against the most common issues and you will have protected yourself against 90%-95% of the small incidents that may impact you.
Redundancy is one popular approach to avoiding or minimizing many IT disaster events. For example, servers, storage and network gear can be configured with two power supplies, connected in turn to separate power sources. Servers, firewalls, UPSs and other gear, even entire sites, can be duplicated. Network and electrical service can be supplied by two separate utilities, on separate cables. Data can be stored across multiple hard drives.
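The benefit of duplicating a component can be estimated with a little arithmetic. The sketch below assumes that the copies fail independently of each other, which is a simplification: two power supplies fed from the same circuit, for example, do not fail independently. The 99% figure is an illustrative assumption, not a measured value.

```python
def combined_availability(single_availability, copies):
    """Availability of a group that works as long as one copy works.

    Assumes the copies fail independently of each other.
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single_availability) ** copies


# Illustrative figure: a component that is up 99% of the time.
for copies in (1, 2, 3):
    print(copies, round(combined_availability(0.99, copies), 6))
```

Under these assumptions a second copy cuts expected downtime from 1% to 0.01% of the time, which is why redundancy is usually applied first to the single points of failure found in Step #2.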
Hosting Applications vs. Outsourcing – Another critical component of managing the risk of data disasters is assessing whether it’s time to outsource any of your IT applications and services, and move them to the cloud.
Step #4 – Disaster Recovery Testing
There are only two ways to determine whether a DR plan works.
- One is when there’s a disaster. This, of course, is the wrong time to discover that you chose wrong, or that one of your tools or services has failed, or that you didn’t include a critical application.
- The other way is to periodically conduct tests.
It is better to uncover a shortcoming in your infrastructure by testing failure scenarios under controlled circumstances. If you don’t discover problems until a real event, you may miss your target time to restore IT service.
External audits can help identify whether there are any parts of your DR plan that still need work. One reason is that not all organizations will simulate a full disaster scenario, or carry through to confirm that a full recovery can be done.
Whether you have a small, mid-sized, or large company, proper Disaster Recovery Planning is critical to ensure continuity of your business operations. One of my clients, a large billion-dollar company, found out the hard way. Water damage in their data center completely shut down several key applications, including email. Business was crippled for 3 days until new servers could be installed. Needless to say, they had not planned for a disaster and didn’t realize the impact that losing email for 8,000 employees would have on their business.
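The outcome of a test can be checked against the restore targets agreed in the Business Impact Analysis. This is a rough sketch of such a comparison; the application names, target times, and measured times are hypothetical.

```python
def missed_targets(targets_hours, measured_hours):
    """Return applications that were not restored within their target.

    An application that was never restored in the test has no measured
    time and is reported with None.
    """
    missed = {}
    for app, target in targets_hours.items():
        measured = measured_hours.get(app)
        if measured is None or measured > target:
            missed[app] = measured
    return missed


# Hypothetical targets from the BIA and timings from one test run.
targets = {"email": 1.0, "web-store": 4.0, "erp": 24.0}
measured = {"email": 2.5, "web-store": 3.0}

for app, hours in missed_targets(targets, measured).items():
    status = "not restored" if hours is None else f"took {hours} h"
    print(f"{app}: {status} (target {targets[app]} h)")
```

Reporting applications that were never restored matters as much as reporting slow ones, since a forgotten application is exactly the kind of gap a test is meant to expose.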
Proper Information Technology Disaster Recovery Planning will minimize disruptions to your business.