In February 2014, as a historic ice and snow storm barreled up from the Deep South toward the East Coast, the Disaster Recovery Preparedness Council posted a bleak report on the ability of businesses to respond to such disasters. The Council, an independent organization made up of IT professionals and academics, researches IT disaster recovery management, emerging scholarship, and benchmarking. Its survey found that over 60 percent of participating companies did not have a fully documented disaster recovery (DR) plan and that one in four had never tested their DR plan. One third of participants reported having lost access to critical applications, data files, and even most or all of their data centers for hours at a time.
Such findings are surprising, considering that many businesses have experienced IT failures caused by natural disasters. One need only look back to Hurricane Sandy to see the devastation a major storm can wreak on a metropolitan area. Eight days before Election Day 2012, the hurricane slammed into the East Coast, flooding lower Manhattan, including the data center of the Huffington Post, an online news and political analysis site. The site’s IT staff worked frantically to switch over to the company’s backup site in Newark, New Jersey. The switchover should have been possible, because three separate data transmission circuits connected the main data center to the backup site. However, all three circuits ran close to one another, so all three were damaged and failed during the storm. It took a week for the Huffington Post to get its full site up and running, at a time of peak Web traffic.
Despite the Huffington Post’s experience, the overall response to Hurricane Sandy by several major IT and critical infrastructure companies was remarkably successful. One company that fared particularly well amid the chaos following the hurricane was The Bank of New York Mellon (BNY Mellon).
Established in 1784 by Alexander Hamilton, BNY Mellon is an investment company with $28.5 trillion in assets under its custody or administration. BNY Mellon is also a “core clearance bank,” moving large sums of money to the payee’s bank after a check, electronic money order, or promise of payment has been issued by governments and large corporations. The bank performs about $2.8 billion in clearance for the U.S. government annually and is thus vital to the country’s economic infrastructure. Because its offices were across from the World Trade Center, the bank’s IT systems were disrupted for several days following the attacks of September 11, 2001. The company had replicated its mainframe off-site, but its midrange IT systems relied on tape-based backup and wired networks.
Tape-based DR programs can be problematic for both large companies, like BNY Mellon, and smaller ones. First, tapes are not always a reliable medium. Second, to begin the recovery process after a disaster, IT staff must transport the tapes from their storage location back to the main business site, which may have no electrical power or may not even be accessible because of flooding or other damage caused by a storm or similar event. Finally, companies often rely on third-party DR vendors, and when a wide-scale disaster occurs, these vendors are often hard-pressed to service all their clients in a timely manner. Vendors tend to prioritize larger clients, so smaller companies may have to wait longer to have their systems restored. Even larger companies, like BNY Mellon, might wait longer than they would like for their systems to be completely restored.
BNY Mellon learned its lessons from the 9/11 disaster, and in the following years the bank made several changes and exploited advances in technology to improve its disaster recovery plan. Among other things, the bank relocated its primary data center about 800 miles from its New York headquarters, to Tennessee, a relatively stable part of the country not often hit by hurricanes or winter storms. BNY Mellon then replicated the data from fund transfers and other core banking applications to two data centers on the East Coast. Although one of the two backup data centers lost utility power during Hurricane Sandy, the site’s backup generator kicked in, and the company’s business processes continued uninterrupted.
In the days before the hurricane, BNY Mellon also temporarily transferred many of its business processes from New York City to other U.S. states and to Europe. However, the company still had 4,100 New York–based employees who had to work remotely, many of them through the company’s virtual private network (VPN). The VPN peaked at 5,800 users, a record load for the company, and although the downtown locations had to shut down because of flooding, power outages, and transportation stoppages, business went on uninterrupted elsewhere. BNY Mellon’s systems did not go down even for a second.
Companies looking to strengthen their disaster recovery systems often seek out organizations, such as EMC, that enable service providers and businesses in every industry to deliver infrastructure as a service (IaaS). EMC cloud computing products and services help organizations store, manage, and protect their data and information technology. Many of the companies whose systems have performed well during disasters such as Hurricane Sandy rely on EMC products, including VMAX for information storage; RecoverPoint for archiving, backup, and recovery; and VMware for virtualization. RecoverPoint, for example, supports continuous remote data replication.
During Hurricane Sandy, EMC not only used its local IT staff but also brought in a team from the West Coast and created “war rooms” that operated 24 hours a day, seven days a week, to help its customers in New York and New Jersey power down, move their business processes to their DR sites ahead of the outages, and keep those systems running. On October 28, ahead of the storm, EMC Customer Service Engineer Eugene Libes was stationed in a midtown hotel. By early October 29, his team was bombarded with customer requests for emergency power-downs. These power-downs allowed customers to shut down their IT systems safely and avoid the errors that arise when power is suddenly cut to a system during a storm. Libes recalls, “One of our largest customers was sitting right in the path of water flooding from the Hudson. We had a whole team of people that went into the customer’s site to power everything down, just hitting switches as fast as we could. Once I got back to my car, water was about halfway up my tires. We jumped into the car, and as soon as we drove away a huge wave hit the street. We just barely made it out!”
Guy Churchyard, EMC’s president of backup and recovery systems, recalls that engineers whose homes were under water (though their families had been safely evacuated) kept working throughout the crisis. Engineers literally slept on the floor of conference rooms for five days, recharging their equipment with their car batteries. One engineer had a generator at his house, so the company set up a war room there.
When asked what he would do differently next time, Eugene Libes said, “I would contact customers and persuade them to power down in advance and to move operations to their disaster recovery site. Customers could have avoided a lot of pain by handling all of this in advance. They might not be operating at an ideal level with this approach, but that’s preferable to going down completely or scrambling at the last minute.”
Today, many corporations, including Microsoft, IBM, and Amazon, offer cloud computing disaster recovery solutions. Their role in helping customers respond successfully to recent natural disasters underscores how technological advances, from wireless networking to virtualization, have improved DR preparedness. Yet one obstacle remains for many companies. As one IT director explains, “Everyone wants to have a great disaster recovery system in your company until you explain to them how much it costs.” For companies like BNY Mellon, an effective DR plan is imperative; the bank invests heavily in its DR systems and tests them four times a year. However, small and midrange companies sometimes feel that absorbing downtime is cheaper than paying for an expensive DR plan. As a result, the Disaster Recovery Preparedness Council finds that many businesses are sorely unprepared for unanticipated risks. On the bright side, DR technology is improving, and as it does, lower-cost solutions are being developed to make planning, testing, and recovery accessible to all.
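The cost trade-off described above can be made concrete with the annualized-loss comparison commonly used in risk analysis: the expected yearly cost of downtime is the number of outages per year times the hours of downtime per outage times the loss per hour. The Python sketch below checks whether the downtime a DR plan avoids is worth more than the plan costs each year. It is a simplified illustration, not a method from the case; the function names are hypothetical, and every figure in the example is an assumption, except that the 168-hour outage roughly mirrors the week the Huffington Post needed to recover.

```python
# Hypothetical break-even check for a DR plan. All inputs are illustrative
# assumptions, not data reported in the case.


def annual_downtime_cost(outages_per_year: float,
                         hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly downtime loss: frequency x duration x hourly loss."""
    return outages_per_year * hours_per_outage * cost_per_hour


def dr_plan_pays_off(annual_dr_cost: float,
                     outages_per_year: float,
                     hours_without_dr: float,
                     hours_with_dr: float,
                     cost_per_hour: float) -> bool:
    """True if the downtime loss the DR plan avoids exceeds its yearly cost."""
    loss_without = annual_downtime_cost(outages_per_year, hours_without_dr, cost_per_hour)
    loss_with = annual_downtime_cost(outages_per_year, hours_with_dr, cost_per_hour)
    return loss_without - loss_with > annual_dr_cost


if __name__ == "__main__":
    # Hypothetical firm: one major outage every five years (0.2 per year),
    # a week (168 hours) offline without DR versus 4 hours with it,
    # $10,000 lost per hour of downtime, and a DR plan costing $200,000 a year.
    print(dr_plan_pays_off(200_000, 0.2, 168, 4, 10_000))  # prints True
```

With these assumed numbers, the plan avoids about $328,000 of expected loss per year, so it pays for itself. If the same firm expected a major outage only once every 20 years (0.05 per year), the avoided loss would fall to about $82,000 and the same plan would not pay off, which is the reasoning that leads some smaller companies to accept the risk of downtime.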
1) What lessons about DR systems have been learned from natural disasters and terrorist attacks?
Melody: The main lesson for DR systems is that companies should always have a backup plan in place before a natural disaster or terrorist attack happens. Both large and small companies need to plan ahead for what to do when employees must evacuate to another location. The “war rooms” that EMC set up were a clever solution for disaster recovery, although I would also suggest designating a group of scouts who report to the war rooms on whether to power down when a natural disaster is approaching, so that employees and customers avoid near-death situations. Moreover, it would be beneficial if all companies transferred and kept their data on cloud computing servers. Firms can use EMC’s or similar IT products and services to organize, manage, and store all the data they need, which would make operations run more smoothly during disaster recovery. Finally, many companies seem reluctant to pay for DR preparations. The counterargument is that being unprepared would cost even more, both in lost data and in the lasting damage to the company’s reputation.
2) How do these lessons vary depending on the size of a company, its industry, its customer base, and its geographic location?
3) What are the advantages of cloud computing DR solutions? What are the disadvantages and risks?
Cloud computing DR solutions have many advantages. First, they enable organizations to respond to disasters faster. Organizations can back up all of their information at once on a virtual server. Because virtual servers are not tied to specific hardware, they can easily be moved to other data centers, which minimizes recovery time. Cloud DR solutions also provide tools that automate the recovery steps in the cloud. Without such tools, businesses must carry out intervention and recovery procedures manually when a disaster occurs, which lengthens recovery. Second, organizations that use cloud DR solutions avoid the cost of building and maintaining a second data center. These solutions also give organizations flexibility in choosing the location of their disaster recovery facility. With a physical facility, businesses need to keep their backup server far away from their main server so that both are not affected by the same disaster. With cloud computing DR solutions, the recovery site can be located, or quickly moved, anywhere in the world.
Advantages:
• Faster response to disasters (recovery in minutes, because the entire server can be backed up)
• Savings on the cost of building a second data center
• Data can be stored in any location
• Greater security
• Savings on dedicated personnel to handle data
Disadvantages and risks:
• Can be costly
• Confidentiality and privacy concerns (data is stored on third-party servers)
• Dependence on a reliable internet connection
4) When dealing with vendors and third parties, what can smaller companies do to make sure their needs are met during an emergency?
5) EMC employed a crew of workers who went above and beyond the call of duty to support their clients during Hurricane Sandy. Yet the systems of one of the firm’s largest customers still failed when the backup generator began to smoke and the building lost power. In a different example, many of Structure Tone’s employees left their laptops at work when they rushed home to be with their families during the storm. As a result, they could not access the VPN. What initiatives should both companies take on their own to ensure that their DR systems will work effectively during emergencies?