Unsinkable Data Center Crashes in Seattle
Aug 15, 2006 5:00 AM PT
One of the most advanced communication hubs in the world crashed unexpectedly in Seattle on Sunday afternoon, July 30, 2006, due to a string of malfunctions in the facility's electrical power system. The facility, called Fisher Plaza, is billed as one of the most secure data centers and telecommunications hubs in the Northwest, capable of surviving major earthquakes and running for weeks on its own power. It comprises two buildings, Fisher Plaza East at 197,000 square feet and Fisher Plaza West at 99,000 square feet.
Ten different telecommunications carriers bring fiber into Fisher Plaza, which serves as a major telecommunications hub and is home to several commercial ISPs and Tier-1 bandwidth providers. Its symbolic importance is underscored by an array of satellite dishes on its roof and its location across the street from the Seattle Space Needle and the Science Fiction Museum and Hall of Fame.
Mission-critical equipment areas within Fisher Plaza were built to exceed seismic zone 4 standards, the highest protection level achieved by any data center or colocation facility in the Puget Sound region. There are six multi-megawatt generators on site, and three 22-ton redundant HVAC environmental cooling systems. The facility boasts an elevator capable of lifting a loaded truck. Two emergency wells can supply water to Fisher Plaza's cooling towers if city water is unavailable.
A rooftop helicopter pad supports KOMO 4 television's studios, which are located onsite along with the KOMO 1000 AM radio station. Both stations ceased broadcasting during the events of July 30.
The causes and consequences of the events of Sunday afternoon, July 30, are in dispute. Representatives of Fisher Plaza's owners declined numerous requests to comment on the events until Aug. 10. An undated written statement issued on that date on Fisher Plaza letterhead read: "On Sunday, July 30, 2006, Fisher Plaza East experienced a Seattle City Light power outage that triggered a generator start and load transfer event."
Seattle City Light representatives dispute Fisher Plaza's claim that there was a power outage on that date. City Light finds no record of any power abnormality within City Light's network serving downtown commercial customers at the time Fisher Plaza East went dark. No other commercial customers that share Fisher Plaza's network reported an outage on that date.
A principal metric for measuring the reliability of power delivered to a network is called SAIDI. It stands for system average interruption duration index. The average SAIDI for the downtown Seattle network serving Fisher Plaza and other commercial customers has been zero for the last several years. In Seattle, as in almost all electrical utility franchise areas, residential networks have poorer SAIDI scores than networks designed for major commercial customers.
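As a rough illustration of how the index is computed, SAIDI is conventionally the total customer-minutes of interruption over a reporting period divided by the total number of customers served. The sketch below follows that standard definition; the function name and the customer counts are hypothetical and are not City Light figures.

```python
def saidi(interruptions, customers_served):
    """Return SAIDI in minutes per customer for one reporting period.

    interruptions: iterable of (customers_affected, duration_minutes) pairs.
    customers_served: total customers on the network (must be positive).
    """
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    # Sum customer-minutes of interruption across all recorded events.
    customer_minutes = sum(n * minutes for n, minutes in interruptions)
    return customer_minutes / customers_served


# Hypothetical network of 1,000 customers (illustrative numbers only).
no_outages = saidi([], 1_000)            # 0.0: no recorded interruptions
one_event = saidi([(200, 30)], 1_000)    # 6.0: 200 customers out for 30 minutes
```

A network with no recorded interruptions scores zero, which is the figure cited for the downtown Seattle network.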
Seattle City Light's control center never registered an outage or abnormality in Fisher Plaza's electricity supply and did not recognize that Fisher Plaza East had gone offline until someone from the facility telephoned City Light over 45 minutes after Fisher Plaza East went offline. There had been a minor system change within City Light's network at the time the outage began, which City Light engineers forensically examined. The engineers speculated that the three-phase power supply serving Fisher Plaza may have experienced a momentary fluctuation in just one of its phases. This type of minor single-phase fluctuation is not considered an outage or a brownout.
If Fisher Plaza's settings were over-sensitive, then a minor fluctuation in one phase might have been enough to take Fisher Plaza offline. Facilities with their own electrical cogeneration equipment sometimes set their sensitivity levels too high, ostensibly to prevent damage to their own electrical equipment.
Power stabilization equipment, such as motor-driven stabilizers or high-speed capacitors, is not widely used in the Northwest U.S. but is a common sight in South Asia, where power supplies suffer frequency drops, voltage dips and amperage shortfalls. India, for example, has a robust manufacturing industry for servo-controlled stabilizers.
Following the Time Line
A rough chronology of events has been constructed with information provided by an Internet service provider (ISP) based in Fisher Plaza (and used by InternationalStaff.net) and by Seattle City Light.
- 15:59 PDT -- Power is disrupted in the Fisher Plaza East building. Service to Fisher Plaza West remains normal throughout the event.
- 15:59 PDT to 16:05 PDT -- Tenants are notified that backup generators have come online. However, power within Fisher Plaza East is not restored because the breaker that allows the generators to feed power into the building's system fails to engage. The generators are running, but not connected to Fisher Plaza East's internal electrical transmission network.
- 16:22 PDT -- The ISP used by InternationalStaff.net experiences a shutdown of its switch and core routers, which are served by a single Uninterruptible Power Supply (UPS). The limits of this UPS are reached at 16:22, causing the core routers that connect the ISP to the Internet to shut down. Half this ISP's data center uses the same UPS, causing a loss of power to some of the ISP's servers and customer servers as well.
- 16:27 PDT -- An engineer arrives onsite and begins to inspect the system. Fisher Plaza's table of organization and equipment did not provide for an onsite engineer who was trained to reconnect Fisher Plaza to City Light's network, so one had to be called in from offsite. The engineer discovers that the breaker connecting Fisher Plaza's generators to Fisher Plaza East's internal network has failed to engage.
- 16:34 PDT to 16:35 PDT -- Engineers initiate the process for resetting the breaker.
- 16:36 PDT to 16:37 PDT -- Engineers manually force the connection back to the City Light network, restoring power to Fisher Plaza East.
- 16:44 PDT -- The ISP's network operations are restored.
- 16:55 PDT -- Fisher Plaza notifies City Light that it is experiencing problems. At 17:05, an engineer from Fisher Plaza contacts City Light's control room to discuss the event.
- 18:35 PDT -- InternationalStaff.net is notified via e-mail of the outage by the ISP.
- 20:25 PDT -- Calls are placed to Fisher Plaza to inquire about the causes and consequences of the event. Repeated calls fail to produce results, so on Aug. 4, I spent the afternoon and early evening sitting in the Fisher Plaza lobby seeking to speak with a designated spokesperson for Fisher Plaza. Spokespeople were not designated until Aug. 10.
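The intervals implied by this chronology can be worked out with a short calculation. The sketch below is only arithmetic on the times listed above; the event labels are shorthand introduced here, and where the chronology gives a range (16:36 to 16:37 for the reconnection) it assumes the earlier time.

```python
from datetime import datetime


def minutes_between(start, end):
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


POWER_LOST = "15:59"  # power disrupted in Fisher Plaza East

# Times taken from the chronology; labels are illustrative shorthand.
events = {
    "ISP UPS exhausted": "16:22",
    "Engineer on site": "16:27",
    "Utility power reconnected": "16:36",  # assumes start of 16:36-16:37 range
    "ISP network restored": "16:44",
    "City Light notified": "16:55",
}

elapsed = {label: minutes_between(POWER_LOST, t) for label, t in events.items()}
isp_downtime = minutes_between(events["ISP UPS exhausted"],
                               events["ISP network restored"])

for label, mins in elapsed.items():
    print(f"{label}: {mins} min after power loss")
print(f"ISP core routers down for {isp_downtime} min")
```

On these figures the ISP's UPS carried its load for 23 minutes, Fisher Plaza East was without power for about 37 minutes, and City Light was notified 56 minutes after the disruption began.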
Conditions During the Event
The atmosphere inside Fisher Plaza East during the event was described by one observer as being eerily quiet. A big data center is normally full of humming equipment. The humming goes away when the power is turned off. During the outage, shouting was reported inside the building as different systems failed.
One tenant reported that the power outage affected all building systems in Fisher Plaza East, including the telephone system and access to the facility via the access-control security system. Fisher Plaza's management disputes this report, claiming that all personnel who needed access to the building were able to gain access.
City Light has extended an invitation to Fisher Plaza to meet with its engineers to discuss how to prevent a reoccurrence of the event, but Fisher Plaza has reportedly not yet responded. The chronology presented above has not been verified with Fisher Plaza, in part because of issues within their chain of command.
The refusal of Fisher Plaza's management to discuss the event until eleven days after it occurred was mirrored by several telecommunications firms that are tenants at the facility and that have declined repeated requests to discuss the impacts of the event and what they are doing to prevent a reoccurrence. Attempts to maintain secrecy and shift responsibility for the causes and consequences of the Fisher Plaza outage have made the events of July 30 as much about corporate culture and professional integrity in the face of adversity as about the technical issues surrounding the outage.
Good Risk Communication is Good Business
Verizon and Vonage (NYSE: VG) provided immediate access to information about the event and were prepared to discuss how their traffic had instantly been routed around the stricken hub. The events of July 30 gave both firms the opportunity to highlight the sophistication and redundancy of their networks. For Vonage, it also provided an opportunity to draw positive attention to the progress it has made in providing access to 911 emergency telephone number services for almost all Vonage customers in the U.S.
In September 2004, a fire at a telecommunications facility in India disrupted voice lines to 60 percent of the commercial call centers in that country. One unexpected aspect of the outage, which halted all of InternationalStaff.net's call center outsourcing programs in India at the time, was how accepting and supportive clients were -- once they were fully informed and given an opportunity to ask questions.
India and Pakistan experienced disruptions in Internet access and long distance voice lines for ten days in June and July 2005 and again in September 2005, with Pakistan being particularly hard hit. Attention was given to identifying the source of the problems, but considerable media attention and public discussion in Pakistan were also given to the practical aspects of critical infrastructure planning.
Public discussion and private-sector openness in Pakistan about the outage from June 27 to July 8, 2005, resulted in substantial commitments of public and private funds to strengthen critical infrastructure and for long-term maintenance. Fisher Plaza's outage, in contrast, does not appear to have been reported in print or on the Internet and has not been the subject of any public discussion.
Public discussion of private failures can be a painful process, but it is an essential process if public confidence is to be maintained, and if a facility's tenants and their clients are to be convinced that premium prices translate into premium service.
Anthony Mitchell, an E-Commerce Times columnist, has been involved with the Indian IT industry since 1987, specializing through InternationalStaff.net in offshore process migration, call center program management, turnkey software development and help desk management.