Wednesday, October 9, 2013

Meltdowns Hobble NSA Data Center!

A version of this article appeared October 8, 2013, on page A1 in the U.S. edition of The Wall Street Journal, with the headline: Meltdowns Hobble NSA Data Center.
Investigators Stumped by What's Causing Power Surges That Destroy Equipment
By SIOBHAN GORMAN
Chronic electrical surges at the massive new data-storage facility central to the National Security Agency's spying operation have destroyed hundreds of thousands of dollars' worth of machinery and delayed the center's opening for a year, according to project documents and current and former officials.
There have been 10 meltdowns in the past 13 months that have prevented the NSA from using computers at its new Utah data-storage center, slated to be the spy agency's largest, according to project documents reviewed by The Wall Street Journal.
One project official described the electrical troubles—so-called arc fault failures—as "a flash of lightning inside a 2-foot box." These failures create fiery explosions, melt metal and cause circuits to fail, the official said.
The causes remain under investigation, and there is disagreement whether proposed fixes will work, according to officials and project documents. One Utah project official said the NSA planned this week to turn on some of its computers there.
NSA spokeswoman Vanee Vines acknowledged problems but said "the failures that occurred during testing have been mitigated. A project of this magnitude requires stringent management, oversight, and testing before the government accepts any building."
The Utah facility, one of the Pentagon's biggest U.S. construction projects, has become a symbol of the spy agency's surveillance prowess, which gained broad attention in the wake of leaks from NSA contractor Edward Snowden. It spans more than one million square feet, with construction costs pegged at $1.4 billion, not counting the Cray supercomputers that will reside there.
Exactly how much data the NSA will be able to store there is classified. Engineers on the project believe the capacity is bigger than Google's largest data center. Estimates are in a range difficult to imagine but outside experts believe it will keep exabytes or zettabytes of data. An exabyte is roughly 100,000 times the size of the printed material in the Library of Congress; a zettabyte is 1,000 times larger.
But without a reliable electrical system to run computers and keep them cool, the NSA's global surveillance data systems can't function. The NSA chose Bluffdale, Utah, to house the data center largely because of the abundance of cheap electricity. It continuously uses 65 megawatts, which could power a small city of at least 20,000, at a cost of more than $1 million a month, according to project officials and documents.
Utah is the largest of several new NSA data centers, including a nearly $900 million facility at its Fort Meade, Md., headquarters and a smaller one in San Antonio. The first of four data facilities at the Utah center was originally scheduled to open in October 2012, according to project documents.
In the wake of the Snowden leaks, the NSA has been criticized for its expansive domestic operations. Through court orders, the NSA collects the phone records of nearly all Americans and has built a system with telecommunications companies that provides coverage of roughly 75% of Internet communications in the U.S.
In another program called Prism, companies including Google, Microsoft, Facebook and Yahoo are under court orders to provide the NSA with account information. The agency said it legally sifts through the collected data to advance its foreign intelligence investigations.
The data-center delays show that the NSA's ability to use its powerful capabilities is undercut by logistical headaches. Documents and interviews paint a picture of a project that cut corners to speed building.
Backup generators have failed numerous tests, according to project documents, and officials disagree about whether the cause is understood. There are also disagreements among government officials and contractors over the adequacy of the electrical control systems, a project official said, and the cooling systems also remain untested.
The Army Corps of Engineers is overseeing the data center's construction. Chief of Construction Operations Norbert Suter said, "the cause of the electrical issues was identified by the team, and is currently being corrected by the contractor." He said the Corps would ensure the center is "completely reliable" before handing it over to the NSA.
But another government assessment concluded the contractor's proposed solutions fall short and the causes of eight of the failures haven't been conclusively determined. "We did not find any indication that the proposed equipment modification measures will be effective in preventing future incidents," said a report last week by special investigators from the Army Corps of Engineers known as a Tiger Team.
The architectural firm KlingStubbins designed the electrical system. The firm is a subcontractor to a joint venture of three companies: Balfour Beatty Construction, DPR Construction and Big-D Construction Corp. A KlingStubbins official referred questions to the Army Corps of Engineers.
The joint venture said in a statement it expected to submit a report on the problems within 10 days: "Problems were discovered with certain parts of the unique and highly complex electrical system. The causes of those problems have been determined and a permanent fix is being implemented."
The first arc fault failure at the Utah plant was on Aug. 9, 2012, according to project documents. Since then, the center has had nine more failures, most recently on Sept. 25. Each incident caused as much as $100,000 in damage, according to a project official.
It took six months for investigators to determine the causes of two of the failures. In the months that followed, the contractors employed more than 30 independent experts who conducted 160 tests over 50,000 man-hours, according to project documents.
This summer, the Army Corps of Engineers dispatched its Tiger Team, officials said. In an initial report, the team said the cause of the failures remained unknown in all but two instances.
The team said the government has incomplete information about the design of the electrical system, a gap that could pose new problems if settings need to change on circuit breakers. The report concluded that efforts to "fast track" the Utah project bypassed regular quality controls in design and construction.
Contractors have started installing devices that insulate the power system from a failure and would reduce damage to the electrical machinery. But the fix wouldn't prevent the failures, according to project documents and current and former officials.
Contractor representatives wrote last month to NSA officials to acknowledge the failures and describe their plan to ensure there is reliable electricity for computers. The representatives said they didn't know the true source of the failures but proposed remedies they believed would work. With those measures and others in place, they said, they had "high confidence that the electrical systems will perform as required by the contract."
A couple of weeks later, on Sept. 23, the contractors reported they had uncovered the "root cause" of the electrical failures, citing a "consensus" among 30 investigators, which didn't include government officials. Their proposed solution was the same device they had already begun installing.
The Army Corps of Engineers' Tiger Team said the contractor's explanations were unproven. The causes of the incidents "are not yet sufficiently understood to ensure that [the NSA] can expect to avoid these incidents in the future," its report said.
Write to Siobhan Gorman at siobhan.gorman@wsj.com
