Judgment Calls. Thomas H. Davenport
of an ecology of decision-making redundancies, integrated tightly into an overall and well-orchestrated process of problem solving. Overlapping authorities and tasks increase the odds of exposing potential issues and uncertainties—they can't fall through the cracks if there are no cracks. This way of working, and the culture by which the entire process is facilitated, also gives early and ample opportunity for people to speak out when they see a problem. Mike Ryschkewitsch, NASA's chief engineer, says, “You know one of the things that NASA strongly emphasizes now is that any individual who works here, if they see something that doesn't look right, they have a responsibility to raise it, and they can raise it … for example, you have whole communities of experts throughout NASA whose whole life is about maximizing safety to the crew.”
In large part thanks to all that preliminary work, many FRRs are fairly routine. Problems have been identified, analyzed, and solved beforehand. Representatives of the teams that carried out that technical work present their results to the group as a whole and have the knowledge they need to answer the questions their colleagues may raise. The STS-119 review, which actually became a series of reviews, was unusual. The technical problem with the engine valve was barely understood at the time of the first FRR and was not resolved to the satisfaction of many participants at a second, marathon session. It took three meetings to arrive at “go” for launch. That “decision about the decision making” showed that the FRR is not a rubber stamp on a foregone conclusion; it demonstrated Ryschkewitsch's claim that people at NASA felt free to delay flights over technical concerns, putting flight safety ahead of schedule and productivity. Though the process of problem solving and decision making is well structured, the culture of dissent and open exchange balances and gives critical flexibility to what might otherwise be a dangerously rote activity.
The Problem of the Faulty Valve
The problem that faced the engineers and scientists who took part in the FRR for STS-119 came to light during the previous shuttle mission, STS-126. Shortly after that spacecraft, Endeavour, lifted off from Kennedy Space Center on November 14, 2008, flight controllers noticed an unexpected hydrogen flow increase related to one of the shuttle's three main engines. Because three control valves work together to maintain proper pressure in the hydrogen tank, the other valves compensated for the malfunction and the flight proceeded safely. But before another mission could fly, the shuttle team would need to understand why and how the problem occurred, whether it was likely to happen again, and just how dangerous a recurrence might be.
Bad weather in Florida forced Endeavour to land in California on November 30 and the shuttle was not returned to Kennedy until December 12, delaying examination of the faulty valve by almost two weeks. X-rays showed that a fragment of the valve's poppet (a tapered plug that moves up and down to regulate flow) had broken off. So the risks engineers had to consider included not only the kind of hydrogen flow anomaly they had seen on STS-126, but the possibility that a poppet fragment racing through propellant lines might rupture one of them. The level of risk depends on two factors: the likelihood of a problem happening and the seriousness of the consequences if it does. The consequences of a ruptured line would be disastrous, so the likelihood had to be extremely low to make the risk acceptable. The necessary technical analysis would have to have two major components: studying the valve to determine why the poppet broke, as a way of understanding the probability of a similar failure; and figuring out whether a poppet fragment was at all likely to breach the propellant system.
Because the valve was part of a system that included the shuttle, the main engines, and the external fuel tank, responsibility for understanding its failure lay with teams at the Johnson Space Center in Houston, the Marshall Space Flight Center in Huntsville, Alabama, and several NASA contractors, including a division of Boeing. They began work on these issues, and the process proved challenging.
The first flight readiness review for STS-119 took place on February 3. It quickly became apparent that the technical teams did not yet understand the problems well enough to certify that the next shuttle spacecraft for this mission, Discovery, was ready to fly. Steve Altemus, director of engineering at Johnson Space Center, said, “We showed up at the first FRR and we're saying, ‘We don't have a clear understanding of the flow environment, so therefore we can't tell you what the likelihood of having this poppet piece come off will be. We have to get a better handle on the consequences of a particle release.’ ” The launch was rescheduled for February 22—overoptimistically, as it turned out—and the technical teams kept working.
They faced tricky problems. X-ray analysis had determined that the poppet failed because of high-cycle fatigue—that is, damage caused by repeated use. Unfortunately, these components were no longer manufactured and were in short supply, so the option of acquiring new, unstressed poppets did not exist. Given that fact, a reasonable approach could be to examine poppets for cracks that might indicate potential weakness; a poppet with no cracks seemed extremely unlikely to fail. But even electron microscopes could not reliably locate tiny cracks unless the poppets were polished first, and polishing subtly changed the hardware, invalidating its flight certification.
Determining whether a poppet fragment might puncture a fuel line was even more difficult because of the complex fluid dynamics analysis needed to establish the velocity, spin, and probable path of fragments of different sizes. The behavior of rapidly moving fluid is notoriously hard to predict. NASA's Glenn Research Center, Stennis Space Center, and the White Sands Test Facility began impact testing to simulate and try to understand the problem.
A second FRR was scheduled for February 20. Scott Johnson, chief safety officer for the shuttle, noted that “the majority of the safety community was concerned about the amount of open work in front of us. As a result, I recommended that we delay the FRR. We still had a lot of analysis work going on. We weren't really that close to being able to quantify the risks.”
The Marathon Flight Readiness Review
The review proceeded as scheduled, however. More than 150 people gathered in the cavernous Operations Support Building 2 at Kennedy Space Center: engineers and managers from NASA centers, international partners, contractors, consultants, and former employees—“graybeards,” in NASA parlance—whose practical wisdom was valued by the generations that followed them. The astronauts who were to fly on STS-119 were there as well. Bill Gerstenmaier, head of space operations, who facilitated the meeting, wrote afterward, “I worked with all the astronauts very closely … their kids went to school with my kids, and here they are in the very room where we are discussing their safety.” Mike Ryschkewitsch remembers, “The kind of pressure we were under says if we make a bad mistake, people die … These are our friends and acquaintances and we are saying to them, ‘This is good enough for you to fly.’ ”
Ed Hoffman, director of NASA's Academy of Program/Project and Engineering Leadership, which is responsible for knowledge sharing and project management development throughout the agency, observed that the immensity of the building, the sense of purpose, and the muted but palpable anxiety present that day reminded him of a service in a cathedral.
The meeting lasted nearly fourteen hours, far longer than any earlier FRR. It was an indication of both continuing uncertainty about technical issues and the openness of management to full and free discussion.
Despite the tremendous amount of analysis and testing that had been done, technical presentations on the causes of the broken valve on STS-126 and the likelihood of recurrence were incomplete and inconclusive. Unlike at most FRRs, new data streamed in during the review and informed the conversation. A chart reporting margins of safety included TBD (to be determined) notations.
Doubts about some test data arose when Gene Grush, lead of the Johnson Space Center's Concept Analysis Team, received a phone call from NASA's Stennis Space Center informing him that the program to evaluate the danger of material broken off a poppet breaching a fuel line had used the wrong material. “I had to stand up in front of that huge room and say, ‘Well there's a little problem with our testing. Yes, we did very well, but the hardness of the particle wasn't as hard as it should have been.’ That was very critical because that means that your test is no longer conservative. You've got good results, but you