The NSFNET Backbone Project, 1987 - 1995
NSFNET: A Partnership for High-Speed Networking
From the moment the award was announced in November of 1987, there was only one goal for the partnership: building a high-speed national network backbone service. No one had ever attempted a data networking project of this scale, with backbone, regionals, and campus networks forming such a large internetworking environment. Merit, under cooperative agreement to the NSF, had responsibility for design, engineering, management, and operation of the NSFNET backbone service, but the project required all of the partners to work together to respond to enormous technical and managerial challenges.
Operational in July of 1988, a scant eight months after the award, the new T1 backbone linked thirteen sites, transmitting 152 million packets per month at a speed of 1.5 Mbps. Merit, IBM and MCI also developed a state-of-the-art Network Operations Center (NOC), and staffed it to run 24 hours a day, seven days a week; it was one of the first of its kind on the Internet. By 1989, Merit was already planning for the upgrade of the NSFNET backbone service to T3 (45 Mbps). In such a dynamic environment, the partnership had to walk a delicate line between foresighted planning and on-the-go engineering and user support. The tremendous success of their efforts is testament to the dedication and talent of all the members of the project team.
Merit had promised to deliver the NSFNET backbone service by July 1, 1988. "No one thought that it would be up in such a short period of time, but we did it," remembers Jessica Yu, a member of Merit's Internet Engineering Group. "The Internet engineers, only a handful of us at the time, worked day and night to build the network. The work was intense, but we didn't see it as a burden. It was very exciting; we felt that we were making history with the project."
The first thing Merit needed to do internally was to pull together the group of people that could make it happen. In the mid-80s, "there were few people who knew about this kind of technology," says Hans-Werner Braun. It was difficult enough to find people with the necessary background; the tight deadline for start-up of the backbone made the search even harder. Braun, who had been working on TCP/IP for years, says: "The NSFNET program educated a generation of networking people, directly as well as indirectly, by making all of this technology available."
After the Internet Engineering team at Merit was assembled--Braun, Bilal Chinoy, Elise Gerich, Sue Hares, Dave Katz, and Yu--they established contacts with their counterparts at IBM, MCI, and the regionals, relationships that fostered continuous communication among the technical team in much the same way as relationships between executives such as Doug Van Houweling, Eric Aupperle, Dick Liebhaber, Bob Mazza, and Al Weis had provided the contacts for the initial partnership.
In addition, a special engineering team was assembled by IBM and placed at the Merit offices in Ann Arbor. The group was led by Jack Drescher, then a Product Manager at IBM's Research Triangle Park Laboratory, and included Jim Sheridan and Rick Ueberroth. Drescher, working in Michigan, Walter Wiebe and Paul Bosco in Connecticut, and Jordan Becker in New York pulled together and managed the detailed plan of how the partners were going to get from February of 1988 to online production in July of 1988, and how the system would be improved after that. Later, in mid-1989, Harvey Fraser took over as Manager of the on-site team.
For the initial deployment of the NSFNET backbone service, there was a lot of talk about getting the machines to the sites and getting them configured. "We thought we were going to have to send people around to each site," recalls Elise Gerich, now Manager of Internet Engineering and at the time site liaison at Merit for the regional networks. Instead, Drescher's IBM team and Merit came up with the notion of creating a depot or assembly line, in the manner of Henry Ford and his first automobiles, at Merit in Ann Arbor.
"We turned the third floor of the Computer Center into an assembly line and delivered all of the IBM RTs there. Jim Sheridan from IBM and Bilal Chinoy from Merit spent a couple of weeks testing all the machines and installing all of the software. We even sent out notes to the various regionals saying that if you want to come to Ann Arbor, we would be happy to put you up and you can build your own router. In the end we had the whole floor covered with parts and machines and boxes; it was a great way to deploy everything."
Staff from the regionals did, in fact, come to Ann Arbor to help in the assembly depot, where hardware and software for the backbone nodes (Nodal Switching Subsystems) were assembled and configured. Each of the thirteen NSSs was composed of nine IBM RT processors running the Berkeley UNIX operating system, linked by two token rings with an Ethernet interface to attached networks. The nodes provided packet switching (i.e., forwarding packets received on one link out another link toward their destination), routing control (directing packets to follow a "route" along the links to their destination), and statistics gathering (for network traffic information). Redundancy was an important part of the design: if a component or an entire system failed, another RT would be available to take over for it.
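The three NSS functions described above can be illustrated with a deliberately simplified sketch. Everything here is hypothetical and purely conceptual; the real NSS software was custom code running on Berkeley UNIX, and these names, addresses, and structures are invented for illustration:

```python
# Conceptual sketch of an NSS's three roles: packet switching,
# routing control, and statistics gathering. All names and
# addresses are illustrative, not the actual NSS implementation.

class NodalSwitch:
    def __init__(self, routes):
        # routing control: destination network -> outgoing link
        self.routes = dict(routes)
        # statistics gathering: per-link packet counters
        self.stats = {}

    def forward(self, dest_network, payload):
        """Packet switching: pick the outgoing link for a destination."""
        link = self.routes.get(dest_network)
        if link is None:
            return None  # no known route: drop the packet
        self.stats[link] = self.stats.get(link, 0) + 1
        return (link, payload)

# Example: a node with two configured routes
nss = NodalSwitch({"129.140.0.0": "t1-to-merit",
                   "128.42.0.0": "t1-to-rice"})
print(nss.forward("128.42.0.0", b"data"))  # ('t1-to-rice', b'data')
```

Redundancy, in this picture, simply meant that a spare RT stood ready to assume a failed processor's role, so the node as a whole kept switching packets.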
Each NSS, then, was essentially a large packet switch for transmission of data between regional and midlevel networks and the NSFNET backbone service (see the accompanying DIAGRAM (ftp://nic.merit.edu/nsfnet/final.report/nss.html)). The new T1 NSFNET would link thirteen sites: Merit, the NSF supercomputer centers plus the National Center for Atmospheric Research, and the following regional and midlevel networks: BARRNet, MIDNET, Westnet, NorthWestNet, SESQUINET, and SURAnet. NYSERNet and JVNCnet were also served by the backbone, because each was collocated at a supercomputer center.
IBM service personnel traveled to each midlevel and regional site to install the new systems, with remote support from MCI and Merit. "It was an amazing team effort," recalls Walter Wiebe, the IBM Academic Information Systems networking senior manager responsible for the NSFNET program, including hardware, software, network engineering, and field support. "Everybody worked together extremely well--designing, integrating, testing, and documenting the hardware and software, and configuring 150 systems that integrated thousands of parts."
Even with the Merit "depot," however, there were occasional snags. "When we shipped a node to Rice University for the SESQUINET connection, it turned out that they didn't have a delivery dock. The truck driver got there, took the equipment off the truck, left it in the driveway, and told the SESQUINET staff, 'Well, you're going to have to get it inside,' up the stairs and through a hallway," Gerich recalls with amusement. "There were seven RTs and three racks, so it weighed a lot." Making sure the truck driver stayed on hand, SESQUINET contacted Gerich, who made a few telephone calls to ensure that the driver got instructions for inside delivery. "It was so funny, that he was just going to abandon it outside the building," says Gerich, illustrating just how new and unfamiliar building a national backbone network on that scale was to many parts of the research and education community.
For the new Network Operations Center, the same third floor of the Computing Center at the University of Michigan that had been used to assemble the nodes was again transformed, this time into a showcase for the "nerve center" of the backbone. The center of the NOC was a hexagon-shaped room with large monitors showing the status of each NSS, the statewide network, and the NSFNET links. Through a large window on one hallway one could see the machine room with the Ann Arbor NSS as well as the two IBM 4381 mainframes. These machines handled network management and statistics collection, and also served NSFNET information such as documentation and data on network growth and performance, usage of the backbone, and distribution of networks.
Soon after the NOC was constructed, its staff grew from four to eighteen people so that it could support the NSFNET backbone service twenty-four hours a day, seven days a week. Besides the NSFNET backbone, the NOC was also responsible for network operations for the University of Michigan's campus network, MichNet, and for CICNet, a regional network linking the Big Ten schools. "We were funded to build a commercial NOC long before anybody else, so we were able to provide levels of service that astounded the funds-poor community at the time," says NOC manager Dale Johnson. As the backbone was developed, operational issues, including troubleshooting functions, moved out of the Internet Engineering group at Merit into the NOC.
Inventing a nationwide backbone network service from scratch, in the context of increasing use and constant change, meant not only that new operating procedures had to be devised, but also that, when something wasn't working, a solution had to be found immediately. NOC operators often called the Internet Engineering staff in the middle of the night to fix problems.
The NOC team developed an automated trouble-ticket system for use among the partners, including escalation procedures adapted from MCI; all of these procedures were eventually collected into an NSFNET Site Manual for use at other network NOCs. The team also created methods of gathering network traffic statistics and compiling them into reports, which were made publicly available on nic.merit.edu, Merit's host system for information services. In addition, the partners came to realize that IBM, MCI, and Merit each thought of network management a bit differently; all of them had to exchange information and learn from one another's experiences. Elise Gerich gives an example:
"We were working hand-in-hand with Bill Spano, Ken Zoszack, and others from MCI who'd never dealt with data networking; they'd only had experience with voice networks. We'd call them up and say, 'Look, we're seeing that you guys are dropping packets,' and they would run tests on their telephone lines and say that the traffic met all of their criteria. We'd reply that we were only getting 50% of our packets through, and they would say, 'The line looks good to us.' So it was an education process: they were learning how our tools could show there was something underneath in the physical layer that wasn't working. MCI had never looked at their circuits that way before."
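The mismatch Gerich describes can be reduced to a trivial sketch: a circuit can pass line-level tests while end-to-end packet counts tell a different story. The numbers below are invented for illustration; the team's actual measurement tools were, of course, far more sophisticated:

```python
# Sketch of end-to-end loss measurement at the IP layer: the kind
# of figure Merit's tools reported, independent of whether the
# underlying telephone circuit "looked good" to line-level tests.
# All numbers are invented for illustration.

def loss_rate(sent, received):
    """Fraction of packets lost end to end, as seen at the IP layer."""
    return (sent - received) / sent

sent, received = 10_000, 5_000   # "only getting 50% of our packets through"
print(f"end-to-end loss: {loss_rate(sent, received):.0%}")  # end-to-end loss: 50%
```

The point of the education process was exactly this gap: the voice-network tests characterized the line, while the packet counters characterized what the network's users actually experienced.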
Quick access to the IBM team in Ann Arbor and the MCI staff at its offices in Southfield, Michigan, was a boon to Merit. Jack Drescher says, "Many were the times when Hans-Werner and Bilal would hustle down the hall to my office and say 'We think we have this problem ...' or 'Did you know about this problem?' We'd sit down and jointly work out an action plan--it was great teamwork."
Software tools detailed in the proposal for network management and routing had to be revised and adapted to the new environment. "As it turned out, some of the technologies we expected to work did not," explains Paul Bosco.
"Instead of being interoperable, various TCP/IP implementations failed. At times Macintoshes couldn't talk to Sun workstations, IBM machines couldn't talk to DEC machines, and nothing could talk to PCs. But it turned out that many of the problems were related to the deployment of TCP/IP stacks on 'real' wide-area networks for the first time."
To provide a place to explore some of these problems as well as develop new solutions, early on in the first year of the project the team members deployed a test network between Merit in Ann Arbor; IBM in Milford, CT, and Yorktown Heights, NY; and MCI in Reston, VA, for research, development, and testing. "MCI does what many large corporations do: we perform extensive testing on new equipment and software before we put it in a production environment. What NSFNET demanded was a bit more flexibility," explains Mathew Dovens, recalling how the team members pulled together when problems occurred in the network. Bob Mazza looks back on the partners' decision to build the test network as instrumental to the success of the NSFNET backbone service. "The power of the test net was that it ran in parallel with the backbone. Being able to test changes before deploying them on the production network gave the team a real edge in keeping up with all the new technical requirements."
Once the hardware, software, and circuits were in place at each of the backbone nodes, Merit began the complicated process of integrating all the components, debugging code, and configuring the links between each site, so that traffic could begin to traverse the new backbone. Merit also began to work with regional engineers to prepare for the cutover to the new T1 NSFNET backbone service. Jessica Yu remembers that "the challenge was to get the regionals to move to the backbone."
"The network was so new, and we put it together in such a short time, that the midlevel networks were reluctant to move their traffic onto it. We decided that the first network we'd move would be MichNet, then called the Merit Network. When the regionals saw that the new service was solid, they cut over to it one after another."
Merit and IBM's careful routing design and engineering work, led by Yu and IBM's Yakov Rekhter, along with close collaboration with network staff at the regionals, helped ensure a smooth transition to the T1 NSFNET. The cutovers were scheduled for late nights or early mornings, to keep service disruptions to a minimum. "It was a lot of work," according to Yu, "but it helped us build a trusting working relationship with the regional network engineers that was essential for network operations, and continues today."
In addition, Merit began to publicize the impending changeover to the new backbone. "There were things that needed to be done to support the sites that were coming up, and to make people aware of it--not just create a technical showcase, but make sure it was showcased," says Ellen Hoffman. Visitors to the new NOC received full tours; Merit also produced slide shows and information packets for the many presentations to be made about the NSFNET over the coming months and years. Merit's publishing activities ranged from engineering and site liaison group working papers and technical documents, to a biweekly newsletter called the Link Letter, which went to the regional and midlevel networks and other interested readers.
In July of 1988, the new NSFNET backbone service, with over 170 networks connected, went online. For the first time, the research and education community could access a high-speed, high-performance communications and data network backbone service, and they responded in the most direct way possible: within a month, network traffic doubled on the NSFNET backbone. During the first month of operation, traffic from the 56 Kbps NSFNET was phased in gradually in order to monitor the transition. In a letter to Eric Aupperle from Guy Almes, at the time chairman of the Federation of American Research and Education Networks (FARNET), the NSFNET team was praised for its achievement:
"We have seldom seen a major networking project come off so smoothly, and never one of such a magnitude. The hardware was new, the software configuration was complex and innovative in a number of ways, the line speeds were an order of magnitude faster than prior NSFNET speeds, and the loads borne during the first month of operation were heavy. Despite these factors, the NSFNET backbone came up on schedule with high performance and reliability even during the first weeks of use."
The NSFNET partnership, which had put in unbelievably long hours in order to meet the July start-up date, had performed beyond anyone's expectations, including perhaps even their own. By July 24, 1988, the old NSFNET backbone service, with its routers showing signs of endless code-tweaking and weary circuitry, was honorably decommissioned. The new NSFNET backbone service had arrived (see the accompanying MAP (ftp://nic.merit.edu/nsfnet/final.report/t1phys.html)).
Next: Transition to T3