Maintaining a sense of history

Agencies told to preserve Clinton-era Web sites for posterity

As the old administration makes way for the new, what happens to information posted online? Just as reports, correspondence and hearing transcripts are regularly archived to provide future generations insight into the workings of contemporary government, Web sites also are being preserved for posterity.

The Government Printing Office is adding the Web site of the National Partnership for Reinventing Government (NPR) to its growing collection of Web sites operated by now-defunct federal commissions and task forces.

And the National Archives and Records Administration asked all federal agencies to take "snapshots" of their public Web sites on or before Jan. 20 for a record of the sites as they appeared at the end of the Clinton administration.

NARA's plan, announced Jan. 12, could require agencies to make copies of 25,000 Web sites, according to federal Webmasters, who complained the Archives asked for far too much in far too little time.

The project "is not physically doable" in the eight days between the request and George W. Bush's Jan. 20 inauguration, one Webmaster said.

Deputy U.S. Archivist Lewis Bellardo said preserving a freeze frame of each agency Web site would give future researchers valuable insight into how federal agencies perceived their role in government and their responsibility to society. "This is the first administration where the Web has become a major instrument for transacting business and getting information from the government," he said.

Traditionally, agency records and history are preserved on paper. But increasingly, valuable records exist only as electronic documents on agency Web sites. How best to preserve Web sites remains a technological and practical challenge. "NARA does not have the capability at this time to take or preserve all of the types of agency Web records," Bellardo said.

The recordkeeping agency is working with the San Diego Supercomputer Center and archival organizations to develop methods for archiving Web sites. But preserving Web "snapshots" is much simpler. It requires only "a low-level knowledge of file editing tools and Web management skills," according to NARA. Agencies were instructed to record the snapshots on CD-ROMs, tape cartridges or 9-track tapes and deliver them to NARA headquarters in College Park, Md.

GPO uses a relatively simple process to preserve the old sites it maintains. With software, GPO "harvests" site content and saves it on disks, said Gil Baldwin, director of GPO's Library Programs Service.

"It took us about 20 minutes to download everything from the NPR site," Baldwin said.

Although NPR dissolved at the end of the Clinton administration, its Web site will live "indefinitely" in the Government Documents Department of the University of North Texas Libraries. The university works with GPO to host defunct government Web sites.

The preserved sites appear the way they did when they were active, but hotlinks to other sites are not kept up to date. "That's a problem everyone is grappling with," but it remains unsolved, Baldwin said. Unlike paper documents, "you can't just grab the stuff and forget about it," he said. As new Web formats are developed, old content will have to be "migrated" to the new formats so it can continue to be accessed by Internet users.

Many defunct Web sites are maintained as a reminder of the past. A Web site once run by the U.S. Office of Consumer Affairs attempts to calm fears about the approach of the Year 2000.

The Office of Consumer Affairs disbanded in 1999, before its members could learn whether automobiles would erupt in flames on New Year's Day 2000. On another site, the National Civil Aviation Review Commission warns of gridlock in the sky "shortly after the turn of the century." The commission expired in 1997, but its Web site featuring the report, "Washington, We have a Problem," is preserved as a bit of electronic history by the University of North Texas.

Taking a site 'snapshot'

1. The snapshot should not be a backup of the system, but a copy in a format that can be read on other platforms.

2. Identify all external links and insert a message saying they have been disconnected.

3. Include all documents that are available to the public on an agency site. Exclude documents located on external servers to which the Web site

4. Copy the files to CD-ROM, cassette tape or 9-track tape.

5. Package with technical documentation to identify, service and interpret Web site files.

Source: National Archives and Records Administration
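
For readers curious how step 2 might be automated, the following is a minimal Python sketch, not a NARA tool or method. It assumes a snapshot page is plain HTML and that a link counts as "external" when its host differs from the agency's own. The names `LinkDisconnector` and `disconnect_external_links`, the notice wording and the `disconnected-link` class are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical notice text; NARA's guidance does not prescribe the wording.
NOTICE = "[External link disconnected in archived snapshot]"


class LinkDisconnector(HTMLParser):
    """Re-emit a page, replacing external <a> links with a plain notice."""

    def __init__(self, site_host):
        # convert_charrefs=False keeps entities such as &amp; intact on output.
        super().__init__(convert_charrefs=False)
        self.site_host = site_host.lower()
        self.out = []            # pieces of the rewritten page
        self.external = []       # external URLs that were disconnected
        self._anchor_stack = []  # True for each open <a> that was external

    def _is_external(self, href):
        host = urlparse(href or "").hostname
        return host is not None and host != self.site_host

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if self._is_external(href):
                self.external.append(href)
                self._anchor_stack.append(True)
                self.out.append('<span class="disconnected-link">')
                return
            self._anchor_stack.append(False)
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_stack and self._anchor_stack.pop():
            self.out.append(f" {NOTICE}</span>")
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def disconnect_external_links(html, site_host):
    """Return (rewritten_html, list_of_disconnected_urls)."""
    parser = LinkDisconnector(site_host)
    parser.feed(html)
    parser.close()
    return "".join(parser.out), parser.external
```

Links within the agency's own site, relative links and `mailto:` addresses are left untouched. The list of disconnected URLs that the function returns could be included in the technical documentation called for in step 5.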
