NextGov.com

Project will preserve Bush administration Web sites
By Jill R. Aitoro, jaitoro@govexec.com   08/15/08

More than 100 million Web pages from President Bush's second term will be preserved for historians, researchers and the public, thanks to a joint effort of government agencies and nonprofit libraries announced on Thursday.

The Library of Congress and Government Printing Office, in partnership with the California Digital Library, University of North Texas Libraries and Internet Archive, will harvest and archive all Web sites that could change under a new presidential administration. The total amount of data in the collection, which will focus on executive and legislative branch sites, is expected to reach 10 to 12 terabytes.

"These sites either change quickly — immediately following election — or change closer to the swearing in," said Kris Carpenter, director of the Web group at the nonprofit Internet Archive. "We want to preserve the most important information for future researchers."

For example, committees that are made up of presidential appointees and elected officials change with a new administration, so there's a need to preserve information about their members, areas of responsibility, policies and accomplishments. Some changes are significant and others are more subtle, Carpenter said, "but they all can be very telling for researchers looking back and asking, 'How did this influence specific actions of the current administration, and the administrations that followed?'"

Beyond content, researchers will be able to analyze how information was positioned on a Web page, what was placed alongside it, and the significance that could have had in the communication of the overall message.

The Library of Congress will focus on preserving congressional Web sites, and the Internet Archive will conduct a comprehensive "crawl" of the .gov domain, essentially taking snapshots of all pertinent sites. The University of North Texas Libraries and the California Digital Library will each conduct more in-depth crawls of specific government agencies, and the Government Printing Office will offer advice on the preservation process. Automated tools will assist in collection, though an inventory will be taken manually to ensure no information is missed.

"We're using technologies and processes that allow us to render the materials as they're presented to the user," Carpenter said. "This is critically important — we're not modifying them in any way." Once the project is complete, researchers and the public will be able to navigate the archived pages the same way they do other Web destinations: by typing the address, viewing the page and browsing through materials. The content will be indexed to enable full text searches.

Similar projects took place in 2000 and 2004 to document the Web pages of President Clinton's second term and the first half of the Bush administration. The 2004 end-of-term collection has about 75 million addresses for Internet resources, known as Uniform Resource Identifiers, or URIs.

This project is larger in scope, though, in part because records have grown bigger. In 2004, the average government Web record was seven times larger than the average .com record. And the records have likely grown more over the past four years, given increases in the number of data-rich files such as images, .pdf documents and videos. The Internet Archive conducts monthly harvests of several federal .gov sites and has seen a 15 percent increase in the collection size in the past two years alone.

