The ultimate goal is to build a searchable online index for the massive collection.
The Library of Congress has embarked on a project to digitize its pre-1978 copyright records, a solicitation reveals. It is enlisting partners to help it scan the catalog that lists its massive holdings, the first step towards building an online repository of the collection.
Copyright records dated after 1978 can be found on copyright.gov, but a trove of pre-1978 records hasn’t yet been made available on the site.
An expansive catalog lists the library’s copyright collection that pre-dates 1978. That includes 16.4 million original and renewal registrations, as well as 350,000 assignments, transfers and terminations of copyright ownership for 1.7 million works. The agency’s ultimate goal is to scan the various index cards that make up this catalog and produce machine-readable text from them to build a searchable online index. “This will require significant time and money to achieve,” the solicitation notes.
So before it takes on that labor-intensive project, the Library of Congress wants to scan its catalog cards and post them online as images so researchers can still search the collection. It’s casting a net to see what kind of software on the market or under development could get this done, according to the solicitation.
The Library of Congress started its digital preservation program in 2000 and has 10,000 sites of digital archives, according to the Economist.