The Library of Congress has automated its metadata tagging but wants to reintroduce humans to the process to ensure the work is done accurately and ethically.
The Library of Congress, the biggest physical repository of information in the world, has been digitizing its resources, expanding its digital library and developing automation tools to manage its collection. As those tools bear fruit, Library officials now want to reintroduce humans to the process.
The Library’s Digital Innovation Labs Section “has undertaken a range of programs aimed at maximizing the use of digital collections and supporting emerging research methods,” including using machine learning and crowdsourcing prototypes, according to a solicitation posted Tuesday to beta.SAM.gov. “Now, the Library of Congress seeks to build on these initial experiments to further examine models that expand access to digital collections by combining digitally-enabled human participation with computational methods, otherwise known as human-in-the-loop approaches.”
As the Library’s digital collection expands, it needs help properly tagging and verifying the metadata attached to the content. Machine learning tools have been plugging along at this task through the pilot programs, but Library officials want humans to help verify the work is being done correctly, as well as ethically.
Through the contract, the Library wants to procure “at least two experimental prototypes” using human-in-the-loop workflows to “model, test, and evaluate different ethical approaches to applying crowdsourcing and machine learning methods to Library digital collections that enhance collection usability, utility, discoverability and user engagement.”
Per the solicitation, at least one of the two prototypes will focus on improving metadata through crowdsourcing methods. As users “add feedback on the accuracy of the derived metadata,” those results will in turn be funneled back into the automation tools “to serve as training data for a machine learning algorithm that shall be applied to Library digital collections to create enhanced metadata.”
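The feedback loop described in the solicitation can be illustrated with a minimal sketch. It is not the Library's design, which the solicitation leaves to contractors. The `Feedback` record, the `build_training_labels` function, the majority-vote rule and the `min_votes` threshold are all hypothetical choices made here for illustration. The sketch shows one way volunteer judgments on machine-derived tags could be aggregated into labeled examples for retraining a model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One volunteer judgment on a machine-derived tag (hypothetical schema)."""
    item_id: str      # identifier of the digital collection item
    tag: str          # metadata tag proposed by the automated tool
    is_correct: bool  # volunteer's verdict on that tag


def build_training_labels(feedback, min_votes=2):
    """Aggregate volunteer feedback into labels by simple majority vote.

    Returns a dict mapping (item_id, tag) to True or False. Pairs with
    fewer than `min_votes` judgments, or with a tied vote, are left out
    so that ambiguous cases do not become training data.
    """
    votes = defaultdict(Counter)
    for fb in feedback:
        votes[(fb.item_id, fb.tag)][fb.is_correct] += 1

    labels = {}
    for key, counts in votes.items():
        yes, no = counts[True], counts[False]
        if yes + no < min_votes or yes == no:
            continue  # too little evidence, or volunteers disagree evenly
        labels[key] = yes > no
    return labels
```

In a sketch like this, the resulting labels would be passed to whatever training routine the machine learning tool uses, and the excluded pairs could be routed back to volunteers for further review.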
Library officials stressed the need to incorporate user-centered design in the process or risk alienating the volunteers and researchers contributing to the project. The solicitation pushes this concept beyond basic interface design and requires the contractors to consider “user motivations and needs in the workflow.”
“Workflows shall be adaptive to different use cases and user profiles, to ultimately facilitate meaningful and productive user interactions with the Library’s digital collections,” according to the statement of work.
Bids are due by noon August 5. The contract will run from September 1 through June 30, 2021.