This month, the Library of Congress will finish archiving every Twitter post from the social media company’s first four years online, according to a blog post published on Friday.
The Library is also developing a system to immediately archive new tweets posted to the site, the post said.
“The Library’s focus now is on addressing the significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way,” the Library’s director of communications, Gayle Osterberg, wrote.
In 2010, the Library began archiving hundreds of terabytes of data from billions of Twitter posts. The database reached 133.2 terabytes of information in December. The Library was collecting nearly half a billion tweets each day as of October 2012, according to a white paper.
Osterberg emphasized the importance of the Twitter archive as part of the Library’s broader mission. Researchers have already contacted the Library asking if they can access the archive to study citizen journalism, public health trends and stock market activities, she said.
“As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications and other sources routinely collected by research libraries,” Osterberg wrote.