170 BILLION tweets = Library Of Congress Twitter Archive Nears Finish, Remains Unusable
... it takes the system over 24 hours to execute a search of a single keyword.
Library Of Congress Twitter Archive Nears Finish, Remains Unusable
http://idealab.talkingpointsmemo.com/2013/01/library-of-congress-twitter-archive-nearly-done-just-unusable.php
Carl Franzen January 4, 2013, 12:16 PM
Almost four years after the project was first announced, the Library of Congress on Friday announced that it expects by the end of January to finish a research archive of all the tweets publicly posted on Twitter since the service launched in 2006. The archive will remain unusable for the foreseeable future, however, due to technical challenges the agency said it encountered during the course of the project.
Specifically, the Library of Congress (LOC) wrote in a white paper (PDF) published online Friday that to date it has amassed an archive of 170 billion tweets and that it has almost completed its initial objectives, which include creating a chronological archive of tweets between 2006 and 2010 in addition to a separate archive of every tweet since then.
This month, all those objectives will be completed, the LOC's white paper states.
But the LOC is still struggling with technology challenges to making the archive accessible to researchers and policymakers, specifically the fact that currently, with the archive of just all of the older tweets, it takes the system over 24 hours to execute a search of a single keyword. ...