Parsing the Data and Ideology of the We Are 99% Tumblr
Posted on October 9, 2011 by Mike
One of the most fascinating things to come out of the current We Are 99%/Occupy Wall Street protests is the We Are 99% Tumblr. At the site, people hold up signs that explain their current circumstances, and together the signs tell the story of a whole range of Americans struggling in the Lesser Depression. It is highly recommended.
*snip*
In order to get a slightly better empirical handle on this important Tumblr, I wrote a script that reads all of the pages and parses out the HTML text on the site. It doesn’t read the images (can anyone in the audience automate calls to an OCR engine?), just the HTML text. After collecting the text from every page, the code then goes through it to try to find interesting points.
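The post does not show the script itself, so the following is only a minimal sketch of the general approach, assuming Python and its standard library. The names `TextExtractor`, `extract_text`, and `word_counts` are hypothetical and are not taken from the author's code. Fetching the pages is left out; the sketch starts from an HTML string that has already been downloaded.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # pieces of visible text, in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_counts(text):
    """Count lowercase words; a crude first pass at 'interesting points'."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    # Made-up sample page, not real data from the Tumblr.
    sample = (
        "<html><head><style>p{}</style></head><body>"
        "<p>I have student debt.</p><script>var x=1;</script>"
        "<p>Debt and no job.</p></body></html>"
    )
    text = extract_text(sample)
    print(text)
    print(word_counts(text).most_common(3))
```

Word counts are just one possible way to look for patterns; the original analysis may have searched for other things, and, as noted above, any text that appears only inside the images would still be missed.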
More at the link: http://rortybomb.wordpress.com/2011/10/09/parsing-the-data-and-ideology-of-the-we-are-99-tumblr/