Pholus

(4,062 posts)
39. The problem is doable...
Mon Aug 12, 2013, 11:22 AM

Here's the size of the problem. I will use the numbers from the guy who runs the "Internet Archive," who knows a bit about storing lots of data. He starts by estimating 315 million US citizens at 300 minutes per month each.

http://blog.archive.org/2013/06/15/cost-to-store-all-us-phonecalls-made-in-a-year-in-cloud-storage-so-it-could-be-datamined/

In raw data, he arrives at 272 petabytes of talk data per year in the US, at a very generous compression rate. He even says it would be reasonable to store a year of this data "in the cloud" for about 30 million dollars or so, in a server room less than 10% the size of the Utah Data Center.
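For anyone who wants to check the arithmetic, here is a quick Python sketch of what those figures imply. It uses only the numbers quoted above. Treating a petabyte as 10^15 bytes, and spreading the 272 PB evenly over every person-minute, are my assumptions and not necessarily his method.

```python
# Back-of-envelope check of the storage estimate quoted above.
# Assumptions (mine): 1 PB = 10**15 bytes, and the 272 PB is spread
# evenly over every person-minute of talk time.
PEOPLE = 315_000_000          # people, from the post
MINUTES_PER_MONTH = 300       # talk minutes per person per month, from the post
TOTAL_BYTES = 272e15          # 272 petabytes per year, from the post


def talk_minutes_per_year(people=PEOPLE, minutes_per_month=MINUTES_PER_MONTH):
    """Total person-minutes of phone talk in one year."""
    return people * minutes_per_month * 12


def implied_kbit_per_second(total_bytes=TOTAL_BYTES):
    """Audio bitrate implied by the yearly total, in kilobits per second."""
    bytes_per_minute = total_bytes / talk_minutes_per_year()
    return bytes_per_minute * 8 / 60 / 1000


if __name__ == "__main__":
    print(f"talk minutes per year: {talk_minutes_per_year():,}")
    print(f"implied bitrate: {implied_kbit_per_second():.1f} kbit/s")
```

Under these assumptions that is roughly 1.1 trillion talk-minutes a year, at about 32 kbit/s per person-minute.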

What could you do with that 272 petabytes to make it usable? Everything Snowden has released implies keyword-based searches. DARPA has been VERY interested in speech-to-text since way back in 2002, when it was part of Bush's TIA (Total Information Awareness), so it is probably reasonable to assume this is how they work. Let's presume they add a keyword option to analyst searches, with the option to go back and review the original audio.

I have no idea how speech-to-text works, but you have to process the audio stream. Just for convenience I'll claim that it is as computationally hard as turning a wav file into an mp3 file. It can't be much harder: my idiot cell phone can do a tolerable job on a tiny portable processor in real time.

How big a computer would you need to process 272 petabytes of data in a year? Tom's Hardware benchmarks computers in part by converting a 178 MB wav file into an mp3. A mid-range modern processor does it in about a minute and a half. At that rate, a single one of these computers would need roughly 2.3 billion minutes to convert all 272 petabytes. There are about half a million minutes in a year, so somewhere around 4,400 processors (call it 5,000) could do the job in one year.
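The same estimate as a short Python sketch, so the numbers can be rerun with different inputs. The assumptions are loud ones: decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), a 365-day year, perfectly parallel work, and wav-to-mp3 encoding time standing in for speech-to-text, which is this post's guess rather than a measured fact.

```python
import math

# Rough processor count for chewing through a year of call audio.
# Assumptions: decimal units, 365-day year, perfectly parallel work,
# and mp3-encoding time as a stand-in for speech-to-text cost.
TOTAL_BYTES = 272e15            # 272 PB of audio per year, from the post
BENCH_FILE_BYTES = 178e6        # 178 MB benchmark wav file, from the post
BENCH_MINUTES = 1.5             # about a minute and a half per file
MINUTES_PER_YEAR = 365 * 24 * 60


def single_cpu_minutes(total_bytes=TOTAL_BYTES):
    """Minutes one benchmark-class processor needs for the whole pile."""
    files = total_bytes / BENCH_FILE_BYTES
    return files * BENCH_MINUTES


def processors_needed(total_bytes=TOTAL_BYTES):
    """Processors required to finish within one year, rounded up."""
    return math.ceil(single_cpu_minutes(total_bytes) / MINUTES_PER_YEAR)


if __name__ == "__main__":
    print(f"single-CPU minutes: {single_cpu_minutes():.3e}")
    print(f"processors for one year: {processors_needed():,}")
```

With exact division the answer lands close to the round numbers above: a bit under 2.3 billion processor-minutes and a bit under 4,400 processors.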

That's about half the size of one of the "top 500" clusters in existence. Surely the NSA can afford ONE of the 500 most powerful computers in the world, right? And a hell of a bunch of people each individually smarter than me to code for it and manage it.

But can they handle that much data?

The NSA has just admitted in their white paper that they "touch" (read as hoover up) 1.6% of 1826 petabytes per day, or about 29 petabytes per DAY, of internet traffic (email included). Phone calls would seem to be easier than that: 272 petabytes a year comes to about 745 terabytes per day, roughly 40 times less. It's easier to do phone than the internet stuff.
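A small Python sketch of that comparison, using only the figures quoted in this post. The 365-day year and decimal units (1 PB = 1000 TB) are my assumptions.

```python
# Daily data rates: internet traffic "touched" vs. phone audio.
# Assumptions: 365-day year, 1 PB = 1000 TB.
INTERNET_PB_PER_DAY = 1826      # total internet traffic per day, from the post
TOUCHED_FRACTION = 0.016        # the 1.6% figure, from the post
PHONE_PB_PER_YEAR = 272         # yearly call audio, from the post


def touched_pb_per_day():
    """Petabytes of internet traffic touched per day."""
    return INTERNET_PB_PER_DAY * TOUCHED_FRACTION


def phone_tb_per_day():
    """Terabytes of call audio generated per day."""
    return PHONE_PB_PER_YEAR * 1000 / 365


def internet_to_phone_ratio():
    """How many times larger the touched internet volume is."""
    return touched_pb_per_day() * 1000 / phone_tb_per_day()


if __name__ == "__main__":
    print(f"internet touched: {touched_pb_per_day():.1f} PB/day")
    print(f"phone audio: {phone_tb_per_day():.0f} TB/day")
    print(f"ratio: {internet_to_phone_ratio():.0f}x")
```

On these inputs the internet volume comes out around 39 times the phone volume, which is the whole point: if they can handle the first, the second is not the bottleneck.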

Just remember, to find the needle in the haystack, you need a haystack. What was General Alexander's nickname again? Oh yeah, "Collect it all." Collection is easy, analysis is hard. That's why this system has no high profile successes -- they're still learning to use it.
