In reply to the discussion: Do you really think the NSA has the manpower to sift through billions of keystrokes and metadata
alc
There are (at least) 2 ways it is done
1) They have a lot of heuristics (rules) that help them identify communications likely to be of interest. 'Hits' from the first level of rules may go through more heuristics or go to a human. There will be false hits and misses. To minimize misses, they have to accept more false hits, which consume resources and lead nowhere. That's one of the problems with this type of system: lots of 'wasted' resources that could be spent on targeted investigations. It is also very inconvenient for the targets of the false hits. Not all 600,000 people on the no-fly list are actually terrorists, but each of them matched a heuristic somewhere. Would you like to be added because a suspected terrorist was in your city and your phone happened to be near his a few times in one week (maybe you both like Starbucks before work)? And that "suspected terrorist" may not belong on the list either, but took some business trips that "looked fishy" to the heuristics.
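To make the false hit / miss trade-off concrete, here is a minimal sketch. Everything in it is made up for illustration: the scores, the labels, and the `count_errors` helper are my assumptions, not anything from a real system. It only shows that lowering the bar for a 'hit' trades misses for false hits.

```python
# Hypothetical scored records: (heuristic score, is it truly of interest?).
# The numbers are invented for illustration only.
RECORDS = [
    (0.95, True), (0.80, False), (0.72, True), (0.60, False),
    (0.55, False), (0.41, True), (0.30, False), (0.10, False),
]


def count_errors(records, threshold):
    """Return (false_hits, misses) when every score >= threshold is flagged."""
    false_hits = sum(1 for score, real in records if score >= threshold and not real)
    misses = sum(1 for score, real in records if score < threshold and real)
    return false_hits, misses


for threshold in (0.9, 0.7, 0.4):
    false_hits, misses = count_errors(RECORDS, threshold)
    print(f"threshold={threshold}: false hits={false_hits}, misses={misses}")
```

With this toy data, the strictest threshold gives 0 false hits but 2 misses, and the loosest gives 0 misses but 3 false hits. Each of those false hits is a person who gets looked at for no reason.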
A company I worked for in the past actually worked with the NSA on these rules and visualization systems (many companies, universities and government agencies work together on this type of thing: big data analysis). My company used them purely for marketing (e.g. who should get a coupon and what value/type) and optimization (e.g. analyzing product returns), while the NSA used them for obvious reasons. Our 'wasted' resources were coupons that were not redeemed or didn't result in long-term consumers. The cost was pretty significant but still better than blasting an entire zip code with coupons. Other companies send out personalized coupon books that don't appear personalized. The cost is very high and most aren't used, but the up-sell and cross-sell results are pretty amazing. And it's done without any humans involved and with significantly fewer computer resources than the NSA has.
We are talking about 50+ million consumers and hundreds of millions of items of marketing data each run. Walmart crunches billions of sales records a day looking for all sorts of patterns (customers, inventory, distribution issues, sales effectiveness, pricing, etc). Billions of records is nothing for this type of analysis, whether you're trying to identify a small number of individuals or find bigger patterns (e.g. companies have found pretty cool patterns around running out of products on shelves and how to avoid it, sometimes without even searching for them, just by looking at visualizations of the data).
2) They identify someone or a group (e.g. known terrorists or senators who will be voting on an NSA oversight bill). Then they have the computers pull out all metadata involving those people. If there isn't enough data, they widen it (i.e. include families or staff). Then they have computers cross-reference that metadata with other metadata. If there are only a few dozen (or 100) individuals, this may be human-guided. For example:
"show me every time Mr X's cell phone was within 100 yards of a suspected felon".
No hits there so try "200 yards", or "prostitute".
No hits there so look for daily patterns and days that didn't follow the pattern.
Look for every other phone that was within 100 yards of this phone more than 5 times over the last month (excluding work and home). Anyone look suspicious? Spend some time looking at their metadata.
Then look for times the phone was at work/home while the car wasn't (from all of the license plate cameras).
Then look for other phones with very similar location patterns, since this person may have both an official phone and an off-the-shelf untraceable phone.
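As a rough sketch of one of the queries above ("every other phone that was within 100 yards of this phone more than 5 times"), something like the following would do it on toy data. The record layout (phone id, hour index, x/y position in yards), the `PINGS` data, and the `frequent_neighbors` function are all hypothetical. It assumes one position per phone per hour and leaves out the "excluding work and home" filter.

```python
from collections import defaultdict
from math import hypot

# Hypothetical metadata: (phone_id, hour_index, x_yards, y_yards).
PINGS = (
    [("X", h, 0.0, 0.0) for h in range(8)]       # the phone being investigated
    + [("A", h, 30.0, 40.0) for h in range(6)]   # 50 yards away for 6 hours
    + [("B", h, 300.0, 0.0) for h in range(8)]   # always 300 yards away
    + [("C", h, 10.0, 0.0) for h in range(3)]    # close, but only for 3 hours
)


def frequent_neighbors(pings, target, radius_yards=100.0, min_meetings=5):
    """Map each phone seen within radius_yards of target in more than
    min_meetings distinct hours to the number of such hours."""
    # Where the target phone was in each hour (assumes one ping per hour).
    target_pos = {hour: (x, y) for phone, hour, x, y in pings if phone == target}
    hours_near = defaultdict(set)
    for phone, hour, x, y in pings:
        if phone == target or hour not in target_pos:
            continue
        tx, ty = target_pos[hour]
        if hypot(x - tx, y - ty) <= radius_yards:
            hours_near[phone].add(hour)
    return {p: len(hrs) for p, hrs in hours_near.items() if len(hrs) > min_meetings}


print(frequent_neighbors(PINGS, "X"))
```

Here only phone "A" comes back. Widening the search is just a matter of changing `radius_yards` or `min_meetings`, which is the point: once all of the data is already collected, each new question is cheap to ask.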
You get the idea. If you're focusing on a small number of individuals but have a HUGE amount of data about everyone, you can find a lot. In the case of terrorists this is good. But since there are other potential uses, they should have to start with the terrorists and expand from there by getting warrants to collect more data, rather than having all of the data available and being able to navigate through it.