Big Data & the Boston Marathon Probe

Extracting Key Investigative Data from the “Noise”

More than 10 terabytes of data did not overwhelm federal, state and local investigators.

What follows is a fascinating story involving advanced data access, tracking and retrieval technologies. 

Boston probe’s big data use hints at the future

By Frank Konkel – Article Courtesy of:  FCW

Less than 24 hours after two explosions killed three people and injured dozens more at the April 15 Boston Marathon, the Federal Bureau of Investigation had compiled 10 terabytes of data in hopes of finding needles in haystacks of information that might lead to the suspects.

The tensest part of the ongoing investigation – the death of one suspect and the capture of the second – concluded four days later in part because the FBI-led investigation analyzed mountains of cell phone tower call logs, text messages, social media data, photographs and video surveillance footage to quickly pinpoint the suspects.

A big assist in this investigation goes to the public, which presented perhaps the best illustration of a crowd-sourced investigation in recent memory.

Not only did the public respond to the FBI’s request for information – the agency ultimately received several thousand tips and loads of additional photographs and video footage – but a citizen’s tip ultimately led to the capture of the surviving suspect.

Still, the investigation showed a glimpse of what big data and data analytics can do — and highlighted how far we yet have to go.

Knowledge is power

Big data is a relatively new term in technology and its definition varies amongst early practitioners, but the main goal of any big data project is to pull insights from large amounts of data.

Prominent statistician Nate Silver describes it as “pulling signal from the noise” – noise that can be a veritable smorgasbord of different kinds of information. The noise can be big, too – some datasets within the federal government are measured in petabytes, each of which is one million gigabytes or 1,000 terabytes.

So the 10 terabytes gathered by investigators is not a large data collection even in today’s relatively early stages of big data technology.
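
To put that comparison in numbers, here is a small arithmetic sketch in Python. It uses the decimal units quoted above (one petabyte = 1,000 terabytes = one million gigabytes); the constant and function names are illustrative and do not come from the article.

```python
# Decimal (SI) storage units, matching the figures quoted in the article.
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def terabytes_to_petabytes(terabytes: float) -> float:
    """Convert terabytes to petabytes using decimal units."""
    return terabytes / TB_PER_PB


investigation_tb = 10  # data compiled in the first 24 hours, per the article
share_of_petabyte = terabytes_to_petabytes(investigation_tb)

print(f"{investigation_tb} TB = {investigation_tb * GB_PER_TB:,} GB")
print(f"{investigation_tb} TB = {share_of_petabyte:.0%} of one petabyte")
```

In other words, the whole collection would fill roughly one hundredth of a single petabyte-scale dataset.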

But the investigation’s processes still presented officials with a data crunch due to the volume, variety and complexity, according to Bradley Schreiber, vice president of Washington operations for the Applied Science Foundation for Homeland Security.

To get a sense for the initial complexities of combining various data sets in the early moments of the investigation, consider this: In the aftermath of the bombing, cellular networks in the area were taxed beyond their capacity. AT&T put out a tweet urging those in the area to “please use text & we ask that you keep non-emergency calls to a minimum.”

There was speculation that the bombs could have been triggered remotely by mobile phones, prompting interest in traffic logs from area cell towers to try to get a fix on the culprits.

That geo-location information could then be cross-checked against surveillance video and eyewitness photography – just another layer of data available to law enforcement when trying to stitch together a detailed and textured version of events.

For the complete story and a GREAT READ… CLICK HERE

ATF Buying Huge Amounts of Big Data for Criminal Investigations

The ATF Wants ‘Massive’ Online Database to Find Out Who Your Friends Are

BY ROBERT BECKHUSEN  Article Courtesy of:  Wired.com

ATF Academy – Investigative Database – Photo:  ATF

The ATF is looking to speed up its caseload with an automated database for searching individuals — like these cadets during a 2010 civilian training session — and discovering the relationships between them.

The ATF doesn’t just want a huge database to reveal everything about you with a few keywords. It wants one that can find out who you know. And it won’t even try to friend you on Facebook first.

According to a recent solicitation from the Bureau of Alcohol, Tobacco, Firearms and Explosives, the bureau is looking to buy a “massive online data repository system” for its Office of Strategic Intelligence and Information (OSII). The system is intended to operate for at least five years, and be able to process automated searches of individuals, and “find connection points between two or more individuals” by linking together “structured and unstructured data.”
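
As a rough illustration of what finding “connection points between two or more individuals” could mean in practice, the Python sketch below intersects the attributes recorded for each pair of people. The records, field names and function are hypothetical; the solicitation does not describe how the actual system would work.

```python
from itertools import combinations

# Hypothetical records: each person maps to a set of (attribute, value) pairs.
records = {
    "person_a": {("phone", "555-0101"), ("address", "12 Elm St"), ("employer", "Acme")},
    "person_b": {("phone", "555-0199"), ("address", "12 Elm St"), ("employer", "Acme")},
    "person_c": {("phone", "555-0101"), ("address", "9 Oak Ave")},
}


def connection_points(records):
    """Return the attributes shared by each pair of individuals."""
    links = {}
    for left, right in combinations(sorted(records), 2):
        shared = records[left] & records[right]  # set intersection
        if shared:
            links[(left, right)] = shared
    return links


for pair, shared in connection_points(records).items():
    print(pair, sorted(shared))
```

Pairs with nothing in common are simply left out of the result, so an analyst would only see the links worth a closer look.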

Primarily, the ATF states it wants the database to speed up criminal investigations.

Instead of requiring an analyst to manually search around for your personal information, the database should “obtain exact matches from partial source data searches” such as social security numbers (or even just a fragment of one), vehicle serial codes, age range, “phonetic name spelling,” or a general area where your address is located. Input that data, and out comes your identity, while the computer automatically establishes connections you have with others.
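
The kind of partial-match search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up (the numbers are deliberately invalid), the `matches` helper is hypothetical, and American Soundex stands in for “phonetic name spelling,” since the solicitation does not name a specific algorithm.

```python
# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def matches(record, ssn_fragment=None, name=None, age_range=None):
    """True if the record satisfies every criterion that was supplied."""
    if ssn_fragment and ssn_fragment not in record["ssn"].replace("-", ""):
        return False
    if name and soundex(name) != soundex(record["last_name"]):
        return False
    if age_range and not (age_range[0] <= record["age"] <= age_range[1]):
        return False
    return True


# Hypothetical records with deliberately invalid identification numbers.
people = [
    {"last_name": "Smith", "ssn": "000-00-1234", "age": 34},
    {"last_name": "Smyth", "ssn": "000-00-5678", "age": 52},
    {"last_name": "Jones", "ssn": "000-00-1299", "age": 34},
]

hits = [p for p in people if matches(p, ssn_fragment="12", name="Smyth", age_range=(30, 40))]
print(hits)
```

Here a number fragment, a misspelled surname and an age range together narrow three records down to one, which is the “exact match from partial source data” idea in miniature.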

Many other specific requirements are also to be expected for a federal law enforcement agency: searching names, phone numbers, “nationwide utility data” and reverse phone searches. The data will then be collected to help out during investigations and provide “relevant information and intelligence products.” There’s no hint the database is to be used to track gun sales, which is a big part of the ATF’s job, as the bureau is prohibited by law from establishing a centralized electronic database for gun purchases.

It’s necessary to note, however, that the ATF already does most of these things. Tracking down your identity, financial data, and finding connections between you and your kinfolk — your relatives, friends and business associates — is what criminal investigations are all about. And the bureau’s intelligence analysts already use a number of databases to help piece this information together.

But hunting through them for information that’s relevant and timely is a mind-numbing and time-consuming job. “Many of these tasks are performed manually,” the solicitation states, “resulting in longer turnaround times on important information and intelligence research and analysis requests.”

The bureau wants this new system to do all that gathering and research automatically. Which sounds like a good deal, in theory, allowing federal investigators to more easily bust criminals during a hot case. It could potentially give the investigators a lot more information than your sense of privacy may be comfortable with, or information not strictly relevant to a case. At the same time, the ATF is widely perceived as a weak, stagnant and underfunded agency. Even if it has a database that can track you down and find out who your friends are, it won’t necessarily be able to apply that to tracing gun transactions due to Congressional restrictions.

If the agency finds a gun linked to a crime, and then traces the gun to someone who bought it from someone else, all of that work figuring out the who’s-who will still likely have to be done manually.

A follow-up document from the ATF clarifies a few things. The database will not “consolidate multiple databases” the ATF already has access to — like LexisNexis and Thomson Reuters. The bureau is seeking to buy an existing database system and not fund the development of a completely new one. And it has to be reliable and work all the time. That includes 24-hour tech support for agents pulling those coffee-fueled all-nighter investigations. It’s also not an anti-terrorism tool and isn’t intended to “quickly respond to problems, threats, etc.”

But putting the ATF’s problems with tracing guns aside, it could still help agents track you down a lot faster than they could before — along with finding out everything else about you.

Read the Complete Article Here

##############################################################

ATF Seeks ‘Massive’ Database For Faster Investigations

Article Courtesy of:  Huffington Post

ATF Agents – Investigative Database – Photo: Agents Wife

The federal agency tasked with regulating firearms wants a new weapon in its investigative arsenal: Big Data.

The Bureau of Alcohol, Tobacco, Firearms and Explosives is seeking proposals for “a massive online data repository system” that could allow agents to make faster connections between suspects’ names, social security numbers, telephone numbers and utility bills, according to a request issued last month.

The ATF already uses such databases, but it analyzes the data largely by hand, “resulting in longer turnaround times on important information and intelligence research and analysis requests,” the agency said.

It’s also difficult for law enforcement to use the information effectively because it’s not connected in a single database, according to Mark Tanner, president of Law Enforcement and Intelligence Consulting, which helps tech companies meet the needs of federal agencies.

Computing power would dramatically reduce the amount of time it takes federal agents to link pieces of information on suspects, Tanner said.

Read Complete Article Here… 

Facebook & Big Data Collide

Big Data Could Cripple Facebook

By Jon Evans – Article Courtesy of:  TechCrunch

So there’s this startup called SmogFarm, which does big-data sentiment analysis, “pulse of the planet” stuff. I spotted them last year, and now they’ve got an actual product with an actual business model up and running in private beta: KredStreet, “The Social Stock Trader Rankings,” which performs sentiment analysis on StockTwits data and a sampling of the Twitter firehose to determine traders’ overall bullish/bearish feeling. They also compare reality against past sentiment to score and rank traders based on their accuracy, which is more interesting.
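
Scoring traders on accuracy can be pictured with a toy Python example. The data, trader names and scoring rule (share of calls that matched the actual move) are assumptions made for illustration; the article does not say how KredStreet actually computes its rankings.

```python
from collections import defaultdict

# Hypothetical data: (trader, ticker, predicted direction, actual direction).
calls = [
    ("loud_pundit", "AAA", "bullish", "bearish"),
    ("loud_pundit", "BBB", "bullish", "bullish"),
    ("loud_pundit", "CCC", "bearish", "bullish"),
    ("quiet_unknown", "AAA", "bearish", "bearish"),
    ("quiet_unknown", "BBB", "bullish", "bullish"),
    ("quiet_unknown", "CCC", "bullish", "bullish"),
]


def rank_traders(calls):
    """Rank traders by the fraction of their past calls that proved correct."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for trader, _ticker, predicted, actual in calls:
        totals[trader] += 1
        hits[trader] += predicted == actual  # True counts as 1
    scores = {trader: hits[trader] / totals[trader] for trader in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for trader, accuracy in rank_traders(calls):
    print(f"{trader}: {accuracy:.0%}")
```

A real system would presumably weight calls by recency, size of the move and sample size, but even this crude score lets an unknown outrank a louder voice.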

It’s a first iteration, but it looks pretty nifty, and I like the idea of a ranking system wherein unknowns can leave high-profile loudmouths in their dust by virtue of simply being right more often. Even if I feel slightly uneasy when I imagine such a system being applied to, say, tech bloggers.

Actually being held accountable for what I’ve written in the past?  

Doesn’t that just seem terribly wrong?

And of course it’s early days yet for companies like SmogFarm/KredStreet, and sentiment analysis, and natural language processing (such as that which powered Summly), and Palantir-style data mining. Just imagine what they’ll be able to do in five years.

And when they turn all that big-iron, big-data searchlight power on, say, Facebook timelines… what won’t they be able to determine?

A few years ago the EFF discovered that something as simple as your browser settings make you a lot less anonymous online than you might believe. Last week a study found that “human mobility traces are highly unique,” and when polling allegedly anonymous cell-phone location data, “four spatio-temporal points are enough to uniquely identify 95% of the individuals.” Good software can mine a lot of meaning out of apparently sparse and empty data.
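
The idea behind that finding can be shown with a toy Python sketch: pick a few points from one person’s location trace and count how many traces in the dataset contain all of them. The traces, cell names and `unicity` function below are hypothetical, and this is a simplified illustration rather than the method used in the study.

```python
import random

# Hypothetical traces: each user maps to a set of (cell tower, hour) points.
traces = {
    "user_1": {("cell_a", 8), ("cell_b", 9), ("cell_c", 12), ("cell_d", 18)},
    "user_2": {("cell_a", 8), ("cell_b", 9), ("cell_c", 12), ("cell_e", 18)},
    "user_3": {("cell_f", 7), ("cell_g", 10), ("cell_h", 13), ("cell_i", 20)},
}


def unicity(traces, k, seed=0):
    """Fraction of users singled out by k points sampled from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        sample = set(rng.sample(sorted(points), min(k, len(points))))
        # Every user whose trace contains all of the sampled points.
        candidates = [u for u, p in traces.items() if sample <= p]
        if candidates == [user]:
            unique += 1
    return unique / len(traces)


for k in (1, 2, 4):
    print(f"{k} point(s): {unicity(traces, k):.0%} of users uniquely identified")
```

With all four points known, every user in this tiny dataset is pinned down, even though two of them share three of their four locations.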

So just imagine what happens when next-generation language and image-processing software, and then the generation after that, and the generation after that, is unleashed on your Facebook timeline. It seems very plausible that all those innocuous things you say, and how you say them, and the pictures you post, and the games you play, will subtly and invisibly add up to a terrifyingly accurate portrait of you, including any and/or all of the things about yourself that you never actually wanted to make public.

What’s worse is that it will be ridiculously easy. Would-be employers won’t have to scroll through your Facebook timeline themselves, they’ll just need to point their profiling software in your direction and 30 seconds later read its high-confidence predictions of your work habits, neuroses, personal failures, emotional instabilities, attitude towards authorities, and sexual proclivities, all expertly extrapolated from the tapestry of subtle-to-invisible nuances accumulated from all of your photos, comments, Likes, upvotes, etc.; all individually meaningless, but collectively highly illuminating. Individual profiling is a huge business just waiting to be tapped by ethically challenged startups.

(This could be mitigated somewhat if you were to keep all your activity friends-only, of course; but even then, every app or distant acquaintance you’re connected to will be able to learn more about you than you ever intended. And it’s easy to envision employers requesting that you connect to them on Facebook as part of the job-application process, and filtering out those who refuse…)

I can imagine what that kind of profiling software would have said about me, early in my career: Hopeless bibliophile. Afflicted with incurable wanderlust. Doesn’t like being told what to do. Extremely chancy hire: likely to quit any job after six months to travel or try to write the Great Canadian Novel.

Which, er, would have been one thousand per cent true; but obviously I didn’t want my potential employers back then to know about it.

Read the complete article…