Extracting Key Investigative Data from the “Noise”
More than ten terabytes of data did not overwhelm federal, state and local investigators.
What follows is a fascinating story involving advanced data access, tracking and retrieval technologies.
Boston probe’s big data use hints at the future
By Frank Konkel – Article Courtesy of: FCW
Less than 24 hours after two explosions killed three people and injured dozens more at the April 15 Boston Marathon, the Federal Bureau of Investigation had compiled 10 terabytes of data in hopes of finding needles in haystacks of information that might lead to the suspects.
The tensest part of the ongoing investigation – the death of one suspect and the capture of the second – concluded four days later, in part because FBI-led investigators analyzed mountains of cell phone tower call logs, text messages, social media data, photographs and video surveillance footage to quickly pinpoint the suspects.
A big assist in this investigation goes to the public, which presented perhaps the best illustration of a crowd-sourced investigation in recent memory.
Not only did the public respond to the FBI’s request for information – the agency ultimately received several thousand tips and loads of additional photographs and video footage – but a citizen’s tip ultimately led to the capture of the surviving suspect.
Still, the investigation showed a glimpse of what big data and data analytics can do — and highlighted how far we yet have to go.
Knowledge is power
Big data is a relatively new term in technology, and its definition varies among early practitioners, but the main goal of any big data project is to pull insights from large amounts of data.
Prominent statistician Nate Silver describes it as “pulling signal from the noise” – noise that can be a veritable smorgasbord of different kinds of information. The noise can be big, too – some datasets within the federal government are measured in petabytes, each of which is one million gigabytes or 1,000 terabytes.
So the 10 terabytes gathered by investigators is not a large data collection even in today’s relatively early stages of big data technology.
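To put those sizes in perspective, here is a minimal sketch of the arithmetic, using the decimal units given above (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes); the variable names are illustrative only:

```python
# Sanity-checking the data sizes mentioned above (decimal units).
TB_PER_PB = 1_000          # 1 petabyte = 1,000 terabytes
GB_PER_PB = 1_000_000      # 1 petabyte = 1,000,000 gigabytes

investigation_tb = 10      # data compiled by the FBI within 24 hours

# Fraction of a single petabyte the investigation's collection represents
fraction_of_pb = investigation_tb / TB_PER_PB
print(fraction_of_pb)      # 0.01 -> just 1% of one petabyte
```

In other words, the collection amounted to one percent of a single petabyte, small next to the petabyte-scale datasets some federal agencies maintain.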
But the investigation’s processes still presented officials with a data crunch due to the volume, variety and complexity, according to Bradley Schreiber, vice president of Washington operations for the Applied Science Foundation for Homeland Security.
To get a sense of the initial complexities of combining various data sets in the early moments of the investigation, consider this: In the aftermath of the bombing, cellular networks in the area were taxed beyond their capacity. AT&T put out a tweet urging those in the area to “please use text & we ask that you keep non-emergency calls to a minimum.”
There was speculation that the bombs could have been triggered remotely by mobile phones, prompting interest in traffic logs from area cell towers to try to get a fix on the culprits.
That geo-location information could then be cross-checked against surveillance video and eyewitness photography – just another layer of data available to law enforcement when trying to stitch together a detailed and textured version of events.
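The cross-checking described above is, at its core, a join between data layers: device sightings from tower logs matched against photos taken near the same place at roughly the same time. The sketch below is a hypothetical simplification under assumed record shapes (real tower logs and photo metadata are far richer, and the names `TowerPing`, `Photo` and `cross_check` are invented for illustration):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical, simplified records for illustration only.
@dataclass
class TowerPing:
    device_id: str
    tower_id: str
    timestamp: datetime

@dataclass
class Photo:
    photo_id: str
    tower_id: str          # nearest tower to where the photo was taken
    timestamp: datetime

def cross_check(pings, photos, window=timedelta(minutes=5)):
    """Pair device pings with photos taken near the same tower within a
    time window -- one simple way to layer one data source over another."""
    matches = []
    for ping in pings:
        for photo in photos:
            if (ping.tower_id == photo.tower_id
                    and abs(ping.timestamp - photo.timestamp) <= window):
                matches.append((ping.device_id, photo.photo_id))
    return matches
```

A real system would index both sides by tower and time rather than compare every pair, but the principle is the same: each additional layer of data narrows the set of candidate matches.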