Should the analysis be?

As it happened. Picture via Salford University (Flickr). CC-BY.
One of the ongoing developments in the coverage of the Boston Marathon bombing is the constant call for amateur photos and video of the attack. The FBI has suggested that the face of the bomber/s could be on someone’s camera right now. There were probably thousands of cameras that recorded the attack from multiple perspectives.
You can envision using this footage to create a 3D, high def model of the entire block at the exact moment of the attack, and for several hours before. The model would get better and better as the day went on, as more cameras appeared at the finish line and people began shooting video as well as stills. Even if the model is never created in the real, technological world (though it should be), it will begin to emerge in the minute-by-minute story of the attacks, in a way that I think is very new. Assuming amateurs and CCTV provide enough video, and the video is correctly organized, there is no reason why, in the immediate future, an investigator couldn’t walk the streets exactly as they appeared just before, during and after the attack. In fact it is probably already possible to achieve this, though the results would be kind of artificial and might be challenged in court.
Of course, that assumes a lot.
Video analysis is a really big problem. You simply can’t ask a computer (at least, not yet) to look for the most important frame in a video. You might be able to get a computer to follow the string “look for a human being placing a backpack in a trash can,” using the same principles as optical character recognition (OCR) or facial recognition. You’d have to tell the computer what a trash can is though, and a backpack. Alternatively, you might be able to do something with motion tracking: go back to the scene and make several videos of an actor placing an object about the size and weight of the bomb into a can. Then feed the computer all the video with correct geographic markers, and tell it to look for a similar movement. You’d still get several (hundred?) hours of people putting heavy things in cans, but you might be able to get the person who placed the bomb.
Even that would be pretty difficult, time-consuming and expensive, involving (as far as I know) the invention of several new processes. Also, it might not work. Why reinvent the wheel when any patient and thorough person could do the same job as well as or better than any computer? What if the critical piece of evidence isn’t who placed the bombs (though that is obviously important), but some other incident, which might be obvious to a human, but obscure to a computer?
Of course, here we run into a different problem: it takes about as long to analyze a video or audio clip as it did to make it. There may be ways to shorten the analytical process, but a one hour video takes one hour to watch. Fast forwarding it distorts it. Sure, you can skip a few seconds at a time, but it would only take a few seconds for the perpetrator to place the bomb.
If you really want to wring the evidence out of those videos, you need to watch every minute. The more footage there is, the more certain we can be that the evidence is there. But each new hour of video is going to include at least 59 minutes of dross. The problem is, you can’t determine where the critical minute will fall without watching all the minutes. Is that the best use of a forensic video analyst’s time right now? How many of them are there?
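To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 1,000 hours of footage, the 10 analysts and the 8-hour viewing day are made-up figures for illustration, not anything reported about the investigation; the only thing taken from the argument above is that footage has to be watched in real time.

```python
def review_days(footage_hours, reviewers, hours_per_day=8.0, passes=1):
    """Working days needed to watch every minute of footage at normal speed.

    All inputs are hypothetical planning figures, not data from the case.
    """
    person_hours = footage_hours * passes          # one hour of video = one hour of watching
    return person_hours / (reviewers * hours_per_day)


# Illustrative figures only: 1,000 hours of footage, 10 analysts.
print(review_days(1000, 10))   # 12.5
```

Doubling the footage doubles the wait unless you also double the number of people watching, which is the whole problem.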
That’s why I think it would be more productive to find a way to splice all the videos together than to ask a computer to look for incriminating movements. Allowing an investigator to watch several similar videos in combination would be one way to streamline the analytical process. Most cell phone cameras already include pretty accurate geographical data, so what you really need is a great map of the area, and to sync up the clocks on every video. The guys who make 3D movies can show you how to combine all of these videos into one image for each square foot of the event. This would also be expensive but kind of doable.
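The bookkeeping half of that idea (one clock, one map) is simple enough to sketch. The Python below is a minimal illustration, assuming each clip already comes with a start time, a length, a GPS position and a known correction for its camera's clock. The `Clip` fields, the `clips_covering` name and the 50-meter radius are all hypothetical choices, and the sketch does nothing about actually merging the images.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class Clip:
    name: str
    start: float               # seconds since a shared reference time, per the camera's own clock
    duration: float            # seconds
    lat: float
    lon: float
    clock_offset: float = 0.0  # seconds to add to correct this camera's clock (assumed known)


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters."""
    r = 6371000.0  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def clips_covering(clips, t, lat, lon, radius_m=50.0):
    """Return (clip name, seconds into the clip) for every clip that was
    recording at corrected time t within radius_m of the given point."""
    hits = []
    for c in clips:
        true_start = c.start + c.clock_offset
        in_time = true_start <= t <= true_start + c.duration
        in_range = distance_m(c.lat, c.lon, lat, lon) <= radius_m
        if in_time and in_range:
            hits.append((c.name, t - true_start))
    return hits
```

Given a moment and a spot on the map, an investigator would get back a list of clips to watch side by side, each with the point to seek to.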
But here’s a question: why should the analysis be restricted just to trained investigators? Finding the one good data point in the pile of dross is exactly what crowdsourcing is for. Recently over 100,000 volunteers proofread 20,000 public domain books. While books are not as emotionally troubling as footage from the attacks, if we’re trusting the crowd with our intellectual heritage, why not ask them to take a first pass at some of these hundreds or thousands of hours of video? I can think of some solid objections to the idea, but I can also think of some ways crowdsourcing could make it easier to organize this evidence.
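One way a first pass like that might be organized, sketched in Python: cut each video into short overlapping windows, show each window to several volunteers, and pass along to a trained analyst only the windows that more than one volunteer flagged. The window length, the overlap and the two-vote threshold are assumptions picked for illustration, and `segments` and `escalate` are hypothetical names, not part of any existing crowdsourcing platform.

```python
from collections import Counter


def segments(duration, length=60, overlap=10):
    """Split a video of `duration` seconds into (start, end) windows that
    overlap, so a few-second event never falls entirely on a cut."""
    if overlap >= length:
        raise ValueError("overlap must be shorter than the window length")
    step = length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + length, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


def escalate(flags, min_votes=2):
    """Given one (start, end) entry per volunteer flag, return the windows
    flagged by at least `min_votes` volunteers, in time order."""
    counts = Counter(flags)
    return sorted(w for w, n in counts.items() if n >= min_votes)


print(segments(150))
# [(0, 60), (50, 110), (100, 150)]
print(escalate([(50, 110), (0, 60), (50, 110)]))
# [(50, 110)]
```

Requiring agreement between volunteers is a crude filter, but it means the scarce analysts only watch the minutes the crowd could not rule out.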



