The evidence will be crowdsourced

Should the analysis be?

As it happened. Picture via Salford University (Flickr). CC-BY.

One of the ongoing developments in the coverage of the Boston Marathon bombing is the constant call for amateur photos and video of the attack. The FBI has suggested that the face of the bomber/s could be on someone’s camera right now. There were probably thousands of cameras that recorded the attack from multiple perspectives.

You can envision using this footage to create a 3D, high-def model of the entire block at the exact moment of the attack–and for several hours before. The model would get better and better as the day went on, more cameras appeared at the finish line, and people began shooting video as well as stills. Even if the model is never created in the real, technological world (though it should be) it will begin to emerge in the minute-by-minute story of the attacks, in a way that I think is very new. Assuming amateurs and CCTV provide enough video, and the video is correctly organized, there is no reason why, in the immediate future, an investigator couldn’t walk the streets exactly as they appeared just before, during and after the attack. In fact it is probably already possible to achieve this, though the results would be kind of artificial and might be challenged in court.

Of course, that assumes a lot.

Video analysis is a really big problem. You simply can’t ask a computer (at least, not yet) to look for the most important frame in a video. You might be able to get a computer to follow the string “look for a human being placing a backpack in a trash can,” using the same principles as optical character recognition (OCR) or facial recognition. You’d have to tell the computer what a trash can is though, and a backpack. Alternatively, you might be able to do something with motion tracking: go back to the scene and make several videos of an actor placing an object about the size and weight of the bomb into a can. Then feed the computer all the video with correct geographic markers, and tell it to look for a similar movement. You’d still get several (hundred?) hours of people putting heavy things in cans, but you might be able to get the person who placed the bomb.

Even that would be pretty difficult, time-consuming and expensive, involving (as far as I know) the invention of several new processes. Also, it might not work. Why reinvent the wheel when any patient and thorough person could do the same job as well as or better than any computer? What if the critical piece of evidence isn’t who placed the bombs (though that is obviously important), but some other incident, which might be obvious to a human, but obscure to a computer?

Of course here we run into a different problem: it takes about as long to analyze a video or audio clip as it did to make it. There may be ways to shorten the analytical process, but a one hour video takes one hour to watch. Fast forwarding it distorts it. Sure, you can skip a few seconds at a time–but it would only take a few seconds for the perpetrator to place the bomb.

If you really want to wring the evidence out of those videos, you need to watch every minute. The more footage there is, the more certain we can be that the evidence is there. But each new hour of video is going to include at least 59 minutes of dross. The problem is, you can’t determine where the critical minute will fall without watching all the minutes. Is that the best use of a forensic video analyst’s time right now? How many of them are there?

That’s why I think it would be more productive to find a way to splice all the videos together than to ask a computer to look for incriminating movements. Allowing an investigator to watch several similar videos in combination would be one way to streamline the analytical process. Most cell phone cameras already include pretty accurate geographical data, so what you really need is a great map of the area, and to sync up the clocks on every video. The guys who make 3D movies can show you how to combine all of these videos into one image for each square foot of the event. This would also be expensive but kind of doable.
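To make the "sync up the clocks" idea a little more concrete, here is a minimal sketch in Python. It assumes each clip arrives with a start time, a length, a GPS position and a known correction for its camera's clock; the `Clip` fields, the `clock_offset` correction and the 50-meter radius are all hypothetical choices for illustration, not a description of any real forensic tool. Given a moment and a spot on the map, it returns every clip that was recording near that spot at that moment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Clip:
    name: str
    start: datetime           # start time according to the camera's own clock
    duration: timedelta
    clock_offset: timedelta   # assumed correction: camera time + offset = reference time
    lat: float
    lon: float

    def covers(self, moment: datetime) -> bool:
        """True if the clip was recording at `moment` (reference time)."""
        corrected_start = self.start + self.clock_offset
        return corrected_start <= moment <= corrected_start + self.duration


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in meters."""
    earth_radius_m = 6_371_000
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def clips_showing(clips, moment, lat, lon, radius_m=50.0):
    """All clips recording at `moment` within `radius_m` of the given point."""
    return [
        c for c in clips
        if c.covers(moment) and distance_m(c.lat, c.lon, lat, lon) <= radius_m
    ]
```

The hard part in practice is estimating each camera's clock correction in the first place; the sketch simply takes it as given. Once the clips are on one timeline, an investigator (or a volunteer) could ask for "everything filmed near this trash can at this minute" instead of watching each video end to end.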

But here’s a question: why should the analysis be restricted just to trained investigators? Finding the one good data point in the pile of dross is exactly what crowdsourcing is for. Recently over 100,000 volunteers proofread 20,000 public domain books. While books are not as emotionally troubling as footage from the attacks, if we’re trusting the crowd with our intellectual heritage, why not ask them to take a first pass at some of these hundreds or thousands of hours of video? I can think of some solid objections to the idea, but I can also think of some ways crowdsourcing could make it easier to organize this evidence.

Facebook asks you to catalog yourself

The answer is “none of the above”

From my January 19th post on the limitations of Graph Search:

Because here’s the thing: even if Facebook does come up with a novel way to deal with the cats/kittens/Cats/cat problem, it doesn’t matter, unless you’re feeding the correct information into Facebook’s search engine. How closely does your list of Facebook “likes” mirror your actual likes in real life? Probably about as closely as your list of Facebook friends mirrors your actual social circle. Possibly a lot less. Unless you plan to spend hours “Liking” what you like–or reporting every time you eat Thai food in Manhattan–the value of Graph Search is more in its ambition than its actual usefulness. Collecting enough information to make Graph Search useful would require either a great deal of surveillance or a lot of uncompensated effort on the part of Facebook users.

Reality testing

A well-known Christian evangelical* was recently the target of an elaborate hoax suggesting that he had “resigned from Christianity” and was taking an anti-Obama, pro-environment, pro-Second Amendment argument to the political stage. The hoax included a web site with an address very similar to the evangelical’s real web site, as well as false Twitter and social media accounts. What was particularly interesting to me was that this hoax included a sort of inoculation against skepticism. The fauxvangelical claimed that his turnaround was being resisted by his followers, who supposedly refused to relinquish control of his web site. The hoax was convincing enough to take in a few of my friends, including some very well-educated and wise people–people who not only should know better, but often do. The hoax seems to have cleared up now. It’s just another five minutes of confusion on an Internet that’s full of such events. But it got me wondering: how do we know what really happened?

OK, sometimes it’s real. Picture by Flickr user Rene Mensen. CC-BY.

Our sense of what’s real and not real, in the physical, face-to-face world, is often socially determined. If you “see” a pink elephant, you can guess that it’s a delusion or hallucination, because the people around you won’t see it. Most examples are not that extreme (and if you are seeing pink elephants, you should probably contact a doctor), but there are any number of small ways that the people around you assure you that you occupy the same reality as everyone else. Every time someone rolls their eyes at you (“There you go again!”), or tells you to chill out, they are helping you establish a reality baseline. And every time you ask someone “Did you see that?” or “Does that look right to you?” or “Do you remember when…” you are checking your own perceptions against objective reality, or at least a socially acceptable reality. There are other ways you can test and measure reality: common sense (“there are no such things as pink elephants, at least, not in Peoria”), logic, and physical cues about the world.

Setting aside our pink elephant for a moment, how do you know the difference between a cardboard cutout of a famous person, and the actual person? This may seem like a very easy question and it is: the cardboard one is cardboard. But how does a four-year-old know? Depending on the situation this may be a very difficult judgement for a four-year-old to make, which is why you can take four-year-olds to Disney World and convince them that they’ve met Snow White.

And sometimes it isn’t. Picture by Flickr user willwhitedc. CC-BY.

It’s not until kids are six or seven–about the same time that they develop independent moral judgment and more complex abstract thinking–that they can reliably tell the difference between fantasy and reality. A lot of training from Mom and Dad, from books and school, from their peer group, and from their senses and experience goes into teaching a child that their dreams, stories, fantasies and fears are not the same thing as physical reality. It’s not just instinct. It’s training and socialization.

If you put someone into a different physical and cultural environment, they may once again have trouble picking up the cues, and may even lose their connection to reality for a brief period. For example, in many places in the world, ghosts and magic are quite real, at least to the people who live there. A very skeptical person raised in American scientific and technical culture might wonder how anybody could seriously believe in ghosts: the evidence is so clear! There are parts of Indonesia (where I once taught) where you can be laughed out of the room for not believing in them: the evidence is so clear! And the fact is that you will not fully understand Indonesian culture until you account for the ghosts. Even if you, personally, don’t believe in ghosts, there is some social imperative to act and interact as if ghosts are real, because you won’t understand people if you don’t. The ghosts of Southeast Asia have a social reality which overrides their “objective” reality.

The Internet is like a foreign country to people who didn’t grow up there. There are a lot of cues to tell the savvy Internet user that a website is legitimate: the address, the design, the type of information it requests, and so on. If you use the Internet a lot, it’s probably pretty easy to tell the real thing from a scam. But how does a person with no Internet experience know what a bank web site is supposed to look like? Or a news web site? A job application? The IRS? Facebook?

You can’t tell Grandma (or Junior) to only visit legitimate web sites if they don’t know what that means. Which is why you have to debug Grandma’s computer once a month. That would be fine, if that was as far as it went. The problem is when Dad asks you to help wire money to a stranger in Nigeria. Incidentally, this doesn’t have anything to do with intelligence or level of education. It doesn’t matter if Dad’s got a Ph.D. in economics–or, for that matter, a Nobel Prize. The problem is that he’s been left to fend for himself in the land of the hungry ghosts.

So how does he know? And how can you teach him, when the clues take years to learn and the fakes are increasingly sophisticated?

This is not a small question. Victims lost $9.3 billion to Nigerian scammers (“419 scammers”) in 2010 alone. Some of them were probably in MENSA. Some were (no doubt) professionals, academics and financial whizzes. And for a lot of them, that was their retirement savings or rainy-day fund.

That’s just money and someone’s private tragedy. There are bigger things at stake here. What you should really be asking yourself is how do you know? I’m talking about the actual, logical steps you take to decide that one news source is more reliable than another, or that this web site really does belong to Bank of America. Maybe you don’t think about it much. But you should.

The system you have for sorting fiction from reality comes apart in the era of customized news and A/B testing. We are not far at all from an era where every public event is greeted with a flood of fake pictures, fake Twitter accounts, fake information, fake government bulletins, fake injuries and kidnappings, fake scientific journals, and more. We can add a coordinated attack on reality and evidence in the form of wild conspiracy theories. Finally we can add web sites that regularly change, disappear, reappear with modifications or reappear with a new design–conditions that don’t lend themselves to a sense of continuous reality.

You could, of course, turn to some kind of credentialing source to verify what’s real and what’s of quality…

…which (quite aside from the way it stinks of censorship) would make convincing credentials the most tempting (and profitable) racket of all.

I can’t offer an answer for this dilemma. I think it demands a lot more attention though. I think it’s very serious, especially as more and more of our information sources move onto the Internet and into the cloud. I see this as an issue on the same level as digital obsolescence, ownership and net neutrality. After all, what’s the use of the greatest trove of information in human history if you can’t trust it?

*It is pretty easy to figure out who the evangelical is, but I’ve decided not to perpetuate the hoax by linking his name up to it all over again.

You don’t own your ebooks. I think that should change.

Does the news that you don’t own your ebooks come as a surprise? I can understand why. When you bought your ebooks, you probably clicked a “buy” button. You likely gave a company money in exchange for an item. It now appears in your private and personal library. But while you certainly own your “dead tree” books, it’s not so clear with your digital books. The actual agreement you agreed to when you “bought” your ebooks was an End User License Agreement (EULA). So it turns out (at least if it ever comes to a legal dispute) that when you bought an ebook, you didn’t buy a book at all. You bought a limited license to access content under certain rules. Those rules are called Digital Rights Management or DRM.

This is made most clear when the real owner of the books asserts their rights over them by taking them back, basically whenever they feel like it. Digital copies of Nineteen Eighty-Four went down the memory hole for Kindle owners in 2009. In 2012, a Nook reader’s private library disappeared when her credit card expired. And, in 2011, a HarperCollins Publishers “Library Love Fest” concluded with the unwelcome (and frankly, kind of bizarre) news that library ebooks would burn themselves after 26 checkouts. Meaning that some of the library books librarians are buying, with your tax dollars, are deliberately designed to break. If someone destroyed physical library books in that fashion, as in entering my library and defacing or destroying them, I would call the police. But not so with library ebooks. I don’t operate under the delusion that my library actually owns those. I might add that library ebooks can cost up to three times as much as the ebooks you buy online.

Lending, giving and reselling–vital parts of book and reading culture–are all gone, if you’re an ebook owner. There is no such thing as a used ebook market, and the practice of donating your books to the public library is slowly being extinguished. Libraries don’t even control the checkout system for ebooks. Because of the administrative problems involved, a third party maintains our ebook collections, and these services are also provided under exorbitant and rising costs. The hoops library patrons are required to jump through to borrow ebooks–technologically, a simple task–boggle the mind.

In addition to being a public librarian, I am an author. I don’t begrudge publishers their income or authors their royalties. Far from it. I know that a vibrant and healthy reading culture means that publishers and authors need to put bread on the table. I have heard that the publishing industry is struggling. I know that publishers view every “lend” as a potential lost sale. But as a librarian serving a largely impoverished population on the far side of the digital divide, I also know that making books more expensive and harder to access doesn’t mean more sales. It means some people will read fewer books. They’ll find something else to do with their time. They’ll stop buying books altogether.

The music industry realized this years ago, when iTunes stopped putting DRM on music and started maintaining digital copies that listeners could download, at will, for free. They realized what publishers have yet to realize: books don’t need to physically expire. Cultural goods have a trajectory. As a librarian, I don’t need to buy replacements when the old books wear out (and I frequently don’t). I need to buy the new books that patrons are hankering to read today. In a year nobody’s going to remember what last year’s bestsellers were. Publishers don’t need to impose limits–all they need to do is ensure that the well is perpetually refreshed with new books and new formats. Readers will happily pay a hefty premium for access to today’s bestsellers. And libraries are essential in creating the buzz, culture and social networks that help sell a new book.

I feel like there needs to be a larger national discussion on book owners’ rights to their books. That’s why I created this “We the People” petition asking the White House to defend the rights of book owners. If you care about books, libraries and book culture, I urge you to add your initials to this petition. It needs 150 signatures to appear in public, and 100,000 to earn an official White House response. I know that there are 100,000 readers out there who take this issue as personally as I do, so please get the word out by sharing, blogging and talking about this issue.

The petition asserts some specific rights that I think are modest and fair. These rights are up for debate right now. Whether or not you agree with all of them, if you want to hear the administration address some of them, or even just the concept, you should still sign. And I encourage you to consider what I didn’t ask for–fairer pricing for libraries, the ability to convert legally purchased books to alternate formats, a right to resell, or a broad right to distribute ebooks for free. I am strongly against book theft and piracy. I encourage readers, librarians, publishers and policymakers to take a few minutes to help make progress on the issue of book ownership.

Are you sure you want to reply to this blog?

If you’ve ever been harassed online, then you know how little you can do to solve the problem. A blocked bully can simply log in under another name (or IP address) or pursue his victim on another network. The ubiquity of the Internet may make it seem like the harassment is everywhere, and you have to prepare yourself before you check your own email, voicemail or website comments. Even anonymous bullying can feel disturbingly personal. The problem is even worse for teens and young adults, especially if harassment online is an extension of harassment at school. Even deleting an offensive comment is not as effective as never seeing one at all.

Behavioral researchers are working on ways to prevent online harassment before it starts. It turns out that most bullying comments fall within a very narrow range of words and phrases. Bullies are simply not that creative. Textual analysis can predict, with a high degree of accuracy, whether a comment is likely to be bullying. The idea isn’t to prevent such comments from ever appearing on the Internet, but to make potential bullies think about their reputations and their tone before they post an offensive comment. If a comment is potentially offensive, the computer may give the offender more time before the comment is posted, or add a pop-up menu that says, “That sounds harsh! Are you sure you want to post that?” The offender would have to review the post once more before it was posted.
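For readers curious what the simplest version of such a check might look like, here is a rough Python sketch. It is not the researchers' method: the phrase list is made up for illustration, and a real system would presumably rely on a trained classifier rather than a handful of hard-coded phrases. The function names and the `confirm` callback (which stands in for the pop-up) are also my own assumptions.

```python
import re

# Hypothetical phrase list, for illustration only.
FLAGGED_PHRASES = ("nobody likes you", "you are so stupid", "loser", "shut up")


def harshness_hits(comment: str) -> list:
    """Return the flagged phrases that appear as whole words in the comment."""
    # Normalize case and collapse runs of whitespace before matching.
    text = re.sub(r"\s+", " ", comment.lower())
    return [
        phrase for phrase in FLAGGED_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", text)
    ]


def review_before_posting(comment: str, confirm) -> bool:
    """Decide whether to post. Clean comments go straight through; flagged
    ones are posted only if the author confirms after seeing the prompt."""
    if not harshness_hits(comment):
        return True
    return confirm("That sounds harsh! Are you sure you want to post that?")
```

Note that nothing is blocked outright here. A flagged comment still goes up if its author insists, which matches the stated goal of making people pause rather than silencing them.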

I certainly wouldn’t miss cyberbullying, and this system seems less subjective and costly than relying on humans to moderate comments or report offensive material. But I still wonder about the chilling effects. The message this sends to me as an Internet user isn’t “think before you speak,” but “your computer is watching you.” What other “offensive” or “disturbing” material might never make it onto the Internet at all, thanks to a machine that wants to be your mom?

I also wonder what textual analysis could reveal about other kinds of stereotypical speech patterns–and how that might be used to control or modify behavior. Political speech is distinctive. Could I use that knowledge to prevent people from posting certain types of speech on my web site? What if my web site was, say, a local newspaper? It wouldn’t be that hard for a piece of software to filter out most of the Tea Party comments, for example, and that could give an impression that there are fewer Tea Partiers than there are. Some mental illnesses are associated with unusual speech patterns. If you sound schizophrenic or depressed, should your computer tell you? What would you do if it did? And who is entitled to that information?

A few weeks ago I wrote briefly about A/B testing, and how different users are frequently shown slightly different web sites. What if there was A/B moderation as well?

What’s wrong with super-wifi?

There was a little bit of a dustup in the news this week when the Washington Post suggested in a front page story that the government was planning to create a free “super-wifi” network across the United States. By Tuesday evening, what started as an interesting and ambitious proposal turned into a case study in misinterpretation and blowing things out of proportion. “Super-wifi” turned out to be a media-created flash in the pan; according to one analyst it was “almost entirely fiction.” The real story had to do with the reacquisition and auctioning of “white space” between over-the-air TV channels.

If so, too bad. Government wifi, or “wireless as a utility” would be good for everyone, including telecoms. The Internet isn’t free, of course. Large, affordable, public wifi networks should be the interstate project of our time, providing employment now for workers (who would probably be contractors, not government employees) and opportunities in the future for small businesses, entrepreneurs and students.

This is not to mention that pipes need plumbers, and it takes more than accessibility to hook up a home or business to the Internet. I live in a community where broadband access has been severely constrained because ISPs and telecoms are unwilling to invest in rural infrastructure. We are served by a single provider; the current system of allocating broadband is not providing much opportunity for competition. But if the major infrastructure was already there, the work would certainly follow. I understand that telecoms have already invested billions (trillions?) in infrastructure, but if the government was willing to fund the next stage of construction, why wouldn’t you let them?

Using white space to create “super-wifi” is exactly the kind of big idea we need to kickstart the economy. The fact that the media ginned it up doesn’t mean it’s not worth exploring.
