In response to the question “What will change everything?,” Marti Hearst wrote on edge.org about the increasing ubiquity of video and audio, and about how these media are encroaching on the “market share” of text for communication in our society. It’s an interesting piece: the premise is that as video and audio have become increasingly easy to create and distribute, their use has started supplanting text in communication. She cites examples such as the success of podcasts as marketing vehicles, YouTube video comments, and people pointing cameras at themselves to pose questions to then-presidential candidate Barack Obama.
And she has a point. When video was difficult or expensive to create, it was a fringe medium, restricted to broadcast-only situations. These days, a cheap web cam is all you need to broadcast yourself. The interaction is as easy as typing, and you don’t have to know how to spell. A preview of this phenomenon was available to 1990s Canadian TV audiences through the popular Speakers’ Corner, a video booth near CityTV in Toronto where anyone could record a short video of themselves that would (eventually) get aired. (The booths were later installed in other Canadian cities.)
In his reaction to this article, Andrew Dillon pointed out that rather than being replaced by video and audio, text would be supplemented by them. His argument was based on psychological considerations, although he did not articulate them explicitly. I would like to take a complementary approach, and critique the edge piece from a more technological perspective.
Marti Hearst’s article glosses over an important aspect of text that sets it apart from other media, namely the relative ease with which text structure can be associated (approximately) with meaning. Text is such an important part of the computer era because it can be tokenized, parsed, indexed, and searched relatively easily. Most of the text we create is stored somewhere (traditionally on paper, but increasingly on some computer network) for subsequent retrieval. While in many cases this retrieval happens through metadata that is equally applicable to other media, in many other cases textual documents are retrieved by searching on their content. This is a capability that we now take for granted, but its ubiquity rests squarely on the ease of processing text for indexing and on the ease of generating queries that return useful information.
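To make the point concrete, here is a minimal sketch of how little machinery content-based text retrieval needs. It is an illustration under my own assumptions, not anything taken from the edge piece: the three sample documents and the function names `tokenize`, `build_index`, and `search` are all made up for this example, and real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents, invented for this sketch.
docs = {
    "a": "Video is easy to create but hard to index.",
    "b": "Text can be tokenized, indexed, and searched.",
    "c": "Podcasts can be converted to text and then indexed.",
}
index = build_index(docs)
print(search(index, "text indexed"))
print(search(index, "video"))
```

The query is written in the same vocabulary as the documents, which is why a few lines of tokenizing and set intersection already return something useful. I know of no comparably simple step that turns raw video frames into units this close to meaning.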
If it were hard to parse documents into semantically useful chunks (as it is with video), or if it were hard to construct queries that identify desired documents (as it is with video), we would not be able to take advantage of the large variety of information that is “out there.” Following links and doing metadata-based search do not scale well to the web, or even to collections of moderate size, unless a lot of (expensive) manual effort is devoted to indexing the materials.
Video is now easy to create, but it is still difficult to index and to query without relying on textual proxies. Furthermore, it is unclear to me that the general case of querying video based on its content is likely to be solved any time soon. While many demonstration systems have been built to query video based on various low-level features such as color histograms and spatial features, these map poorly to things that people actually care about. Manual indexing, even of the “wisdom of crowds” type alluded to in the edge article, cannot keep up with the size of the web. Audio is an interesting intermediate case because it can be converted to and from text automatically with relative ease, and the resulting text can then be used to index and retrieve it.
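A toy example shows why a color histogram says so little about content. This is only a sketch under my own assumptions: the "frames" are hypothetical flat lists of RGB pixels, and `color_histogram` and `histogram_intersection` are simplified illustrations of the general idea rather than the method of any particular video retrieval system.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels into coarse bins and return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of the overlapping mass in each bin."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in h1)


ORANGE, BLUE, GREEN = (255, 140, 0), (0, 0, 200), (0, 200, 0)

# Hypothetical frames: the same pixels in very different arrangements.
two_bands = [ORANGE] * 8 + [BLUE] * 8   # e.g. an orange sky over a blue sea
speckled = [ORANGE, BLUE] * 8           # the same colors, finely interleaved
all_green = [GREEN] * 16                # a frame sharing no colors with the others

print(histogram_intersection(color_histogram(two_bands), color_histogram(speckled)))
print(histogram_intersection(color_histogram(two_bands), color_histogram(all_green)))
```

The first two frames get a perfect similarity score even though a person would describe them quite differently, because the histogram discards where the colors are, let alone what they depict.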
Even when we watch movies that evoke strong emotional responses, we use words when communicating with others about our experiences. It is this extreme (inherent?) facility with language for communication, coupled with the relative ease of processing it, that gives text special status in our culture. Our improved ability to record ourselves on video or audio may increasingly enhance our communication, but these media are unlikely to be generally effective in the absence of text.
I find video to be in..cred..i..bly..slow, in terms of the rate at which it communicates certain forms of information. As a result, I’ll often not watch the video at all, even when it’s only a click away.
It takes me 30 seconds to read two paragraphs of text. That’s quick. It takes me 4-5 minutes to get the same amount of information from lots of videos. Especially if it’s a video of someone giving a talk. Not so quick.
Video does have the power to communicate information faster than text. I think the upper bound on the conversion rate is 1 picture = 1000 words. But it has to be really well done video. And that takes time and effort and careful crafting to create. It takes a real visual storyteller. Most video, especially web video, is not up to this standard. I therefore find much of it highly unwatchable, not because of lack of (visual) production quality, but for the lack of concise storytelling, the sheer slowness at which I receive information.
I do, however, love to listen to podcasts in the car. The slower pace of the information is a fair trade-off with the fact that I really don’t want to be reading text while I am driving.
My point wasn’t that video is useless; far from it. Rather, what makes text special is our ability to process it with computers while staying “close” to its meaning. We are nowhere near doing that with video.
My point wasn’t that video was useless, either. It was to say that, just as computers have difficulty processing video, so do humans. It takes a human a long time to extract the same amount of information from video as they can get from a couple of paragraphs of text.
Gene, thank you for the thoughtful commentary on this essay. I do want to point out that in the essay I said that the main thing holding back this decline is the need for better search technology, as well as better editing tools. I agree that interesting text/audio/video hybrids will probably arise.
Marti, I wonder how much education will affect people’s willingness to use video editing tools. Right now we teach kids creative writing as a standard part of the high school curriculum. If creative videography were also a required subject, would that increase the demand for simple video creation tools, or would it raise the bar in terms of people’s expectations about what video editing/production tools should be like to tell a good story?