The browser as we know it will play a less important part in the Internet life of the future. As we noted earlier, this transition is giving agita to the billionaires who run Google, Microsoft and Yahoo, as it disrupts the existing order. But change is necessary. The trio of browser, keyboard and mouse is just very limiting for many applications. For years the Yellow Pages advertised “Let Your Fingers Do the Walking.” But fingers are an imperfect input/output device, and pecking at a keyboard and clicking or double-clicking a mouse are even worse. Speech, of course, is the way humankind naturally communicates. But for humans to communicate with computers via natural speech requires a high degree of perfection in two technologies: speech synthesis and speech recognition.
Speech synthesis has been the [relatively] easy part. It was first demonstrated at the 1939 World’s Fair. Text to speech was first popularized on a large scale with the Speak & Spell toy, first sold in 1978. The technology has improved steadily to the point that it is recognizable, usable, oft-times maddening (“For Claims, dial 1, for Returns, dial 2…”), and perfectly understandable (“In 1.2 miles, exit onto Highway US 4”), if not perfectly natural.
Speech recognition has been a real challenge, one that some thought was well-nigh impossible. Pioneered by Drs. Jim and Janet Baker in the 1970s, limited speech recognition (“What do you want to do? Say ‘ship a package,’ ‘track a shipment,’ etc.”) has come a long way and is now a business with sales of over $5 billion annually, according to LumenVox. It has been most successful in applications with a constrained vocabulary or where the computer has been trained to recognize a specific speaker. The “holy grail” of recognizing unconstrained free speech uttered by the “general public” has been elusive.
Google has quietly been demonstrating lately that it’s made serious progress towards achieving this goal. First, it began offering its free Goog411 directory assistance service (try it – call 1-800-GOOG411). Now, it’s made available the Google Elections Video Search Gadget. This gadget allows you to type in a query, and Google will – lickety-split – search all the utterances of the major politicians that have been uploaded to YouTube (and trust me, that’s a lot of words), and get back what they actually said, in their own words, by playing the relevant video segment for you. What it’s doing under the covers is creating a rough, but “good enough” transcript of each video, using its own speech-to-text conversion technology, then indexing this transcript in the usual way, and finally linking the text of the transcript to the related video footage. While quite remarkable by itself (Google’s announcement characterizes it as a “modest contribution”), it’s certainly a harbinger of what’s to come, when your mouth will become the primary communications device for “speaking” to a computer. Of course, this brings to mind Stanley Kubrick’s famous 1968 film, 2001: A Space Odyssey, proving once again that life can imitate art, but it also reminds us of the admonition of King Solomon, that “Life and death are in the hands of the tongue” (Proverbs 18:21).
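For the curious, here is a rough sketch of the "index the transcript, then link it back to the video" idea described above. To be clear, this is not Google's code, and nothing here comes from their announcement: the `Segment` record, the video identifiers, the timestamps and the sample sentences are all made up for illustration, and the speech-to-text step is assumed to have already happened. It simply shows how a plain inverted index over timestamped transcript snippets could turn a typed query into "jump to this point in this video."

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    video_id: str      # hypothetical identifier for an uploaded video
    start_sec: float   # where this stretch of speech begins in the video
    text: str          # rough machine transcript of the segment


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the set of segment positions that contain it."""
    index = defaultdict(set)
    for pos, seg in enumerate(segments):
        for word in tokenize(seg.text):
            index[word].add(pos)
    return index


def search(query, segments, index):
    """Return segments containing every query word, in original order."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [segments[pos] for pos in sorted(hits)]


# Made-up transcript data, for illustration only.
SEGMENTS = [
    Segment("video-a", 0.0, "thank you all for coming tonight"),
    Segment("video-a", 42.5, "we need to talk about health care costs"),
    Segment("video-b", 17.0, "my plan for health care is simple"),
]
INDEX = build_index(SEGMENTS)

# Each hit carries the video and the offset a player would seek to.
for seg in search("health care", SEGMENTS, INDEX):
    print(f"{seg.video_id} @ {seg.start_sec:.1f}s: {seg.text}")
```

Because every hit keeps its video identifier and start time, a player could seek straight to the matching moment. A real system would presumably also have to cope with recognition errors in the transcript, which this toy version ignores.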