Google Shows How Life Imitates Art With Compelling Demonstration of Natural Speech Recognition Technology

The browser as we know it will play a less important part in the Internet life of the future. As we noted earlier, this transition is giving agita to the billionaires who run Google, Microsoft and Yahoo, as it disrupts the existing order. But change is necessary. The combination of browser, keyboard and mouse is simply too limiting for many applications. For years the Yellow Pages advertised “Let Your Fingers Do the Walking.” But fingers are an imperfect input/output device, and pecking at a keyboard and clicking or double-clicking a mouse are even worse. Speech, of course, is the way humankind naturally communicates. But for humans to communicate with computers via natural speech requires a high degree of perfection in two technologies: speech synthesis and speech recognition.

Speech synthesis has been the [relatively] easy part. It was first demonstrated at the 1939 World’s Fair. Text to speech was first popularized on a large scale with the Speak & Spell toy, first sold in 1978. The technology has improved steadily to the point that it is recognizable, usable, oft-times maddening (“For Claims, dial 1, for Returns, dial 2…”), and perfectly understandable (“In 1.2 miles, exit onto Highway US 4”), if not perfectly natural.

Speech recognition has been a real challenge, one that some thought was well-nigh impossible. Pioneered by Drs. Jim and Janet Baker in the 1970s, limited speech recognition (“What do you want to do? Say ‘ship a package,’ ‘track a shipment,’ etc.”) has come a long way and is now a business with sales of over $5 billion annually, according to LumenVox. It has been most successful in applications with a constrained vocabulary or where the computer has been trained to recognize a specific speaker. The “holy grail” of recognizing unconstrained free speech uttered by the “general public” has been elusive.

Google has quietly been demonstrating lately that it’s made serious progress towards achieving this goal. First, it began offering its free Goog411 directory assistance service (try it – call 1-800-GOOG411). Now, it’s made available the Google Elections Video Search Gadget. This gadget allows you to type in a query, and Google will, lickety-split, search all the utterances of the major politicians that have been uploaded to YouTube (and trust me, that’s a lot of words) and return what they actually said, in their own words, by playing the relevant video segment for you. What it’s doing under the covers is creating a rough but “good enough” transcript of each video using its own speech-to-text conversion technology, then indexing this transcript in the usual way, and finally linking the text of the transcript to the related video footage. While quite remarkable by itself (Google’s announcement characterizes it as a “modest contribution”), it’s certainly a harbinger of what’s to come, when your mouth will become the primary communications device for “speaking” to a computer. Of course, this brings to mind Stanley Kubrick’s famous 1968 film, 2001: A Space Odyssey, which proves once again that life can imitate art. It also reminds us of the admonition of King Solomon that “Life and death are in the hands of the tongue” (Proverbs 18:21).
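
To make the transcribe, index and link steps concrete, here is a minimal sketch in Python. It is not Google's implementation, which has not been published in this detail. The transcript segments are hard-coded stand-ins for speech-to-text output, and every name here (SEGMENTS, tokenize, build_index, search, the video IDs) is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for speech-to-text output:
# (video_id, start_seconds, transcribed_text). Real output would be noisier.
SEGMENTS = [
    ("video_a", 0.0, "we need to talk about health care"),
    ("video_a", 12.5, "the economy is my first priority"),
    ("video_b", 4.0, "health care reform cannot wait"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Build an inverted index: word -> set of segment positions containing it."""
    index = defaultdict(set)
    for seg_id, (_video_id, _start, text) in enumerate(segments):
        for word in tokenize(text):
            index[word].add(seg_id)
    return index


def search(index, segments, query):
    """Return (video_id, start_seconds) for segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets so only segments with all words survive.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [(segments[i][0], segments[i][1]) for i in sorted(hits)]


index = build_index(SEGMENTS)
for video_id, start in search(index, SEGMENTS, "health care"):
    print(f"play {video_id} from {start} seconds")
```

Because each indexed segment keeps its start time, a text match can be turned directly into "play this video from this point," which is the link between transcript and footage described above. A "good enough" transcript suffices because a query only needs to hit a few correctly recognized words.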

