I have been struggling, for years, with getting speech-to-text working in a usable way. About every two years over the last ten, I go out to a store, excited, and buy the most recent voice recognition software. I’m always hoping for a breakthrough. I’ve tried various versions of ViaVoice and Dragon Dictate. Dictation programs typically require you to read a few paragraphs to train them to your voice. Other than that, they are fairly easy to use. For the PC and Mac, voice recognition programs have reached an acceptable level of usability. Today, I can talk and dictate to my computer much faster than I can type. In fact, this article is being dictated to my Mac on a plane ride from San Francisco to Minneapolis. It seemed appropriate.
The guy next to me on the plane looks annoyed. I am using a wired microphone to dictate this story. Ok, now he is smiling. He is enjoying the irony. However, every time I say “period” to end a sentence the woman on the other side of me looks at me…
(Completing this article by typing)
“With Contempt” is what I was going to say, so I decided to start using the keyboard. Conservative, tightly wound person that probably thinks iPods are an evil plot.
The reality: voice recognition, even if it works, does not fit into the social construct of the plane ride, the coffee shop, the bus, the subway, in fact any mass transit system. It is a solitary endeavor OR completely annoying. Want to try a fun experiment? As voice becomes more commonplace in interfacing with devices, when you see someone talking out loud to command their phone, try this: in a loud voice say “A B C 1 2 3” and make sure it is loud enough for the mic on their phone to hear. It works and totally screws up the voice recognition. Fun.
Vlingo on my iPhone
While I mentioned Vlingo on my iPhone, I did not mention that it does not work for me. I am not a voice recognition expert, but I know that the processing power of mobile phones is not at the level of my new Core i7 MacBook (yeah, it screams!). In Vlingo’s defense, if you articulate well and speak slowly, it is good for sending text messages and single-line emails in a quiet environment. But it is not as good as a laptop or desktop, and that is frustrating.
Don’t drink the Software-As-A-Service Kool Aid
Prediction: Cloud-based service for mobile voice recognition is a bad direction. Even in a connected world, there are many places where you do not have either a cell or network connection. Does it work today? Sometimes. However, when voice recognition really works for mobile, it will have to be native and a core function with 100% availability. SaaS cannot offer that. I am really surprised that Steve Jobs added the voice command into the iPhone (not a SaaS implementation, so they got that piece right). They usually don’t ship stuff that works 50% of the time. Apple should have tested it in my Jeep. If you follow the laws of computing, in about 10 years, mobile devices will be able to process voice as well as a desktop or laptop of today. This will be a convergence of technologies, just like the evolution from the iPhone to the iPad. Voice commands will make more sense on the mobile device, just like some applications make more sense on the iPad than on the iPhone due to its larger form factor.
What does this mean for technology affecting culture?
It can go in four directions. First, think: Do you remember the first time someone had a cell phone conversation, close to you, in a confined space? How about the first time someone sat in the stall next to you and had a loud cell phone conversation? How will you react when you are in close quarters and people are talking to their phones, dictating an email, text or tweet?
Direction 1: It will isolate people. Talking to a phone in public will be socially unacceptable, so people will withdraw to a quieter location to do it.
Direction 2: The older folks like me will lose the social acceptability battle. The younger generation will be texting, emailing, Tweeting and “voicing” and if we don’t like it, we can put on noise-canceling headphones.
Direction 3: The advent of sub-audible microphones. Arthur C. Clarke, one of the true thought leaders, talked about this in several of his books. Basically, imagine a tooth implant that could pick up “throat noises”. These sounds would not be heard outside a person’s body. The sub-audible microphone would pick up the sounds and transmit them to a mobile device. Everyone could happily be talking, Tweeting, emailing, or Voicing on the same subway car without ever bothering a soul. Technology done right is elegant. Early versions of the microphone could be placed on the neck. No need to go running off to the dentist just yet. Direction 3 is my prediction.
Direction 4: Something no one has thought of yet.
If everyone is using sub-audible microphones, there will be privacy issues. If someone whispers and you hear it, are you invading their privacy? Ethically yes, but legally, no. There will be devices that unscrupulous people will employ to invade privacy. There will be outrage, backlash and then an attempt at regulation. I predict that people interfacing with mobile devices via sub-audible microphones will develop their own private vocabulary, like a password or macro, to communicate securely with their devices.
Stepping stones to a strange new world
1. Seamless voice recognition, native to mobile devices (8-10 years)
2. Sub-audible microphones (technology is here now)
3. Social acceptance (who knows)
I started this whole article as a mind-walk towards the concept of “think to text”. With sub-audible mics, you can sometimes tell if someone is talking, because they will move their lips out of habit. The younger generation that grows up on it will not.
So what about “think to text”? I have no idea. The headsets that are supposed to measure brainwaves and sell for about $200 are pure crap. Don’t waste your money; they don’t work. Based on the state of technology, a system that could recognize words you think is a long way off.
Waiting in line for the iThink
Note: The man next to me on the plane ride was a software engineer. We had a great conversation. Bill: great to meet you! It helped to talk through the scenarios. The woman next to us was a four-term politician. She thought we were crazy, thinks ideas are dangerous, and wanted to make everything Bill and I were “discovering” as an exercise in thought…illegal. Basic book burner. What a contrast!