Ironically, after seeing a physical therapist - which, let me tell you, you should do at the first sign of pain, because while they can't help some people I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and emacs at the lowest possible point of your productivity.
The most important hint I have so far is: do not waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.
The speechcomputing.com forums are your friend.
Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that really improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really need it, and there are a lot of moving parts. Dragon, and Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
which, let me tell you, you should do at the first sign of pain
Do that. REALLY REALLY do that.
Also, this book is superb:
http://www.amazon.co.uk/Its-Carpal-Tunnel-Syndrome-Professio...
http://aaroniba.net/articles/tmp/how-i-cured-my-rsi-pain.htm...
which recommends
http://www.amazon.com/Mindbody-Prescription-Healing-Body-Pai...
Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
I've worked with people who are skilled developers and who can't even touch type. They have a slow-paced, methodical way of working. Many look at the keyboard over glasses and hunt and peck. The professor who ported Plan 9 to the Raspberry Pi (recent video here) is an example of this approach.
On the other hand, I have a shocking memory and can't hold context for long. Sometimes I come to write a piece of code and find that I wrote it last week and can't remember a thing about it.
I work by crashing through. I stalk the problem, procrastinate, drink tea, write short essays about what's stopping me from getting started. Eventually I get the whole problem in my head, and then need to get it down and done before I get tired. When I'm in this state and I need to solve a problem that I could use a standard library function for, often I'll just hammer out code to make the problem go away (list comprehension, string manipulation and the like) in order not to cause any extra load on my short term memory or distraction. Raw typing speed is very important. A drop in pace would hurt a lot.
An example tool to create it is my own program at https://github.com/jostylr/literate-programming which uses markdown as the syntax. While the examples are web language-flavored, it can be used with any language.
To experience something like it, try using your phone keyboard, with word prediction on, to write code. It will be slow, and frustrating, and have a lot of false starts.
There's a big difference between "not the fastest way to enter text" and "so slow it's unusable", and the impression I get is that without extensive macros like this, most speech to text systems are so slow as to be unusable for writing code.
At 11:30 in the video:
"Camel this is a test" -> thisIsATest
"Studly this is a test" -> ThisIsATest
"Jive this is a test" -> "this-is-a-test"
"Dot word this is a test" -> "this.is.a.test"
"Score this is a test" -> "this_is_a_test"
"Mara" -> selects all text on screen
"Chik" -> delete
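The formatting commands are easy to sketch, even if the real grammar engine behind them wasn't released at the time. Here's a toy Python version of the five formatters above (function names are my own, not from the demo):

```python
# Toy re-implementations of the spoken formatting commands from the
# demo. Each takes the dictated words and emits a formatted identifier.

def camel(words):
    """'this is a test' -> 'thisIsATest'"""
    parts = words.split()
    return parts[0] + "".join(w.capitalize() for w in parts[1:])

def studly(words):
    """'this is a test' -> 'ThisIsATest'"""
    return "".join(w.capitalize() for w in words.split())

def jive(words):
    """'this is a test' -> 'this-is-a-test'"""
    return "-".join(words.split())

def dot_word(words):
    """'this is a test' -> 'this.is.a.test'"""
    return ".".join(words.split())

def score(words):
    """'this is a test' -> 'this_is_a_test'"""
    return "_".join(words.split())
```

The hard part of a real system isn't these one-liners, of course; it's the grammar that recognizes "camel" as a command prefix and the rest of the utterance as its argument.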
He says he'll release his code in a few months.
Even with the macros and shortcuts he shows, I still would be slower using a system like that. When I'm typing in a good editor, I can blast out code VERY quickly, and when I've typed it, I KNOW it's what I meant. When he says it, he has to stop and look to ensure the code matches what he said.
Yes, he can say a phrase like "camel someVariableName" quickly, and sometimes it Just Works, but when it doesn't, he has to back up and say it again. That kind of distraction can throw me off my train of thought, and the damage to my productivity would be profound.
That said, it still IS great for anyone with an RSI as an alternate way to enter code. I just don't buy the "it could be better even for people who don't need it" argument. Especially with his claim that I would need to abandon my modern editor with awesome language support for one of those relics that relies on CTAGS.
1. I haven't understood the problem
2. I haven't understood the solution
3. I need to spend five minutes improving my Vim macros
I think it's one of the great things about working with extensible tools and having a tool-building mindset. You can maintain momentum while relaxing your brain from working on a seemingly intractable problem.
But, you should be using emacs. :P
It also helps to stop coding for a minute, think what you want to do, then code until you stop typing for more than 5 seconds, repeat.
I had to slow down my typing for a couple weeks to get the finger positions right. The whole time, I felt like I was coding with a hangover. I felt like I couldn't think properly, just because of the reduced brain->computer bandwidth.
Syntax [edit:] tip:
"try substituting tea or water for your morning coffee"
or
"try replacing your morning coffee with tea or water"
EDIT: For the downvoters: Fairly or unfairly, in the non-tech world people judge you by your choice and arrangement of words. (Compilers do much the same thing, of course.)
</pedantry>
I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so one doesn't have to deal with non-English-word variables/functions).
I don't mean to be a hater, because that was an _amazing_ demo, but I don't believe it's the holy grail the title implies it is.
I have a grimmer point to make: Working out of crappy half-assed "startup incubators" with lousy desks, lousy seating, and an atmosphere flavored with stress was a direct contributor to my own RSI problems. You might not want to wait until you have symptoms to conclude that having an actual desk and some quiet is a good idea.
Or, you have a private office with a door that closes.
Sure, not the ideal, distraction-free environment, but neither is a cubicle farm.
I don't even get a cube where I work.
I wonder how long it will take for reliable subvocal speech reading à la [1] to become available in consumer products. It could potentially solve not only this problem but a lot of problems related to the use of cell phones in public spaces.
[1] http://www.nasa.gov/home/hqnews/2004/mar/HQ_04093_subvocal_s...
Once you are an adequate touch typist, typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that lets you spend more time thinking about code than actually typing it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
http://ep.yimg.com/ca/I/memx_2267_226185665
If you're using X11, you can go nuts with xmodmap and get it functioning at least as well as it did on Solaris.
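For what it's worth, here's a minimal `.Xmodmap` sketch of the kind of remapping meant here. The keycode below is a placeholder: find the real ones on your keyboard with `xev`, then load the file with `xmodmap ~/.Xmodmap`.

```
! ~/.Xmodmap (illustrative only; keycodes vary per keyboard,
! check yours with xev before remapping anything)

! Turn Caps Lock into an extra Control key:
remove Lock = Caps_Lock
keysym Caps_Lock = Control_L
add Control = Control_L

! Bind a spare key (keycode from xev) to a Sun-style keysym:
keycode 191 = SunCopy
```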
I think getting a genuine Sun keyboard beats just remapping keys on a 101/104-key PC keyboard. There are 12 additional keys at the left and top-left of the keyboard just begging to be remapped for your own nefarious purposes. You also get meta keys that are separate from the Alt key, as well as Compose and AltGr keys for your åçcéñtêd character needs.
Plus when you look down and see the Sun logo, you can reminisce about the old days and have a good cry at your desk.
The pedals look like a good idea (I'm an emacs user)... but they do seem pretty goofy.
Kinesis' UK distributor has a 30 day 'sale or return' option (you pay shipping). http://www.ergonomics.co.uk/faq.html
[1] http://julius.sourceforge.jp/en_index.php [2] http://www.workinprogress.ca/KIKU/dictation.php
It seems like this demo is not using Julius, but it's mixing messages a bit. The bottom of the page says "Service provided by Google Inc.", but the link right next to it (for downloadable software, also apparently called "kiku"?) says Julius etc.
The history of the Mac version (acquisition of a company that licensed the Dragon engine) means that it and the Windows versions are very likely permanently divergent. Given the relative market sizes, the Windows version has the best development, the best recognition, and the least schizophrenic product support.
I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks, including a disconnected (i.e., non-Siri) mode. Maintaining an application with a skeleton crew and relying on system services that change at a fundamental level every couple years is not a path to customer satisfaction.
Now if they could optimize that...
For the couple of minutes I watched of him demoing it... I type waaaay faster than that. In fact, I can't possibly imagine how I could speak faster than I can code on the keyboard.
(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc...)
I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be over-sold by being "better"...
All he needs to establish is that he can do things like type aVariableNameLikeThis in six words (16% overhead) instead of fifteen[0] (200% overhead) and the rest of the claim follows.
[0] If you tried to type it using the out-of-box dictation in, say, Android or Dragon, you'd probably start with something like "lowercase a backspace uppercase variable backspace uppercase name..."
These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity boosting solution.
For example, I could go to tend garden and yet think about some problem, take notes, even code. Or check email, browse internet. I can work on hardware thing and have schematics or specifications appear in front of my eyes. I can have a walk and take notes. I can eat while working.
Eventually, no office will be required. You can just stroll in the park and get the work done.
Dictating your javadoc is pretty damn convenient.
My default way to work is to bang some stuff into an editor and then constantly revise and reshape it. I'll draw diagrams on paper or white-board as necessary. I also tend to cut and paste "code notes" into a separate window so I don't have to keep that in my head.
Hilarious!
It's unfortunate he couldn't get the OSS speech recognition to work, though.
https://github.com/AshleyF/VimSpeak http://www.youtube.com/watch?v=TEBMlXRjhZY
Just watched it and I find it awesome, not just for the voice recognition but also as a nicely narrated video of Vim usage. I learned some nice things that I'll now use more regularly in Vim.
For example: we say/think
for each item in list
but in a lot of languages you need to type something like foreach(item in list) {
A step further: we say/think let a be the substring of b from 1 to the end
we need to type a = b.substring(1)
Of course the last example is much shorter and even more readable (to the machine, for sure), but maybe code could be a little more human.

A skilled musician likely doesn't engage the speech centres of their brain; they see a note on the sheet and translate it to motion. You should be able to take in the symbol for "apply a function to each item in a vector" at a glance without any clumsy English getting in the way. APL had it right, but coding has been crippled by catering to the lowest common denominator.
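The phrase-to-code idea above can be sketched as a toy translator. This is purely illustrative: the two hard-coded patterns come from the examples in this thread, the rule table and function names are my own, and the output is the Python equivalent (`b[1:]`) rather than the Java-style `b.substring(1)`.

```python
import re

# Toy "speakable syntax" translator: maps a few fixed English phrase
# patterns to code. A real system would need an actual grammar; this
# just demonstrates the shape of the mapping.
RULES = [
    (re.compile(r"for each (\w+) in (\w+)"),
     r"for \1 in \2:"),
    (re.compile(r"let (\w+) be the substring of (\w+) from (\d+) to the end"),
     r"\1 = \2[\3:]"),
]

def translate(phrase):
    """Return the code for a recognized phrase, or raise ValueError."""
    for pattern, template in RULES:
        m = pattern.fullmatch(phrase)
        if m:
            return m.expand(template)
    raise ValueError("no rule matches: " + phrase)
```

So `translate("for each item in list")` yields `for item in list:`, and the substring phrase yields `a = b[1:]`.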
Indeed. I think notes are more 'human' than most programming languages. If the music goes up, the notes go up. If the notes are short they look short (and more dense).
But I agree that typing "let a be the substring of b from 1 to the end" is no fun. So I'm glad we have symbolic languages. But I think they could be made more 'human'.
Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?
In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking#Histor...
RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which involves having alternatives in the first place.
Lots of water, avoiding nastiness in the air, learning the bare minimum volume of air you can push through your throat and still get results, and taking breaks when your body (either by feel or sound) tells you that it's tired.
In this specific case, adding leverage with short macros such as "laip" and "slap" is essential. There's no way you could work a full day spelling everything that wasn't in the recognizer's dictionary.
"slap... slap... jog... dot... word... chk... slap... snore"
If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.
I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
How the hell did he code it without using his hands? With help?
To his amanuensis: Slap. York. Tork. Jorb. Chomp.
Or maybe he felt his hands going, and he spent the last few months of his pre-RSI existence coding this up.
https://sourceforge.net/projects/voicekey/ (tarball, includes language model) https://github.com/bshanks/voicekey (repo, does not include language model)
There's also this lightning talk http://www.youtube.com/watch?v=qXvbQQV1ydo from PolyglotConf (warning: crappy audio from a shaky cell phone cam).
I promised to release my duct tape code later this year. I'm a bit behind schedule with that but it should be out in a month or two.
What's the next big leap for speech-to-text programming? A language designed specifically to be speakable, i.e., all keywords and no symbols?
I mean, I'd like speech recognition to get more natural error correction, drawing more from the way we use inflection to give feedback about which syllables to correct. (I love how Google on mobile now gives visual indication of which syllables it heard clearly, and which it didn't. I just wish it would understand when I shout "No, X not Y" to replace just that one misheard word.)
It'd be interesting to hear about where voice is heading from someone who uses the technology far more.
I remember first playing with voice recognition and voice command on a PPC Mac back in 1994.
That the technology hasn't progressed along the same lines as cell phones and processors is testament to how difficult voice recognition actually is when dealing with a wide variation of dialect within any given language.
I would love to be able to use my voice as my main input to my computers and other devices.
I bought a couple of nice mechanical keyboards with Cherry switches (red and brown). I type very lightly on them, seldom bottoming out the keys. Finger troubles went away.
Basically, review your work environment: keyboard, chair, table and posture.