Ironically, after seeing a physical therapist - which, let me tell you, you should do at the first sign of pain, because while they can't help some people I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and emacs at the lowest possible point of your productivity.
The most important hint I have so far is: do not waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.
The speechcomputing.com forums are your friend.
Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that really improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really need it, and there are a lot of moving parts. Dragon, and Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
which, let me tell you, you should do at the first sign of pain
Do that. REALLY REALLY do that.
Also, this book is superb:
http://www.amazon.co.uk/Its-Carpal-Tunnel-Syndrome-Professio...
http://aaroniba.net/articles/tmp/how-i-cured-my-rsi-pain.htm...
which recommends
http://www.amazon.com/Mindbody-Prescription-Healing-Body-Pai...
Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
I've worked with people who are skilled developers and who can't even touch type. They have a slow-paced, methodical way of working. Many look at the keyboard over glasses and hunt and peck. The professor who ported Plan 9 to the Raspberry Pi (recent video here) is an example of this approach.
On the other hand, I have a shocking memory and can't hold context for long. Sometimes I come to write a piece of code and find that I wrote it last week and can't remember a thing about it.
I work by crashing through. I stalk the problem, procrastinate, drink tea, write short essays about what's stopping me from getting started. Eventually I get the whole problem in my head, and then need to get it down and done before I get tired. When I'm in this state and I need to solve a problem that I could use a standard library function for, often I'll just hammer out code to make the problem go away (list comprehension, string manipulation and the like) in order not to cause any extra load on my short term memory or distraction. Raw typing speed is very important. A drop in pace would hurt a lot.
An example tool to create it is my own program at https://github.com/jostylr/literate-programming which uses markdown as the syntax. While the examples are web language-flavored, it can be used with any language.
To experience something like it, try using your phone keyboard, with word prediction on, to write code. It will be slow, and frustrating, and have a lot of false starts.
There's a big difference between "not the fastest way to enter text" and "so slow it's unusable", and the impression I get is that without extensive macros like this, most speech to text systems are so slow as to be unusable for writing code.
At 11:30 in the video:
"Camel this is a test" -> thisIsATest
"Studly this is a test" -> ThisIsATest
"Jive this is a test" -> "this-is-a-test"
"Dot word this is a test" -> "this.is.a.test"
"Score this is a test" -> "this_is_a_test"
"Mara" -> selects all text on screen
"Chik" -> delete
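The formatting commands are easy to sketch, even if the real grammar engine behind them wasn't released at the time. Here's a toy Python version of the five formatters above (function names are my own, not from the demo):

```python
# Toy re-implementations of the spoken formatting commands from the
# demo. Each takes the dictated words and emits a formatted identifier.

def camel(words):
    """'this is a test' -> 'thisIsATest'"""
    parts = words.split()
    return parts[0] + "".join(w.capitalize() for w in parts[1:])

def studly(words):
    """'this is a test' -> 'ThisIsATest'"""
    return "".join(w.capitalize() for w in words.split())

def jive(words):
    """'this is a test' -> 'this-is-a-test'"""
    return "-".join(words.split())

def dot_word(words):
    """'this is a test' -> 'this.is.a.test'"""
    return ".".join(words.split())

def score(words):
    """'this is a test' -> 'this_is_a_test'"""
    return "_".join(words.split())
```

The hard part of a real system isn't these one-liners, of course; it's the grammar that recognizes "camel" as a command prefix and the rest of the utterance as its argument.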
He says he'll release his code in a few months.
Even with the macros and shortcuts he shows, I still would be slower using a system like that. When I'm typing in a good editor, I can blast out code VERY quickly, and when I've typed it, I KNOW it's what I meant. When he says it, he has to stop and look to ensure the code matches what he said.
Yes, he can say a phrase like "camel someVariableName" quickly, and sometimes it Just Works, but when it doesn't, he has to back up and say it again. That kind of distraction can throw me off my train of thought, and the damage to my productivity would be profound.
That said, it still IS great for anyone with an RSI as an alternate way to enter code. I just don't buy the "it could be better even for people who don't need it" argument. Especially with his claim that I would need to abandon my modern editor with awesome language support for one of those relics that relies on CTAGS.
1. I haven't understood the problem
2. I haven't understood the solution
3. I need to spend five minutes improving my Vim macros
I think it's one of the great things about working with extensible tools and having a tool-building mindset. You can maintain momentum while relaxing your brain from working on a seemingly intractable problem.
But, you should be using emacs. :P
It also helps to stop coding for a minute, think what you want to do, then code until you stop typing for more than 5 seconds, repeat.
I had to slow down my typing for a couple weeks to get the finger positions right. The whole time, I felt like I was coding with a hangover. I felt like I couldn't think properly, just because of the reduced brain->computer bandwidth.
Syntax [edit:] tip:
"try substituting tea or water for your morning coffee"
or
"try replacing your morning coffee with tea or water"
EDIT: For the downvoters: Fairly or unfairly, in the non-tech world people judge you by your choice and arrangement of words. (Compilers do much the same thing, of course.)
</pedantry>
I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so one doesn't have to deal with non-English-word variables/functions).
I don't mean to be a hater, because that was an _amazing_ demo, but I don't believe it's the holy grail the title implies it is.
I have a grimmer point to make: Working out of crappy half-assed "startup incubators" with lousy desks, lousy seating, and an atmosphere flavored with stress was a direct contributor to my own RSI problems. You might not want to wait until you have symptoms to conclude that having an actual desk and some quiet is a good idea.
Or, you have a private office with a door that closes.
Sure, not the ideal, distraction-free environment, but neither is a cubicle farm.
I don't even get a cube where I work.
I wonder how long it will take for reliable subvocal speech reading à la [1] to become available in consumer products. It could potentially solve not only this problem but a lot of problems related to the use of cell phones in public spaces.
[1] http://www.nasa.gov/home/hqnews/2004/mar/HQ_04093_subvocal_s...
Once you are an adequate touch typist, typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that lets you spend more time thinking about code than actually typing it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
http://ep.yimg.com/ca/I/memx_2267_226185665
If you're using X11, you can go nuts with xmodmap and get it functioning at least as well as it did on Solaris.
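For what it's worth, here's a minimal `.Xmodmap` sketch of the kind of remapping meant here. The keycode below is a placeholder: find the real ones on your keyboard with `xev`, then load the file with `xmodmap ~/.Xmodmap`.

```
! ~/.Xmodmap (illustrative only; keycodes vary per keyboard,
! check yours with xev before remapping anything)

! Turn Caps Lock into an extra Control key:
remove Lock = Caps_Lock
keysym Caps_Lock = Control_L
add Control = Control_L

! Bind a spare key (keycode from xev) to a Sun-style keysym:
keycode 191 = SunCopy
```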
I think getting a genuine Sun keyboard beats just remapping keys on a 101/104-key PC keyboard. There are 12 additional keys at the left and top-left of the keyboard just begging to be remapped for your own nefarious purposes. You also get meta keys that are separate from the Alt key, as well as Compose and AltGr keys for your åçcéñtêd character needs.
Plus when you look down and see the Sun logo, you can reminisce about the old days and have a good cry at your desk.
The pedals look like a good idea (I'm an emacs user)... but they do seem pretty goofy.
Kinesis' UK distributor has a 30 day 'sale or return' option (you pay shipping). http://www.ergonomics.co.uk/faq.html
[1] http://julius.sourceforge.jp/en_index.php [2] http://www.workinprogress.ca/KIKU/dictation.php
It seems like this demo is not using Julius, but it's mixing messages a bit. The bottom of the page says "Service provided by Google Inc.", but the link right next to it (for downloadable software, also apparently called "kiku"?) says Julius etc.
The history of the Mac version (acquisition of a company that licensed the Dragon engine) means that it and the Windows versions are very likely permanently divergent. Given the relative market sizes, the Windows version has the best development, the best recognition, and the least schizophrenic product support.
I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks, including a disconnected (i.e., non-Siri) mode. Maintaining an application with a skeleton crew and relying on system services that change at a fundamental level every couple years is not a path to customer satisfaction.
Now if they could optimize that...
For the couple of minutes I watched of him demoing it... I type waaaay faster than that. In fact, I can't possibly imagine how I could speak faster than I can code on the keyboard.
(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc...)
I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be over-sold by being "better"...
All he needs to establish is that he can do things like type aVariableNameLikeThis in six words (16% overhead) instead of fifteen[0] (200% overhead) and the rest of the claim follows.
[0] If you tried to type it using the out-of-box dictation in, say, Android or Dragon, you'd probably start with something like "lowercase a backspace uppercase variable backspace uppercase name..."
These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity boosting solution.
For example, I could go to tend garden and yet think about some problem, take notes, even code. Or check email, browse internet. I can work on hardware thing and have schematics or specifications appear in front of my eyes. I can have a walk and take notes. I can eat while working.
Eventually, no office will be required. You can just stroll in the park and get the work done.
Dictating your javadoc is pretty damn convenient.
My default way to work is to bang some stuff into an editor and then constantly revise and reshape it. I'll draw diagrams on paper or white-board as necessary. I also tend to cut and paste "code notes" into a separate window so I don't have to keep that in my head.
Hilarious!
It's unfortunate he couldn't get the OSS speech recognition to work, though.
https://github.com/AshleyF/VimSpeak http://www.youtube.com/watch?v=TEBMlXRjhZY
Just watched it and I find it awesome, not just for the voice recognition but also as a nicely narrated video of Vim usage. I learned some nice things that I'll now use more regularly in Vim.
For example: we say/think
for each item in list
but in a lot of languages you need to type something like foreach(item in list) {
A step further: we say/think let a be the substring of b from 1 to the end
we need to type a = b.substring(1)
Of course the last example is much shorter and even more readable (to the machine, for sure), but maybe code could be a little more human.

A skilled musician likely doesn't engage the speech centres of their brain; they see a note on the sheet and translate it to motion. You should be able to take in the symbol for "apply a function to each item in a vector" at a glance without any clumsy English getting in the way. APL had it right, but coding has been crippled by catering to the lowest common denominator.
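The phrase-to-code idea above can be sketched as a toy translator. This is purely illustrative: the two hard-coded patterns come from the examples in this thread, the rule table and function names are my own, and the output is the Python equivalent (`b[1:]`) rather than the Java-style `b.substring(1)`.

```python
import re

# Toy "speakable syntax" translator: maps a few fixed English phrase
# patterns to code. A real system would need an actual grammar; this
# just demonstrates the shape of the mapping.
RULES = [
    (re.compile(r"for each (\w+) in (\w+)"),
     r"for \1 in \2:"),
    (re.compile(r"let (\w+) be the substring of (\w+) from (\d+) to the end"),
     r"\1 = \2[\3:]"),
]

def translate(phrase):
    """Return the code for a recognized phrase, or raise ValueError."""
    for pattern, template in RULES:
        m = pattern.fullmatch(phrase)
        if m:
            return m.expand(template)
    raise ValueError("no rule matches: " + phrase)
```

So `translate("for each item in list")` yields `for item in list:`, and the substring phrase yields `a = b[1:]`.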
Indeed. I think notes are more 'human' than most programming languages. If the music goes up, the notes go up. If the notes are short they look short (and more dense).
But I agree that typing "let a be the substring of b from 1 to the end" is no fun. So I'm glad we have symbolic languages. But I think they could be made more 'human'.
Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?
In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking#Histor...
RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which involves having alternatives in the first place.
Lots of water, avoiding nastiness in the air, learning the bare minimum volume of air you can push through your throat and still get results, and taking breaks when your body (either by feel or sound) tells you that it's tired.
In this specific case, adding leverage with short macros such as "laip" and "slap" is essential. There's no way you could work a full day spelling everything that wasn't in the recognizer's dictionary.
"slap... slap... jog... dot... word... chk... slap... snore"
If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.
I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
How the hell did he code it without using his hands? With help?
To his amanuensis: Slap. York. Tork. Jorb. Chomp.
Or maybe he felt his hands going, and he spent the last few months of his pre-RSI existence coding this up.
https://sourceforge.net/projects/voicekey/ (tarball, includes language model) https://github.com/bshanks/voicekey (repo, does not include language model)
There's also this lightning talk http://www.youtube.com/watch?v=qXvbQQV1ydo from PolyglotConf (warning: crappy audio from a shaky cell phone cam).
I promised to release my duct tape code later this year. I'm a bit behind schedule with that but it should be out in a month or two.
What's the next big leap for speech-to-text programming? A language designed specifically to be speakable, i.e., all keywords and no symbols?
I mean, I'd like speech recognition to get more natural error correction, drawing more from the way we use inflection to give feedback about which syllables to correct. (I love how Google on mobile now gives visual indication of which syllables it heard clearly, and which it didn't. I just wish it would understand when I shout "No, X not Y" to replace just that one misheard word.)
It'd be interesting to hear about where voice is heading from someone who uses the technology far more.
I remember first playing with voice recognition and voice command on a PPC Mac back in 1994.
That the technology hasn't progressed along the same lines as cell phones and processors is testament to how difficult voice recognition actually is when dealing with a wide variation of dialect within any given language.
I would love to be able to use my voice as my main input to my computers and other devices.
I bought a couple of nice mechanical keyboards with Cherry switches (red and brown). I type very lightly on them, seldom bottoming out the keys. Finger troubles went away.
Basically, review your work environment: keyboard, chair, table and posture.