Most Pressed Keys and Programming Syntaxes

(mahdiyusuf.com)

94 points | talmirza | 13y ago | 74 comments

sltkr 13y ago

I admit, the Lisp one at the end made me laugh.

For most other languages the focus seems to be mainly on letters (typing out keywords/identifiers), so in that sense there is little evidence that one language would be easier to write than another.

Additionally, I imagine that some of the code is auto-completed by an IDE, which this analysis fails to account for.

dkersten 13y ago

I'd love to see how Clojure compares, as it uses fewer parentheses and may even be competitive with, e.g., Java in its use of parentheses.

As an aside, it's interesting (to me, as a Colemak user) that all of the hot alphabetic characters in all the analyzed languages are on the home row of the Colemak layout (except L, which seems to be kind of hot in C++).

abp 13y ago

https://twitter.com/#!/swannodette/media/slideshow?url=http%...

batgaijin 13y ago

Yeah, paredit would make only the left paren red...

sordina 13y ago

Shouldn't shift be twice as hot as the parentheses?

snprbob86 13y ago

No, because this heat map is clearly generated "offline". That is, it is built from a dataset of source files. An "online" dataset would be the output of a key logger. Online analysis would show the number of right parentheses to be a tiny fraction of the number of left parentheses, due to auto-insertion and other paredit-like operations.
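
To make the offline/online distinction concrete, here is a minimal Python sketch. It is not the blog author's code: `offline_counts` and `estimated_typed_counts` are hypothetical names, and treating every closing delimiter as auto-inserted is a deliberately crude assumption about paredit-like editors.

```python
from collections import Counter

def offline_counts(source):
    """Count non-whitespace characters in finished source text."""
    return Counter(ch for ch in source if not ch.isspace())

def estimated_typed_counts(source, auto_closed=")]}"):
    """Crude guess at typed characters: assume the editor inserts every closer."""
    counts = offline_counts(source)
    for ch in auto_closed:
        counts.pop(ch, None)  # drop closers the editor would have typed for us
    return counts

sample = "(defun square (x) (* x x))"
offline = offline_counts(sample)
typed = estimated_typed_counts(sample)
print(offline["("], offline[")"])  # parentheses are balanced in the file
print(typed["("], typed[")"])      # closers vanish under the auto-insert assumption
```

A real keylogger trace would also contain backspaces, navigation and modifier keys, none of which this estimate can recover from the finished file.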

civilian 13y ago

Yes, and in fact they are ignoring Control, Space and Tab too.

shasta 13y ago

One question is why the Lisp code doesn't use the letter E as much as every other language. Is that just due to normalizing against the parentheses?

chacham15 13y ago

While I do not know standard Lisp, according to my memory of Scheme, only one C keyword containing the letter 'e' exists in Scheme: while.

The following do not: break, case, continue, default, double, else, enum, extern, register, return, signed, sizeof, typedef, unsigned, volatile

PerryCox 13y ago

It probably gets used about the same amount as the other languages, it's just that the parentheses get used an insane amount and outweigh all the other characters.

kjhughes 13y ago

If this were actual keys pressed while programming rather than an analysis of completed code, the action on app switch and editor meta-keys would rival that of the highest ranking regular keys.

the_cat_kittles 13y ago

It would be nice to see these heat maps normalized by the average frequency of each key. Then you can really see what stands out about each particular language.

nhebb 13y ago

... and normalized against the frequency of each letter in a dictionary of common English words. Most likely, 'e' is common in programming because it's the most common letter in English:

http://en.wikipedia.org/wiki/Letter_frequency
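
A rough sketch of that normalization in Python, under stated assumptions: the `ENGLISH` table holds only three commonly cited approximate frequencies (treat the exact values as assumptions), and `letter_freq` and `relative_to_baseline` are hypothetical helper names. A ratio above 1 means the letter is more common in the code sample than the baseline predicts.

```python
from collections import Counter

# Approximate English letter frequencies (assumed, rounded; only a few letters shown).
ENGLISH = {"e": 0.127, "t": 0.091, "a": 0.082}

def letter_freq(text):
    """Relative frequency of each ASCII letter in the text, case-insensitive."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}

def relative_to_baseline(text, baseline):
    """Ratio of observed letter frequency to the baseline frequency."""
    freq = letter_freq(text)
    return {ch: freq.get(ch, 0.0) / base for ch, base in baseline.items()}

ratios = relative_to_baseline("return else", ENGLISH)
print(ratios)
```

With a full 26-letter table and a real corpus, the same ratio would show whether 'e' is hot because of the language's keywords or simply because identifiers are English words.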

citricsquid 13y ago

Vaguely related: if anyone is interested in tracking their own typing (and individual key counts), check out WhatPulse (http://whatpulse.org); it's pretty great :-) (my profile: http://whatpulse.org/stats/users/210575/)

zer01 13y ago

...so this captures every key and mouse movement on your computer, then sends it to a 3rd party server for 'analysis'.

Seems legit.

positr0n 13y ago

I realize you're probably joking :) but the software just "pulses" the number of keystrokes and clicks to the server whenever you tell it to.

It can generate keystroke data like this, but the data is stored locally.

zeroonetwothree 13y ago

Almost any software you install could potentially be doing that.

alpb 13y ago

I skimmed the comments here and couldn't find anything that explains why e is so popular. Is it because of ETAOIN? Also, almost no "{" or "}" is pressed in C/Java, almost no ":" is used in Python, and the "/" key is more popular than "[" and "]" in Objective-C. This makes no sense. I don't believe that blog post.

Heinleinian 13y ago

Just off the top of my head, I'd guess it's due to a lot of e's in words that, depending on the language, you'll see in almost every single method or function. Things like return, end, else, true, include, self, private, etc.

zxoq 13y ago

Don't forget set / get, which are ubiquitous in almost every language.

orangeduck 13y ago

Because it is the most common letter in English and used for variable and identifier names.

k-mcgrady 13y ago

I found the Objective-C one strange too. I would have thought '[' and ']' would have been two of the most pressed keys. I only really use '/' for commenting.

dan1234 13y ago

I do type a lot more square brackets in ObjC than any other language, but I find that I type a lot more generally due to the increased length of many method names in ObjC (or I would do if autocomplete didn't do it for me).

(Simple) Example:

  PHP:
  $myString = sprintf('Total: %.2f', $myFloat); 

  ObjC:
  NSString *myString = [NSString stringWithFormat:@"Total: %.2f", myFloat];

Maybe this is the reason these characters don't score too highly in overall frequency? I have no answer for the high frequency of '/'.

simias 13y ago

I would like to see what a heatmap generated from a "real" typing session would look like (with a keylogger). You could see the influence of the editor as well.

Since these are generated offline, the keyboard heatmaps are meaningless and the representation is slightly misleading IMO.

schme 13y ago

A live sample would be much more interesting; both would be best. I'd be most interested in the meta keys. As a Scandinavian, I find the {}, [] etc. keys especially awkward to press. In fact, so are most special characters used in programming.

As mentioned, auto-complete and similar functionality change the heatmap, but that's what people actually press. This data would be a lot better for actual use.

I don't mean this as a scolding, though: it wasn't really in the author's hands to collect such vast amounts of live data, and it would surely have been a lot more work than he intended.

mattparlane 13y ago

It's not really measuring "pressed keys", it's measuring a final product -- I'd be interested to see which languages highlight the backspace/delete keys more.

eddie_the_head 13y ago

I'd be interested in seeing which keys I press most when I'm programming in APL.

drivingmenuts 13y ago

They tried measuring that and accidentally created the runes to summon an angry Elder God, with predictable consequences.

einhverfr 13y ago

Next, it would be interesting to look at the hand positions of programmers in different languages. I know when I am programming Perl my right hand tends to move back and forth into different positions while my left hand stays in the standard position. I wonder whether I am unique there or whether it is common, and how other languages affect this.

joestringer 13y ago

I'd be curious to see the same analysis done but with alternate keyboard layouts, such as dvorak or colemak.

lmm 13y ago

It'd come up exactly the same, unless you're going to pick out the dvorak-users on github. Even then I wouldn't expect much difference. We're measuring characters in code, not actual keypresses.
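
The character counts are indeed layout-independent; what a layout changes is where those characters sit. A small Python sketch of that idea, assuming the standard home-row letters of each layout and using a hypothetical `home_row_share` helper (punctuation keys such as `;` on the QWERTY home row are ignored here):

```python
# Home-row letters for three layouts (letters only; punctuation keys omitted).
HOME_ROWS = {
    "qwerty": set("asdfghjkl"),
    "dvorak": set("aoeuidhtns"),
    "colemak": set("arstdhneio"),
}

def home_row_share(text, layout):
    """Fraction of the letters in text that lie on the layout's home row."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return 0.0
    home = HOME_ROWS[layout]
    return sum(ch in home for ch in letters) / len(letters)

sample = "return else"
shares = {layout: home_row_share(sample, layout) for layout in HOME_ROWS}
print(shares)
```

The same source text gives the same histogram for every layout; only the mapping from characters to key positions, and therefore the picture, differs.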

nix 13y ago

"Shift" is a big omission, though you can guess at it from the emphasis on certain numeric keys. One of the great things about Python is that there are fewer chorded characters. It's also one of the worst things about Lisp on a standard keyboard.

Zakiazigazi 13y ago

Hi, I added shift counting to Mahdi's code a while ago: (https://github.com/zaki/Keyboard-Heatmap-1), so it should be easy to generate more correct heatmaps (in multiple keyboard layouts too) if you are interested.
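
For readers who want the idea without reading the fork, here is an independent Python sketch of shift counting. It is not taken from that repository: it assumes a US QWERTY layout, counts one Shift press per shifted character (no holding Shift across several characters), and `key_presses` is a hypothetical name.

```python
from collections import Counter

# US QWERTY: shifted symbol -> the unshifted key it shares.
SHIFTED = dict(zip('~!@#$%^&*()_+{}|:"<>?', "`1234567890-=[]\\;',./"))

def key_presses(text):
    """Map characters to physical keys, adding a 'shift' press where needed."""
    presses = Counter()
    for ch in text:
        if ch in SHIFTED:
            presses["shift"] += 1
            presses[SHIFTED[ch]] += 1
        elif ch.isupper():
            presses["shift"] += 1
            presses[ch.lower()] += 1
        elif not ch.isspace():
            presses[ch] += 1
    return presses

presses = key_presses("(car $x)")
print(presses)
```

Under these assumptions both parentheses and the `$` each cost a Shift press, which is why the number-row keys look hot in paren-heavy or sigil-heavy code.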

nkoren 13y ago

Interesting, but a different visualisation would be even more interesting: heat maps showing deviations from the mean. This would highlight the differences between the languages, which (except for the case of Lisp) are rather subtle.
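
One way to compute such deviations, sketched in Python with toy inputs: each language's relative character frequency minus the mean over all languages. The function names and the two one-line samples are invented for illustration; a real comparison would need a sizeable corpus per language.

```python
from collections import Counter

def relative_freq(text):
    """Relative frequency of each non-whitespace character."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def deviations(samples):
    """Per-language frequency minus the cross-language mean, per character."""
    freqs = {lang: relative_freq(text) for lang, text in samples.items()}
    chars = set().union(*freqs.values())
    mean = {ch: sum(f.get(ch, 0.0) for f in freqs.values()) / len(freqs)
            for ch in chars}
    return {lang: {ch: f.get(ch, 0.0) - mean[ch] for ch in chars}
            for lang, f in freqs.items()}

dev = deviations({"lisp": "(+ 1 2)", "c": "1 + 2;"})
print(dev["lisp"]["("], dev["c"]["("])
```

Positive values mark keys a language uses more than average, negative values mark keys it uses less, so shared hot spots such as 'e' would fade out of the picture.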

tunnuz 13y ago

It would be nice to consider how auto-completion actually biases the real distribution of key-presses, e.g. I wouldn't expect closing parenthesis or brackets to be pressed as often as their opening counterparts.

mahmud 13y ago

Not if you're an emacs user. I use key-chords for most programming language forms.

Also, for Lisp, I never touch the closing paren. M-( does both at the same time.

godDLL 13y ago

Here are the keys directly under my fingers on the home row: A R S T N E I O

And I can visually see the reasoning behind Colemak being like this, now.

iamwil 13y ago

It'd be useful to use the histograms to distinguish between different programming languages, for automatic language detection in something like gists.

obtu 13y ago

highlight.js [1] does this, though by running highlighters rather than using some learning-based mechanism. It feels wasteful to run this on display rather than storage, though. SourceClassifier [2] also works, though with fewer languages. And here's [3] an implementation made with Bayes and Go.

[1] http://softwaremaniacs.org/soft/highlight/en/ http://softwaremaniacs.org/media/soft/highlight/test.html

[2] http://blog.chrislowis.co.uk/2009/01/04/identify-programming...

[3] https://github.com/octplane/go-code-classifier
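
As a toy illustration of the histogram idea (not how any of the linked projects work), here is a character-level naive Bayes sketch in Python. The training strings are invented, the uniform prior and add-one smoothing over an assumed alphabet of 256 symbols are arbitrary choices, and `train` and `classify` are hypothetical names.

```python
import math
from collections import Counter

def train(samples):
    """Build one character histogram per language label."""
    return {lang: Counter(text) for lang, text in samples.items()}

def classify(models, snippet, alphabet_size=256):
    """Return the label whose histogram gives the snippet the highest log-likelihood."""
    def log_likelihood(counts):
        total = sum(counts.values())
        # Add-one smoothing so unseen characters do not zero out the score.
        return sum(math.log((counts[ch] + 1) / (total + alphabet_size))
                   for ch in snippet)
    return max(models, key=lambda lang: log_likelihood(models[lang]))

# Toy training data, invented for illustration only.
models = train({
    "lisp": "(define (square x) (* x x)) (define (add a b) (+ a b))",
    "c": "int square(int x) { return x * x; } int add(int a, int b) { return a + b; }",
})
print(classify(models, "(f (g (h x)))"))
print(classify(models, "int f(int x) { return x; }"))
```

Single-character histograms separate Lisp from C easily because of the parentheses; telling apart languages with similar punctuation would likely need token or n-gram features.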

philwelch 13y ago

Interesting how "i" is more common in some languages than others. C and C++ make sense (for(i = 0; i < n; i++)), but Ruby is a puzzler.

riffraff 13y ago

"nil" "if" and "elsif"

EDIT: though looking at my sources, where it seems to also be popular, it mostly seems to match inside non-syntax (field, nickname, to_i, strip, index, client).

minikomi 13y ago

How about |i|?

jakejake 13y ago

I can understand 0 and 1 being frequently used. But I wonder why 5,6 & 7 seem to be under-used compared to the other numbers?

zdw 13y ago

I was thinking the same thing until I looked at Perl, which has 4 being quite frequent.

The reason, of course, is that 4 is also $, which is used to denote a scalar in Perl.

Thus, because 5,6,7 correspond to %,^,&, which generally get used to a lesser degree for things like modulo, hashes, exponentiation and logical-and, they're used less.

irahul 13y ago

> I can understand 0 and 1 being frequently used. But I wonder why 5,6 & 7 seem to be under-used compared to the other numbers?

The heat map isn't accounting for shift. 5,6,7 also include %,^,&

balbeit 13y ago

I would wager that it is because when creating named variables, people tend to start with the low integers, like var1, var2, etc... And when using constants, they will often use maximums up to a threshold, like 999.99. So the middle range (5-8) is rarely used.

DeepDuh 13y ago

What looks strange to me is that the semicolon is not one of the most common keys in C-derived languages.

PerryCox 13y ago

Why is E the most common across multiple languages (except Lisp, which I assume is due to the parentheses)? I assume its usage is higher because it's a vowel, but none of the other vowels are nearly that high.

sosuke 13y ago

E is the most common letter in the English language. There was even a fun book written without the e: http://en.wikipedia.org/wiki/Gadsby_(novel)

tikhonj 13y ago

There is also A Void: it was originally written in French (which, I think, uses the letter "e" more often than English) and then translated to English. In both cases it did not use the letter "e" at all.

I only know about this because it was referenced in a book on cryptanalysis. The simplest sort of cipher can be broken by paying attention to the relative frequency of letters in the original text. I remember a useful mnemonic for remembering the most common letters: the sentence "a sin to err" contains them. E, followed by t and a, are the most common out of those (t and a are very close).

waterlesscloud 13y ago

At first I thought it would be in most keywords. But looking at one of my Python files (a tiny sample, to be sure), I see that it doesn't occur all that often in the keywords I used.

However, it is in most of my variable names. Given that it's the most common letter in English, that makes sense.

There's the famous phrase "ETAOIN SHRDLU", dating back to printing press days, of the approximate order of the most common letters in English.

Not to be confused with the early AI program "SHRDLU". :-)

cstavish 13y ago

What kind of C programming is this guy doing? '*' is relatively untouched.

boryas 13y ago

Guess: maybe all the pointers to structs are typedef-ed away?

AndyKelley 13y ago

Dear author: would you consider also generating the results for Dvorak?

ibotty 13y ago

and programmer dvorak :D

cpeterso 13y ago

I wonder what a programming language designed to minimize shifting would look like. Python does a pretty good job because it uses few curly brackets and no semicolons.

riffraff 13y ago

Sadly, that is highly dependent on keyboard layout, e.g. on my keyboard the square brackets and the equals sign are only accessible via a key combination, which is not the case on the US keyboard.

It seems the only safe characters across many countries are 0-9a-z.,-\ plus space/tab/return. Not a lot to work with :)

lucian1900 13y ago

That's part of why I always use a US layout, even on UK keyboards. Also because I've grown up with it.

stigi 13y ago

Gotta say, I expected more square brackets for Objective-C.
