- The content of commit messages varies widely in expressiveness and meaningfulness, from 'Fix bug' to a detailed explanation. This confounds the classification of a commit.
- The number of commits can be very misleading depending on the committer's workflow. Some committers merge topic branches to include all their intermediate work in progress commits, which could overrepresent commits flagged as errors. Other committers rebase their topic branches into fewer, or even single, commits before merging. Or, some commits may fix multiple defects.
This kind of analysis is conceptually a worthy endeavor; it would be more meaningful if the metrics it employed were more strongly correlated with the attributes it was trying to analyze.
Perhaps timing should be considered as well - how long it takes to implement a feature including fixing its associated bugs.
It seems similar to the paradox that makes the best medicine appear to have a lower survival rate, just because it's given to the most serious patients.
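A toy numeric sketch of that paradox (all numbers invented for illustration): the better treatment wins within each severity group, yet loses in aggregate because of case mix.

```python
# Hypothetical numbers illustrating the confound: the "best medicine"
# is given mostly to severe cases, so its aggregate survival looks worse.
cases = [
    # (treatment, severity, patients, survivors)
    ("best",     "severe", 90, 54),   # 60% survival on severe cases
    ("best",     "mild",   10, 10),   # 100% on mild
    ("standard", "severe", 10,  4),   # 40% on severe
    ("standard", "mild",   90, 81),   # 90% on mild
]

def survival(treatment, severity=None):
    rows = [(n, s) for t, sev, n, s in cases
            if t == treatment and (severity is None or sev == severity)]
    patients = sum(n for n, _ in rows)
    survivors = sum(s for _, s in rows)
    return survivors / patients

# Within each severity group the best medicine wins...
assert survival("best", "severe") > survival("standard", "severe")
assert survival("best", "mild") > survival("standard", "mild")
# ...yet its overall survival rate is lower, because of case mix.
assert survival("best") < survival("standard")
```

The same mechanism applies to languages: if C is disproportionately handed the hard problems, its aggregate defect rate tells you little about the language itself.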
Similarly, again for performance reasons, your average C program will have more concurrency than your average Python program, and therefore also more bugs.
Another way to put it: you use C when you have a complex problem to solve and Python when you have a simple problem to solve (unless you are a masochist or a purist, I suppose). So one reason those languages have more bugs may just be that the programs themselves are more prone to errors (which might only be slightly related to size, if at all).
I see what you're getting at, but this is an irksome way to put it. We're clearly unwilling to wait for the heat death of the universe for our programs to terminate. Performance is always a requirement.
"When the performance requirements can only be met by C/C++" might be a more accurate formulation, but then it's just tautological.
Java, Go, Obj-C, Erlang, and Scala are all certainly in the running when concurrency is required, and fit within many latency budgets just fine. The managed and dynamic languages on the list are typically used in contexts where latency is dominated by network and disk I/O, so marginal CPU efficiency isn't worth much. That doesn't mean performance isn't a requirement, it means the most effective ways to increase performance are different. Adding indexes, optimizing queries, caching, etc.
Also, what do you mean most of these languages wouldn't be considered when concurrency is required? Concurrency is bog standard everywhere.
It seems like, given the way they define a bug, a performance bug would be a bug relative to expectations, per project. So you can definitely have a performance bug in Go or Haskell, for example, if something works slower than the developers think it should (as opposed to being slower than some external reference code). So maybe it's closer to something like "developer control over unexpected underperformance"?
Try multithreading in C++ vs Clojure and the difference in amount of effort is well beyond trivial.
As best I can tell, they use commit messages to identify bugfixes, and later they jump to "defective commits". Presumably the bugfix commit is not the defective commit. There is no explanation I can find that shows how they arrive at a defective commit from a bugfix commit.
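For what it's worth, the identification step in studies like this is usually a keyword heuristic over commit messages, something like the sketch below. The keyword list and regex here are my assumptions, not the paper's exact method, and note that this only finds the bugfix commit, not the commit that introduced the defect.

```python
import re

# Flag a commit as a bugfix if its message matches error-related keywords.
# The keyword set is a guess at the typical heuristic, not the study's own.
BUGFIX_PATTERN = re.compile(
    r"\b(fix(es|ed)?|bug|defect|error|fault|crash)\b", re.IGNORECASE
)

def looks_like_bugfix(message: str) -> bool:
    return bool(BUGFIX_PATTERN.search(message))

assert looks_like_bugfix("Fix bug in parser")
assert looks_like_bugfix("Resolved crash on startup")
assert not looks_like_bugfix("Add dark mode support")
```

Heuristics like this inherit every weakness of commit-message quality, which is exactly the confound raised above.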
This specific methodology seems rife with weakness, all of which should be explained clearly and admitted up front.
But looking at the data and their analysis, while there are some interesting stats saying typed and functional languages correlate with fewer defects, there are just too many variables at play.
I think the 2nd or 3rd STALKER game (or all of them) runs in a VM as well and... crashes and corrupts saves like crazy.
And for those who don't get how it's related: a VM is "supposed" to be pretty damn crash-proof and "safer", much like functional languages. But that doesn't stop deadlines and bad coding practices from creating broken products.
Clojure is an incredibly succinct language. It uses about half as many lines as Elm, 5% as many lines as C++. I love other languages, but nothing rivals Clojure in elegance. I believe this is the key reason why Clojure projects are so low on bugs -- they are much simpler to maintain, refactor, or rewrite entirely than in most other languages, so fixing problems is not the chore it can be elsewhere.
I would still take typed over untyped any day.
While I too am attracted to static typing in many ways, I have found, after working in C++ full time for a few years (with some non-trivial work in Swift as well, which has quite a strict type system) and then Clojure full time for a few years, that when you weigh all the other features of a language, typing alone does not make a big difference.
It's kind of like the argument some people make about how the cost of your rent or mortgage is the best indicator of your cost of living. There are so many other factors, and some of the lowest cost of living I've personally experienced has been in places with considerably higher rent than average, due to the offset of other factors.
If we see empirical evidence that projects written in certain types of languages consistently perform better in a particular area, such as reduction in defects, we can then make a hypothesis as to why that is.
For example, if there were statistical evidence to indicate that using Haskell reduces defects, a hypothesis could be made that the Haskell type system plays a role here. That hypothesis could then be further tested, and that would tell us whether it's correct or not.
However, this is pretty much the opposite of what happens in discussions about features such as static typing. People state that static typing has benefits and then try to fit the evidence to that claim. Even the authors of this study fall into this trap: they read preconceived notions into results that the data doesn't support. The differences they found are so small that it's reasonable to say the impact of the language is negligible.
>One should take care not to overestimate the impact of language on defects. While the observed relationships are statistically significant, the effects are quite small. Analysis of deviance reveals that language accounts for less than 1% of the total explained deviance.
Nor do these most powerful statically typed languages appear to perform any better than dynamically typed Clojure and Erlang.
Regarding Perl, the quoted statement is wrong:
$ perl -E 'say "5" + 2'
7
Furthermore, this is not an implicit conversion. The + operator is an explicit numeric conversion. Here's a more detailed description: https://codespeaks.blogspot.ca/2007/09/ruby-and-python-overl...
function add(a, b) { return a + b; }

Although no conversion is requested explicitly in the function definition, a conversion may take place depending on the types of the arguments passed in:

> add(1, 2)
3
> add("1", 2)
"12"
The article in question defines implicit conversion in this way, and in my experience it's a fairly common term. I was pointing out that, per this definition, the article is wrong in saying that perl's + operator may perform an implicit conversion. In perl the + operator always performs a numeric conversion of both its operands, regardless of their types. By writing + you are explicitly requesting numeric conversion of both arguments.
In general perl doesn't perform implicit conversion (of course there are some exceptions -- it is perl after all). It does this by not overloading operators like + for different operations such as addition and concatenation.
This also has the nice property that you can count on a+b == b+a, unlike Python, for instance. (However, in Python's PEP 465, non-commutativity was a stated advantage of adopting @ for matrix multiplication instead of overloading *; go figure.)
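A quick illustration of that non-commutativity in Python, where + is overloaded for concatenation:

```python
# Overloaded + on strings and lists is concatenation, which is order-sensitive.
assert "ab" + "cd" == "abcd"
assert "cd" + "ab" == "cdab"        # so a+b != b+a for strings
assert [1] + [2] != [2] + [1]       # same for lists
# Numeric + of course stays commutative:
assert 1 + 2 == 2 + 1
```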
Haskell, for example, requires you to be explicit about this:

y = (read "5" :: Int) + 5

"However when compared to the average, as a group, languages that do not allow implicit type conversion are less error-prone while those that do are more error-prone."
A lot of the conclusions are along these lines: languages with explicit type conversion have fewer [type conversion] errors. Well, of course...
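For instance, Python takes the same stance as Haskell here, rejecting the mixed-type + and forcing the conversion to be spelled out:

```python
# Python refuses implicit str/int conversion; the conversion must be explicit.
try:
    result = "5" + 2          # rejected: raises TypeError, no implicit coercion
except TypeError:
    result = int("5") + 2     # explicit conversion, analogous to Haskell's read
assert result == 7
```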
Still worth a read though, and makes a strong case for functional, statically-typed languages.
The thing is, it really doesn't. There are too many inexplicable results. Typescript does significantly worse than Javascript, for example. There's also no good explanation of why the results for Ruby and Python are basically diametrically opposite (the languages are more alike than different). And Clojure has the best result of them all.
I suspect that there are simply too many confounding variables that are not accounted for (such as the typical application domains for those languages, average programmer skill, or complexity of the problems being targeted by these projects).
I still think there is value in using languages that eliminate entire classes of bugs though, for example using a language that has automatic memory management is a no-brainer except for certain specific domains where you need to do memory management yourself. Likewise with static typing: it eliminates type bugs. There have definitely been times for me recently when working with a dynamic language like JavaScript and there's been a bug in our code base that would not have happened had we been using TypeScript. Some of these bugs also had significant business impacts.
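A small Python sketch of the kind of type bug a static checker would catch before it ships (the function and data here are invented for illustration):

```python
# `total_price` expects numbers, but JSON payloads often carry strings.
# A static checker (e.g. mypy against these annotations) flags the bad call
# at build time; a dynamic language only fails when the code path runs.
def total_price(prices: list[float]) -> float:
    return sum(prices)

good = total_price([9.99, 4.50])
assert abs(good - 14.49) < 1e-9

try:
    total_price(["9.99", "4.50"])   # type bug: str where float was expected
except TypeError:
    pass  # at runtime this surfaces only when actually executed
```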
There is of course a trade-off: typed languages can be more challenging to develop with. I've had a number of fights with the Scala compiler; typically it's libraries rather than the base language, but it still costs time I wouldn't have spent using a dynamic language. Also, the Scala compiler itself is very slow, to the point where the XKCD comic about "code's compiling" has been true. On modern MacBook Pros this shouldn't be a thing anymore, but it still is :)
Well, of course, indeed... There is a bogus argument here, and it is not in the part of the study being ridiculed. Modifying the statement you are arguing against is often an indication that something isn't quite right.
Of course, what matters here is the overall error rate, possibly weighted for severity (though trying to do that is itself problematic), on comparable tasks. In a rational world, anything that can eliminate an important class of error, without making corresponding increases elsewhere, would be regarded as a success.
> The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than bugs overall.
> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
Seems to be from 2014 and has the exact same list of authors, I think. (Hosted in ~filkov which I'm assuming is the www_public of Vladimir Filkov.)
No idea what changed since then.
Edit: two in one night! https://news.ycombinator.com/item?id=15382275
Smarter programmers are more likely to be able to get their heads around the strict requirements of functional languages, and they are the ones using those languages at the moment.
Java, on the other hand, is pretty much the COBOL of this generation.
It's right there in the abstract: There might be "other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion."