- The content of commit messages varies widely in expressiveness and meaningfulness, from 'Fix bug' to a detailed explanation. This confounds the classification of a commit.
- The number of commits can be very misleading depending on the committer's workflow. Some committers merge topic branches to include all their intermediate work in progress commits, which could overrepresent commits flagged as errors. Other committers rebase their topic branches into fewer, or even single, commits before merging. Or, some commits may fix multiple defects.
This kind of analysis is conceptually a worthy endeavor; it would be more meaningful if the metrics it employed were more strongly correlated with the attributes it was trying to analyze.
Perhaps timing should be considered as well - how long it takes to implement a feature including fixing its associated bugs.
It seems similar to the paradox that makes the best medicine appear to have a lower survival rate, just because it's given to the most serious patients.
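A toy numeric sketch of that paradox (all numbers invented for illustration): the better treatment wins within each severity group, yet loses in aggregate because of case mix.

```python
# Hypothetical numbers illustrating the confound: the "best medicine"
# is given mostly to severe cases, so its aggregate survival looks worse.
cases = [
    # (treatment, severity, patients, survivors)
    ("best",     "severe", 90, 54),   # 60% survival on severe cases
    ("best",     "mild",   10, 10),   # 100% on mild
    ("standard", "severe", 10,  4),   # 40% on severe
    ("standard", "mild",   90, 81),   # 90% on mild
]

def survival(treatment, severity=None):
    rows = [(n, s) for t, sev, n, s in cases
            if t == treatment and (severity is None or sev == severity)]
    patients = sum(n for n, _ in rows)
    survivors = sum(s for _, s in rows)
    return survivors / patients

# Within each severity group the best medicine wins...
assert survival("best", "severe") > survival("standard", "severe")
assert survival("best", "mild") > survival("standard", "mild")
# ...yet its overall survival rate is lower, because of case mix.
assert survival("best") < survival("standard")
```

The same mechanism applies to languages: if C is disproportionately handed the hard problems, its aggregate defect rate tells you little about the language itself.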
Similarly, again for performance reasons, your average C program will have more concurrency than your average Python program, and therefore also more bugs.
Another way to put it: you use C when you have a complex problem to solve and Python when you have a simple problem to solve (unless you are a masochist or a purist, I suppose). So one reason those languages have more bugs may just be that the programs themselves are more prone to errors (which might only be slightly related to size, if at all).
I see what you're getting at, but this is an irksome way to put it. We're clearly unwilling to wait for the heat death of the universe for our programs to terminate. Performance is always a requirement.
"When the performance requirements can only be met by C/C++" might be a more accurate formulation, but then it's just tautological.
Java, Go, Obj-C, Erlang, and Scala are all certainly in the running when concurrency is required, and fit within many latency budgets just fine. The managed and dynamic languages on the list are typically used in contexts where latency is dominated by network and disk I/O, so marginal CPU efficiency isn't worth much. That doesn't mean performance isn't a requirement, it means the most effective ways to increase performance are different. Adding indexes, optimizing queries, caching, etc.
Also, what do you mean most of these languages wouldn't be considered when concurrency is required? Concurrency is bog standard everywhere.
It seems like, given the way they define a bug, a performance bug would be a bug relative to expectations, per project. So you can definitely have a performance bug in Go or Haskell, for example, if something works slower than the developers think it should (as opposed to being slower than some external reference code). So maybe it's closer to something like "developer control over unexpected underperformance"?
Try multithreading in C++ vs Clojure and the difference in amount of effort is well beyond trivial.
As best I can tell, they use commit messages to identify bugfixes, and later they jump to "defective commits". Presumably the bugfix commit is not the defective commit. There is no explanation I can find that shows how they arrive at a defective commit from a bugfix commit.
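For what it's worth, the identification step in studies like this is usually a keyword heuristic over commit messages, something like the sketch below. The keyword list and regex here are my assumptions, not the paper's exact method, and note that this only finds the bugfix commit, not the commit that introduced the defect.

```python
import re

# Flag a commit as a bugfix if its message matches error-related keywords.
# The keyword set is a guess at the typical heuristic, not the study's own.
BUGFIX_PATTERN = re.compile(
    r"\b(fix(es|ed)?|bug|defect|error|fault|crash)\b", re.IGNORECASE
)

def looks_like_bugfix(message: str) -> bool:
    return bool(BUGFIX_PATTERN.search(message))

assert looks_like_bugfix("Fix bug in parser")
assert looks_like_bugfix("Resolved crash on startup")
assert not looks_like_bugfix("Add dark mode support")
```

Heuristics like this inherit every weakness of commit-message quality, which is exactly the confound raised above.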
This specific methodology seems rife with weakness, all of which should be explained clearly and admitted up front.
But looking at the data and their analysis, while there are some interesting stats saying typed and functional languages correlate with fewer defects, there are just too many variables at play.
I think the 2nd or 3rd STALKER game (or all of them) runs in a VM as well and... crashes and corrupts saves like crazy.
And for those who don't get how it's related: a VM is "supposed" to be pretty damn crash-proof and "safer", much like functional languages. But that doesn't stop deadlines and bad coding practices from creating broken products.
Clojure is an incredibly succinct language. It uses about half as many lines as Elm, 5% as many lines as C++. I love other languages, but nothing rivals Clojure in elegance. I believe this is the key reason why Clojure projects are so low on bugs -- they are much simpler to maintain, refactor, or rewrite entirely than in most other languages, so fixing problems is not the chore it can be elsewhere.
I would still take typed over untyped any day.
While I too am attracted to static typing in many ways, I have found, after working in C++ full time for a few years (with some non-trivial work in Swift as well, which has quite a strict type system) and then Clojure full time for a few years, that when you weigh all the other features of a language, typing alone does not make a big difference.
It's kind of like the argument some people make about how the cost of your rent or mortgage is the best indicator of your cost of living. There are so many other factors, and some of the lowest cost of living I've personally experienced has been in places with considerably higher rent than average, due to the offset of other factors.
If we see empirical evidence that projects written in certain types of languages consistently perform better in a particular area, such as reduction in defects, we can then make a hypothesis as to why that is.
For example, if there were statistical evidence to indicate that using Haskell reduces defects, a hypothesis could be made that the Haskell type system plays a role here. That hypothesis could then be further tested, and that would tell us whether it's correct or not.
However, this is pretty much the opposite of what happens in discussions about features such as static typing. People state that static typing has benefits and then try to fit the evidence to that claim. Even the authors of this study fall into this trap: they read preconceived notions into results that the data doesn't support. The differences they found are so small that it's reasonable to say the impact of the language is negligible.
>One should take care not to overestimate the impact of language on defects. While the observed relationships are statistically significant, the effects are quite small. Analysis of deviance reveals that language accounts for less than 1% of the total explained deviance.
Nor do these most powerful statically typed languages appear to perform any better than dynamically typed Clojure and Erlang.
Regarding Perl, the quoted statement is wrong:
$ perl -E 'say "5" + 2'
7
Furthermore, this is not an implicit conversion. The + operator is an explicit numeric conversion. Here's a more detailed description: https://codespeaks.blogspot.ca/2007/09/ruby-and-python-overl...
function add(a, b) { return a + b; }

Although no conversion is requested explicitly in the function definition, a conversion may take place depending on the types of the arguments passed in:

> add(1, 2)
3
> add("1", 2)
"12"
The article in question defines implicit conversion in this way, and in my experience it's a fairly common term. I was pointing out that, per this definition, the article is wrong in saying that perl's + operator may perform an implicit conversion. In perl the + operator always performs a numeric conversion of both its operands, regardless of their types. By writing + you are explicitly requesting numeric conversion of both arguments.
In general perl doesn't perform implicit conversion (of course there are some exceptions -- it is perl after all). It does this by not overloading operators like + for different operations such as addition and concatenation.
This also has the nice property that you can count on a+b == b+a, unlike Python, for instance. (However, in Python's PEP 465, non-commutativity was a stated advantage of adopting @ for matrix multiplication instead of overloading *; go figure.)
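A quick illustration of that non-commutativity in Python, where + is overloaded for concatenation:

```python
# Overloaded + on strings and lists is concatenation, which is order-sensitive.
assert "ab" + "cd" == "abcd"
assert "cd" + "ab" == "cdab"        # so a+b != b+a for strings
assert [1] + [2] != [2] + [1]       # same for lists
# Numeric + of course stays commutative:
assert 1 + 2 == 2 + 1
```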
Haskell, for example, requires you to be explicit about this:

y = (read "5" :: Int) + 5

"However when compared to the average, as a group, languages that do not allow implicit type conversion are less error-prone while those that do are more error-prone."
A lot of the conclusions are along these lines: languages with explicit type conversion have fewer [type conversion] errors. Well, of course...
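For instance, Python takes the same stance as Haskell here, rejecting the mixed-type + and forcing the conversion to be spelled out:

```python
# Python refuses implicit str/int conversion; the conversion must be explicit.
try:
    result = "5" + 2          # rejected: raises TypeError, no implicit coercion
except TypeError:
    result = int("5") + 2     # explicit conversion, analogous to Haskell's read
assert result == 7
```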
Still worth a read though, and makes a strong case for functional, statically-typed languages.
The thing is, it really doesn't. There are too many inexplicable results. Typescript does significantly worse than Javascript, for example. There's also no good explanation of why the results for Ruby and Python are basically diametrically opposite (the languages are more alike than different). And Clojure has the best result of them all.
I suspect that there are simply too many confounding variables that are not accounted for (such as the typical application domains for those languages, average programmer skill, or complexity of the problems being targeted by these projects).
I still think there is value in using languages that eliminate entire classes of bugs though, for example using a language that has automatic memory management is a no-brainer except for certain specific domains where you need to do memory management yourself. Likewise with static typing: it eliminates type bugs. There have definitely been times for me recently when working with a dynamic language like JavaScript and there's been a bug in our code base that would not have happened had we been using TypeScript. Some of these bugs also had significant business impacts.
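A small Python sketch of the kind of type bug a static checker would catch before it ships (the function and data here are invented for illustration):

```python
# `total_price` expects numbers, but JSON payloads often carry strings.
# A static checker (e.g. mypy against these annotations) flags the bad call
# at build time; a dynamic language only fails when the code path runs.
def total_price(prices: list[float]) -> float:
    return sum(prices)

good = total_price([9.99, 4.50])
assert abs(good - 14.49) < 1e-9

try:
    total_price(["9.99", "4.50"])   # type bug: str where float was expected
except TypeError:
    pass  # at runtime this surfaces only when actually executed
```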
There is of course a trade-off: typed languages can be more challenging to develop with. I've had a number of fights with the Scala compiler; typically it's libraries rather than the base language, but it still costs time I wouldn't have spent using a dynamic language. Also, the Scala compiler itself is very slow, to the point where the XKCD comic about "code's compiling" has been true. On modern MacBook Pros this shouldn't be a thing anymore, but it still is :)
Well, of course, indeed... There is a bogus argument here, and it is not in the part of the study being ridiculed. Modifying the statement you are arguing against is often an indication that something isn't quite right.
Of course, what matters here is the overall error rate, possibly weighted for severity (though trying to do that is itself problematic), on comparable tasks. In a rational world, anything that can eliminate an important class of error, without making corresponding increases elsewhere, would be regarded as a success.
> The data indicates that functional languages are better than procedural languages; it suggests that disallowing implicit type conversion is better than allowing it; that static typing is better than dynamic; and that managed memory usage is better than unmanaged. Further, that the defect proneness of languages in general is not associated with software domains. Additionally, languages are more related to individual bug categories than bugs overall.
> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
Seems to be from 2014 and has the exact same list of authors, I think. (Hosted in ~filkov which I'm assuming is the www_public of Vladimir Filkov.)
No idea what changed since then.
Edit: two in one night! https://news.ycombinator.com/item?id=15382275
Smarter programmers are more likely to be able to get their heads around the strict requirements of functional languages, and they are the ones using those languages at the moment.
Java, on the other hand, is pretty much the COBOL of this generation.
It's right there in the abstract: There might be "other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion."