The title conflates languages and their implementations; different implementations prioritize different things. They do occasionally test multiple implementations of one language, as with the main Ruby distribution vs JRuby, but the conflation is still annoying.
The second, and I think the largest, issue is that they chose the Computer Language Benchmarks Game as the set of sample programs to test. I do not believe that the kinds of programs in the Benchmarks Game are representative of the broader set of software written in most languages. They tend towards math-y, puzzle-style programs, not CLIs, web applications, GUIs, or anything else.
A very specific issue I have is that TypeScript and JavaScript come out very differently in their analysis, which is confusing given that all JavaScript is valid TypeScript and you would execute it in the same way. This may be an artifact of issue #2: the Benchmarks Game is only as good as the people who wrote the programs, and it is quite possible that the folks who submitted the TypeScript code did less perf work than those who submitted the JavaScript code. Either way, it is a confusing result that is not explained anywhere in the paper.
A final issue (and this is the one I remember least well, so I may be wrong here) is that it is not reproducible. They do not mention the date on which they retrieved the programs from the Benchmarks Game, let alone include the programs' source code, nor did they release the scripts used to collect the data, though they describe them. This means the discrepancies are hard to actually investigate, and it makes the results lower quality than if we were able to independently verify them, let alone update them based on what has changed since 2017, which is an increasingly long time ago.
In short, I do not think this paper is literally useless, but I do not think it demonstrates its central claim very well, and it is difficult to evaluate the actual quality of its results, which makes it a far weaker result than the title would suggest.
A more charitable reading might accept that language names may be used as shorthand for particular language implementations.
In this case:
https://sites.google.com/view/energy-efficiency-languages/se...
~
> representative of the broader set of software written in most languages

To your knowledge, did such a collection of programs — actually shown to meet that criterion — exist?
~
> Typescript and JavaScript are very different in their analysis

Only when we emphasize outliers with arithmetic means, as in "Table 4. Normalized global results for Energy, Time, and Memory".
With medians:
JS 7.25 times slower than C
TS 7.8 times slower than C
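A quick sketch of the mean-vs-median point above. The slowdown ratios here are made-up illustrative numbers, not values from the paper; a single outlier benchmark drags the arithmetic mean well away from the median:

```javascript
// Made-up per-benchmark slowdown ratios (relative to C), with one outlier.
const ratios = [4.0, 6.0, 7.0, 8.0, 9.0, 50.0];

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

const median = xs => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = s.length / 2;
  return s.length % 2 ? s[Math.floor(mid)] : (s[mid - 1] + s[mid]) / 2;
};

console.log(mean(ratios));   // 14 — dominated by the outlier benchmark
console.log(median(ratios)); // 7.5 — robust to it
```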
~
> all JavaScript is valid TypeScript

Except `--alwaysStrict` and `--use_strict`
So a JavaScript program may have failed as a TypeScript program, and a different program which worked as TypeScript may have been measured.
~
> not reproducible

The authors provided a repo, including test program source code, that is still available 5 years later.
page 3, footnote 1 "The measuring framework and the complete set of results are publicly available at https://sites.google.com/view/energy-efficiency-languages"