undefined | Better HN

0 pointsRandomThoughts31y ago0 comments

> This is one example why your statement above is not true.

You are misreading my comment. I’m not intentionally contradicting myself in two paragraphs next to each other (I’m not always the brightest but still).

The point is that contrary to what the article states ML developers are not avoiding mutations because they are uneasy to use but because they trust their compiler when they know it will do good. Proof is that in other case they will use mutations when it makes sense to do so because the compiler does not do a good job.

The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.

0 comments

derdi1y ago

> The point is that contrary to what the article states ML developers are not avoiding mutations because they are uneasy to use but because they trust their compiler when they know it will do good. Proof is that in other case they will use mutations when it makes sense to do so because the compiler does not do a good job.

It will do a good job, yes. Will it do the best possible job compared to some other algorithm or data structure? It can't. Not in general.

And maybe not in the specific case either:

> The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.

So, this: https://ocaml.org/play#code=bW9kdWxlIE15X2R5bmFycmF5ID0gc3Ry...

    $ for len in 5_000_000 10_000_000 25_000_000; do echo "-- ${len} elements --"; ./a.out list $len; ./a.out dynarray $len; ./a.out my_dynarray $len; echo; done
    -- 5_000_000 elements --
    list:        0.191551 sec
    list:        0.196947 sec
    list:        0.192806 sec
    dynarray:    0.301362 sec
    dynarray:    0.268592 sec
    dynarray:    0.266118 sec
    my dynarray: 0.163004 sec
    my dynarray: 0.142986 sec
    my dynarray: 0.143634 sec
    
    -- 10_000_000 elements --
    list:        0.377447 sec
    list:        0.367951 sec
    list:        0.312575 sec
    dynarray:    0.607158 sec
    dynarray:    0.582378 sec
    dynarray:    0.538621 sec
    my dynarray: 0.319705 sec
    my dynarray: 0.296607 sec
    my dynarray: 0.286634 sec
    
    -- 25_000_000 elements --
    list:        0.971244 sec
    list:        0.953493 sec
    list:        0.922049 sec
    dynarray:    1.515892 sec
    dynarray:    1.319543 sec
    dynarray:    1.328461 sec
    my dynarray: 1.119322 sec
    my dynarray: 0.971288 sec
    my dynarray: 0.973556 sec
    
    -- 50_000_000 elements --
    list:        1.852812 sec
    list:        1.848514 sec
    list:        1.505391 sec
    dynarray:    3.065143 sec
    dynarray:    2.941400 sec
    dynarray:    2.672760 sec
    my dynarray: 2.115499 sec
    my dynarray: 1.963535 sec
    my dynarray: 1.995470 sec
    
    -- 75_000_000 elements --
    list:        2.942536 sec
    list:        2.910063 sec
    list:        2.354291 sec
    dynarray:    4.567284 sec
    dynarray:    4.342670 sec
    dynarray:    3.979809 sec
    my dynarray: 2.528073 sec
    my dynarray: 2.225738 sec
    my dynarray: 2.226844 sec

A simple dynamic array implementation (my_dynarray) beats a list over a wide range of lengths. But not at all lengths! OCaml's built-in Dynarray is not competitive, but that's because it wants to make certain strong guarantees.

To be clear, I agree with your general point that we can do just fine writing nice clean pure functional OCaml code for most of our code and can hand-optimize where needed. But your very specific claims rub me the wrong way.

derdi1y ago

Oops, this had a performance bug. Instead of:

    if d.length = Array.length d.values then begin
      d.values <- Array.(append d.values (make (length d.values) x))
    end;

the array reallocation should actually be:

    if d.length = Array.length d.values then begin
      let new_array = Array.make (Array.length d.values * 2) x in
      Array.blit d.values 0 new_array 0 (Array.length d.values);
      d.values <- new_array
    end;

otherwise we allocate about a third more memory than needed. It's telling that even with this performance bug the dynamic array was broadly better than lists.

New results for the previously slowest cases:

    -- 25_000_000 elements --
    list:        0.977002 sec
    list:        0.963903 sec
    list:        0.950473 sec
    dynarray:    1.476165 sec
    dynarray:    1.281724 sec
    dynarray:    1.343222 sec
    my dynarray: 0.872558 sec
    my dynarray: 0.755902 sec
    my dynarray: 0.753746 sec
    
    -- 50_000_000 elements --
    list:        1.914777 sec
    list:        1.886989 sec
    list:        1.542614 sec
    dynarray:    2.922376 sec
    dynarray:    2.783559 sec
    dynarray:    2.537473 sec
    my dynarray: 1.725873 sec
    my dynarray: 1.545252 sec
    my dynarray: 1.515591 sec
    
    -- 75_000_000 elements --
    list:        2.827154 sec
    list:        2.835789 sec
    list:        2.318733 sec
    dynarray:    4.354404 sec
    dynarray:    4.150271 sec
    dynarray:    3.781488 sec
    my dynarray: 1.887360 sec
    my dynarray: 1.929286 sec
    my dynarray: 1.814873 sec

This turns an uneasy head-to-head into a clear win for dynamic arrays. Honestly, how could it be otherwise?

naasking1y ago

> Honestly, how could it be otherwise?

It can't be otherwise if you're just assuming a straightforward compilation model where your written roughly reflects the assembly code that's generated. This just isn't the case with better compilers though.

For instance, Haskell can often perform rewrites or fusion passes that entirely eliminates the loop and all intermediate data structures, effectively giving a near-infinite speedup compared to alternatives. You typically can't perform such optimizations when side-effects are in the middle of the computation, for numerous reasons.

1 more reply

brabel1y ago

But they are still very much in the same order of magnitude... pretty impressive that these solutions are all in the same ballpark, I would've expected much bigger differences.

1 more reply

RandomThoughts3OP1y ago

No, not this. You are doing a map. A traversal of a data collection is not a map but a fold. The exemple in the article is specifically about collection which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail call optimised to a loop). See my other comment.

Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call. You have to use rev_map to get good perf.

derdi1y ago

> A traversal of a data collection is not a map but a fold. The exemple in the article is specifically about collection which you would do by consing to a list (mostly free once optimised) in a fold_left (so tail call optimised to a loop).

The exact wording in the article is "let's say you're iterating over some structure and collecting your results in a sequence". You are interpreting a lot into this. Also, your description is of a map (reversed, and expressed as a fold). Anyway, where is your benchmark?

> Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call.

You are again contradicting yourself. Previously you were praising OCaml's optimization capabilities and now you are questioning them. Specifically in a case where there is a magic optimization that is explicitly motivated by List.map: https://ocaml.org/manual/5.2/tail_mod_cons.html . A magic optimization implemented using, guess what, mutation.

j / k navigate · click thread line to collapse

0 comments

derdi1y ago

It will do a good job, yes. Will it do the best possible job compared to some other algorithm or data structure? It can't. Not in general.

And maybe not in the specific case either:

> The first paragraph refers to the specific case my comment quotes just before: data structure traversal and storage of elements in a set.

So, this: https://ocaml.org/play#code=bW9kdWxlIE15X2R5bmFycmF5ID0gc3Ry...

    $ for len in 5_000_000 10_000_000 25_000_000; do echo "-- ${len} elements --"; ./a.out list $len; ./a.out dynarray $len; ./a.out my_dynarray $len; echo; done
    -- 5_000_000 elements --
    list:        0.191551 sec
    list:        0.196947 sec
    list:        0.192806 sec
    dynarray:    0.301362 sec
    dynarray:    0.268592 sec
    dynarray:    0.266118 sec
    my dynarray: 0.163004 sec
    my dynarray: 0.142986 sec
    my dynarray: 0.143634 sec
    
    -- 10_000_000 elements --
    list:        0.377447 sec
    list:        0.367951 sec
    list:        0.312575 sec
    dynarray:    0.607158 sec
    dynarray:    0.582378 sec
    dynarray:    0.538621 sec
    my dynarray: 0.319705 sec
    my dynarray: 0.296607 sec
    my dynarray: 0.286634 sec
    
    -- 25_000_000 elements --
    list:        0.971244 sec
    list:        0.953493 sec
    list:        0.922049 sec
    dynarray:    1.515892 sec
    dynarray:    1.319543 sec
    dynarray:    1.328461 sec
    my dynarray: 1.119322 sec
    my dynarray: 0.971288 sec
    my dynarray: 0.973556 sec
    
    -- 50_000_000 elements --
    list:        1.852812 sec
    list:        1.848514 sec
    list:        1.505391 sec
    dynarray:    3.065143 sec
    dynarray:    2.941400 sec
    dynarray:    2.672760 sec
    my dynarray: 2.115499 sec
    my dynarray: 1.963535 sec
    my dynarray: 1.995470 sec
    
    -- 75_000_000 elements --
    list:        2.942536 sec
    list:        2.910063 sec
    list:        2.354291 sec
    dynarray:    4.567284 sec
    dynarray:    4.342670 sec
    dynarray:    3.979809 sec
    my dynarray: 2.528073 sec
    my dynarray: 2.225738 sec
    my dynarray: 2.226844 sec

derdi1y ago

Oops, this had a performance bug. Instead of:

    if d.length = Array.length d.values then begin
      d.values <- Array.(append d.values (make (length d.values) x))
    end;

the array reallocation should actually be:

    if d.length = Array.length d.values then begin
      let new_array = Array.make (Array.length d.values * 2) x in
      Array.blit d.values 0 new_array 0 (Array.length d.values);
      d.values <- new_array
    end;

otherwise we allocate about a third more memory than needed. It's telling that even with this performance bug the dynamic array was broadly better than lists.

New results for the previously slowest cases:

    -- 25_000_000 elements --
    list:        0.977002 sec
    list:        0.963903 sec
    list:        0.950473 sec
    dynarray:    1.476165 sec
    dynarray:    1.281724 sec
    dynarray:    1.343222 sec
    my dynarray: 0.872558 sec
    my dynarray: 0.755902 sec
    my dynarray: 0.753746 sec
    
    -- 50_000_000 elements --
    list:        1.914777 sec
    list:        1.886989 sec
    list:        1.542614 sec
    dynarray:    2.922376 sec
    dynarray:    2.783559 sec
    dynarray:    2.537473 sec
    my dynarray: 1.725873 sec
    my dynarray: 1.545252 sec
    my dynarray: 1.515591 sec
    
    -- 75_000_000 elements --
    list:        2.827154 sec
    list:        2.835789 sec
    list:        2.318733 sec
    dynarray:    4.354404 sec
    dynarray:    4.150271 sec
    dynarray:    3.781488 sec
    my dynarray: 1.887360 sec
    my dynarray: 1.929286 sec
    my dynarray: 1.814873 sec

This turns an uneasy head-to-head into a clear win for dynamic arrays. Honestly, how could it be otherwise?

naasking1y ago

> Honestly, how could it be otherwise?

1 more reply

brabel1y ago

But they are still very much in the same order of magnitude... pretty impressive that these solutions are all in the same ballpark, I would've expected much bigger differences.

1 more reply

RandomThoughts3OP1y ago

Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call. You have to use rev_map to get good perf.

derdi1y ago

> Also, List.map is not optimised in Ocaml. It uses a naive implementation and not a tail call.

j / k navigate · click thread line to collapse