undefined | Better HN

0 pointschubot11y ago0 comments

Right, I was referring to all ML-based languages -- so SML, OCaml, Haskell, F#, and possibly even Rust.

Do you know how this would look in Haskell?

    # Return K most common lines in a file

    def top_k(f, k):
      counts = collections.defaultdict(int)
      for line in f:
        counts[line] += 1

      return sorted(counts.items(), key=lambda x: x[1], reverse=True))[:k]

I was actually looking for the OCaml example which does this. I think it was in "Real World OCaml", and I remember it being horribly ugly compared to Python. I couldn't find it though.

0 comments

mightybyte11y ago

I would do it something like this:

    top xs = map fst $ sortBy (compare `on` Down . snd) $
        M.toList $ foldr (\x -> M.insertWith (+) x 1) mempty xs

This has the type `top :: Ord a => [a] -> [a]`. You might object that I didn't include your k parameter. Because Haskell is a lazy language, I can get your behavior very simply by doing `take n . top` and it will lazily only calculate the first n values. This gives you more abstractive power and lets you write more composable code.

chubotOP11y ago

Yeah, so this sort of proves my point :)

I don't begrudge anyone if they want to use Haskell or OCaml for everything.

But personally I am interested in using ML-like languages for language engineering and not general purpose programming (maybe Rust could change that, but I'm not convinced). I program in 5+ different languages regularly so I've dealt with all the friction involved, mainly by explicitly designing systems as heterogeneous collections of Unix processes.

mightybyte11y ago

Hmmm, I'm not sure what you're saying. What point does it prove? I've been using Haskell for general purpose programming for five years now and I've found it to be exceptionally well-suited.

ufo11y ago

The ocaml version of this code would be actually quite similar, since you can even use imperative hash tables if you want. The syntax is a bit fugly and the stdlib doesn't have some of the helpers python has but I wouldn't go as far as say that its "horribly ugly". Beauty is in the eye of the beholder and Ocaml has the advantage of being much less "magical" than python (as per the "explicit is better than implicit" mantra)

* In ocaml you need to pass some "comparator" parameters around if you want top_k to be polymorphic. Its similar to haskell type classes but you need to be explicit... * Instantiating a polymorphic hash table is a bit ugly but thats more about the standard library... * There is no "default dict" function so you would need to initialize the counts yourself * Looping the file is not hard. Most data structures will have some sort of "fold" or "iter" function you can use. * The sorting function doesn't take keys, so you need to use a comparator instead. Its not that bad though - the function to convert a hash table into a list already returns a list of tuples.

chubotOP11y ago

Meh, I don't buy it. Can you show it?

It's conceptually similar, but just plain uglier in practice. A lot of day to day programming is just FILLED with stuff that is very elegant in Python, and not so much in OCaml.

Language engineering is something is very elegant in OCaml.

In my opinion, EVERY language is a DSL. I stopped looking for the ultimate language -- it doesn't exist.

ufo11y ago

Elegance is subjective... While its undeniable that you often need to type less characters in Python and list/dictionary comprehensions are wonderful syntactic sugar, its kind of nice that there is much less magic going on under the hood in Ocaml. In a highly dynamic language such as Python or Ruby you never know exactly what your code is doing: any method call can can be overloaded and even things like function calls and array indexing can be overloaded too. On the other hand, in Ocaml almost everything is statically dispatched and the static typing is really great (I'm sure Python wouldn't look as nice if you added the required modifications necessary to make it statically types as ocaml is)

While you might not see this dynamism as a big problem in Python, its something that I always found confusing about Ruby. Over there, people everything is a method call and people are super happy to monkeypatch extra methods to classes, which makes it very hard for me to understand the program by reading it. At some point, things get so dynamic that static analysis in your head gets too hard and you are better off just running the program and seeing if it worked...

In any case, I can't give you a concrete example right now because my laptop doesn't have ocaml installed on it but in my other post I already listed some of the bigger differences that the ocaml version has. The tldr is that some of the features you are using in Python aren't available in the ocaml stdlib (so you would need to write the sorting comparator the long way or create the helper function yourself) and that creating the polymorphic hash table is not very beginner friendly (this is a cost we pay for stronger typing). What I wanted to say is that the big things about the Python function (hash tables and generic sorting) are both things you can do in Ocaml just fine.

codygman11y ago

> Do you know how this would look in Haskell?

I'm curious, how did you think it would look in haskell? Can you give a code example of what was in your head?

chubotOP11y ago

I don't know Haskell, so there wasn't anything in my head. I basically know the ML subset of OCaml and I guess Haskell, and that doesn't really apply at all to the top K problem.

I was saying that OCaml was awkward for this problem, and the reply was that OCaml is but Haskell isn't. I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.

codygman11y ago

> I don't know Haskell, so there wasn't anything in my head.

Then what were you impling by saying:

> Do you know how this would look in Haskell?

Because that sentence seems to imply it would be very convoluted in Haskell. How can you imply it would look convoluted in Haskell if as you said:

> I don't know Haskell, so there wasn't anything in my head.

Anyway, onwards...

> I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.

You aren't convinced of what after seeing the Haskell code?

How can you come to the conclusion that Python is both semantically and syntactically a better fit for that problem if you don't know both Python and Haskell?

Can you give me the criteria you are using to come to that conclusion?

Jweb_Guru11y ago

Sorry, but I can't agree with this.

Here is a (more or less) identical implementation in Rust:

    fn top_k(f: Lines, k: usize) -> Vec<(&str, usize)> {
        let mut counts = HashMap::new();
        for line in f {
            *counts.entry(line).get().unwrap_or_else( |v| v.insert(0) ) += 1;
        }
    
        let mut counts = Vec::from_iter(counts);
        counts.sort_by( |x, y| y.1.cmp(&x.1) );
        counts.truncate(k);
        counts
    }

These are the things that make it more verbose than the Python that are not related to ML:

* Curly brace syntax. Not a property of ML languages--Rust and Swift are pretty much alone in the ML family in having curly brace syntax.

* Having to write & explicitly. Not a property of ML languages at all--it's a C++ism. In the type signature, the & is because Rust has first-class references--it deliberately does not abstract over them, unlike nearly every other ML (and most other languages, for that matter). The second case is a stylistic choice: Rust is (somewhat inconsistently) explicit about when you are taking a reference. There have been some proposals to remove this annotation burden.

* Rust doesn't have a find_or_insert method in the standard library (or a "HashMap with Default" a la defaultdict). Not a property of ML languages; in fact, Rust used to have this (see http://web.mit.edu/rust-lang_v0.11/doc/std/collections/hashm...). It was removed because the Entry API is seen as more general. It would be quite easy to add such methods or types back as sugar in a library (if someone hasn't already).

* By convention, Rust APIs that mutate things don't return the original. Not a property of ML languages, just a Rust thing (in Rust, you can get sugar for this a chain! macro, as demonstrated at http://www.reddit.com/r/rust/comments/2wssch/function_chaini...).

* Having to write out ".take(k).truncate" instead of just being able to use slice syntax like [:k] (in Rust, [..k]). This is not an ML thing; the reason you can't do this is because slices in Rust do not create copies, but references into the original vector. If Rust were garbage collected, this would be fine, but in Rust the counts vector would be deallocated at the end of the function, leaving the references dangling. We could clone it, but it wouldn't be any shorter and it would be wasteful compared to just calling truncate.

* Having to specify that the counts be represented as a `Vec` to turn the counts item list into a vector. This is not an ML thing; it is because in Rust, vectors are not privileged over other types of containers. Most MLs have plenty of syntactic sugar for lists; this is again something more C++ish.

* Using '::' instead of '.' as the path separator. Most MLs just use ., this is a C++ism. I believe it is related to Rust's desire to remain LL(1), but I can't remember the particular reason for this one.

* A big one that might seem like an MLism, but isn't: explicitly writing out the types. While MLs vary in how much type inference they actually support, most will infer them. Rust is somewhat unusual in requiring function-level type annotations. In particular, a language with global inference should be able to infer this signature for the function (given in Rust syntax):

    fn top_k<I: IntoIterator>(f: I, k: usize) -> Vec<(I::Item, usize)> where I::Item: Eq + Hash

As far as I can tell, these are the things that make it verbose than the Python that are related to ML:

* Having to write "let" and "let mut". I don't think there is any ML language that doesn't differentiate between variable creation and variable assignment. Similarly, I can't think of a ML-family language that doesn't differentiate between mutable and immutable bindings (however, there was a quite serious proposal by one of the core Rust developers to do this: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focu..., and it could be made less boiler-platey for the common case, as Swift does, by having a separate var keyword, at the cost of losing power while pattern matching).

Thus, here is a hypothetical implementation in an MLish language that switches the non ML-ish Rust-isms to quite reasonable ML-ish alternatives (and has a garbage collector, presumably, but if you don't like that just pretend slice is a copying operation in this language):

    fn top_k(f, k):
        var counts = HashMap.new()
        for line in f:
            counts.find_or_insert(0) += 1

        from_iter(counts).sort_by( |x, y| y.1.cmp(x.1) )[..k]

I guess you could argue that all I've done here is cherry-picked the most Python-like features from all the ML languages (not the shortest, obviously, as the Haskell example demonstrates :)), but my point is that most of the syntactic concerns people associate with ML have little to do with ML itself, but unrelated decisions of the languages in question. Personally, I would love to use a language like the above for similar tasks to what I use Python for, and maybe someday someone will create one :) (perhaps someone already has!)

Jweb_Guru11y ago

Sorry, line four of the hypothetical function should be counts.find_or_insert(line, 0) += 1. Would have edited, but HN won't let you edit after the procrastination settings kick in (but curiously, does let you reply).

j / k navigate · click thread line to collapse

0 comments

mightybyte11y ago

I would do it something like this:

    top xs = map fst $ sortBy (compare `on` Down . snd) $
        M.toList $ foldr (\x -> M.insertWith (+) x 1) mempty xs

chubotOP11y ago

Yeah, so this sort of proves my point :)

I don't begrudge anyone if they want to use Haskell or OCaml for everything.

mightybyte11y ago

Hmmm, I'm not sure what you're saying. What point does it prove? I've been using Haskell for general purpose programming for five years now and I've found it to be exceptionally well-suited.

ufo11y ago

chubotOP11y ago

Meh, I don't buy it. Can you show it?

It's conceptually similar, but just plain uglier in practice. A lot of day to day programming is just FILLED with stuff that is very elegant in Python, and not so much in OCaml.

Language engineering is something is very elegant in OCaml.

In my opinion, EVERY language is a DSL. I stopped looking for the ultimate language -- it doesn't exist.

ufo11y ago

codygman11y ago

> Do you know how this would look in Haskell?

I'm curious, how did you think it would look in haskell? Can you give a code example of what was in your head?

chubotOP11y ago

I don't know Haskell, so there wasn't anything in my head. I basically know the ML subset of OCaml and I guess Haskell, and that doesn't really apply at all to the top K problem.

codygman11y ago

> I don't know Haskell, so there wasn't anything in my head.

Then what were you impling by saying:

> Do you know how this would look in Haskell?

Because that sentence seems to imply it would be very convoluted in Haskell. How can you imply it would look convoluted in Haskell if as you said:

> I don't know Haskell, so there wasn't anything in my head.

Anyway, onwards...

> I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.

You aren't convinced of what after seeing the Haskell code?

How can you come to the conclusion that Python is both semantically and syntactically a better fit for that problem if you don't know both Python and Haskell?

Can you give me the criteria you are using to come to that conclusion?

Jweb_Guru11y ago

Sorry, but I can't agree with this.

Here is a (more or less) identical implementation in Rust:

    fn top_k(f: Lines, k: usize) -> Vec<(&str, usize)> {
        let mut counts = HashMap::new();
        for line in f {
            *counts.entry(line).get().unwrap_or_else( |v| v.insert(0) ) += 1;
        }
    
        let mut counts = Vec::from_iter(counts);
        counts.sort_by( |x, y| y.1.cmp(&x.1) );
        counts.truncate(k);
        counts
    }

These are the things that make it more verbose than the Python that are not related to ML:

* Curly brace syntax. Not a property of ML languages--Rust and Swift are pretty much alone in the ML family in having curly brace syntax.

    fn top_k<I: IntoIterator>(f: I, k: usize) -> Vec<(I::Item, usize)> where I::Item: Eq + Hash

As far as I can tell, these are the things that make it verbose than the Python that are related to ML:

    fn top_k(f, k):
        var counts = HashMap.new()
        for line in f:
            counts.find_or_insert(0) += 1

        from_iter(counts).sort_by( |x, y| y.1.cmp(x.1) )[..k]

Jweb_Guru11y ago

j / k navigate · click thread line to collapse