Do you know how this would look in Haskell?
# Return K most common lines in a file
def top_k(f, k):
counts = collections.defaultdict(int)
for line in f:
counts[line] += 1
return sorted(counts.items(), key=lambda x: x[1], reverse=True))[:k]
I was actually looking for the OCaml example which does this. I think it was in "Real World OCaml", and I remember it being horribly ugly compared to Python. I couldn't find it though. top xs = map fst $ sortBy (compare `on` Down . snd) $
M.toList $ foldr (\x -> M.insertWith (+) x 1) mempty xs
This has the type `top :: Ord a => [a] -> [a]`. You might object that I didn't include your k parameter. Because Haskell is a lazy language, I can get your behavior very simply by doing `take n . top` and it will lazily only calculate the first n values. This gives you more abstractive power and lets you write more composable code.I don't begrudge anyone if they want to use Haskell or OCaml for everything.
But personally I am interested in using ML-like languages for language engineering and not general purpose programming (maybe Rust could change that, but I'm not convinced). I program in 5+ different languages regularly so I've dealt with all the friction involved, mainly by explicitly designing systems as heterogeneous collections of Unix processes.
* In ocaml you need to pass some "comparator" parameters around if you want top_k to be polymorphic. Its similar to haskell type classes but you need to be explicit... * Instantiating a polymorphic hash table is a bit ugly but thats more about the standard library... * There is no "default dict" function so you would need to initialize the counts yourself * Looping the file is not hard. Most data structures will have some sort of "fold" or "iter" function you can use. * The sorting function doesn't take keys, so you need to use a comparator instead. Its not that bad though - the function to convert a hash table into a list already returns a list of tuples.
It's conceptually similar, but just plain uglier in practice. A lot of day to day programming is just FILLED with stuff that is very elegant in Python, and not so much in OCaml.
Language engineering is something is very elegant in OCaml.
In my opinion, EVERY language is a DSL. I stopped looking for the ultimate language -- it doesn't exist.
While you might not see this dynamism as a big problem in Python, its something that I always found confusing about Ruby. Over there, people everything is a method call and people are super happy to monkeypatch extra methods to classes, which makes it very hard for me to understand the program by reading it. At some point, things get so dynamic that static analysis in your head gets too hard and you are better off just running the program and seeing if it worked...
In any case, I can't give you a concrete example right now because my laptop doesn't have ocaml installed on it but in my other post I already listed some of the bigger differences that the ocaml version has. The tldr is that some of the features you are using in Python aren't available in the ocaml stdlib (so you would need to write the sorting comparator the long way or create the helper function yourself) and that creating the polymorphic hash table is not very beginner friendly (this is a cost we pay for stronger typing). What I wanted to say is that the big things about the Python function (hash tables and generic sorting) are both things you can do in Ocaml just fine.
I'm curious, how did you think it would look in haskell? Can you give a code example of what was in your head?
I was saying that OCaml was awkward for this problem, and the reply was that OCaml is but Haskell isn't. I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.
Then what were you impling by saying:
> Do you know how this would look in Haskell?
Because that sentence seems to imply it would be very convoluted in Haskell. How can you imply it would look convoluted in Haskell if as you said:
> I don't know Haskell, so there wasn't anything in my head.
Anyway, onwards...
> I'm not convinced after seeing the Haskell code. Python is both semantically and syntactically a better fit for that problem.
You aren't convinced of what after seeing the Haskell code?
How can you come to the conclusion that Python is both semantically and syntactically a better fit for that problem if you don't know both Python and Haskell?
Can you give me the criteria you are using to come to that conclusion?
Here is a (more or less) identical implementation in Rust:
fn top_k(f: Lines, k: usize) -> Vec<(&str, usize)> {
let mut counts = HashMap::new();
for line in f {
*counts.entry(line).get().unwrap_or_else( |v| v.insert(0) ) += 1;
}
let mut counts = Vec::from_iter(counts);
counts.sort_by( |x, y| y.1.cmp(&x.1) );
counts.truncate(k);
counts
}
These are the things that make it more verbose than the Python that are not related to ML:* Curly brace syntax. Not a property of ML languages--Rust and Swift are pretty much alone in the ML family in having curly brace syntax.
* Having to write & explicitly. Not a property of ML languages at all--it's a C++ism. In the type signature, the & is because Rust has first-class references--it deliberately does not abstract over them, unlike nearly every other ML (and most other languages, for that matter). The second case is a stylistic choice: Rust is (somewhat inconsistently) explicit about when you are taking a reference. There have been some proposals to remove this annotation burden.
* Rust doesn't have a find_or_insert method in the standard library (or a "HashMap with Default" a la defaultdict). Not a property of ML languages; in fact, Rust used to have this (see http://web.mit.edu/rust-lang_v0.11/doc/std/collections/hashm...). It was removed because the Entry API is seen as more general. It would be quite easy to add such methods or types back as sugar in a library (if someone hasn't already).
* By convention, Rust APIs that mutate things don't return the original. Not a property of ML languages, just a Rust thing (in Rust, you can get sugar for this a chain! macro, as demonstrated at http://www.reddit.com/r/rust/comments/2wssch/function_chaini...).
* Having to write out ".take(k).truncate" instead of just being able to use slice syntax like [:k] (in Rust, [..k]). This is not an ML thing; the reason you can't do this is because slices in Rust do not create copies, but references into the original vector. If Rust were garbage collected, this would be fine, but in Rust the counts vector would be deallocated at the end of the function, leaving the references dangling. We could clone it, but it wouldn't be any shorter and it would be wasteful compared to just calling truncate.
* Having to specify that the counts be represented as a `Vec` to turn the counts item list into a vector. This is not an ML thing; it is because in Rust, vectors are not privileged over other types of containers. Most MLs have plenty of syntactic sugar for lists; this is again something more C++ish.
* Using '::' instead of '.' as the path separator. Most MLs just use ., this is a C++ism. I believe it is related to Rust's desire to remain LL(1), but I can't remember the particular reason for this one.
* A big one that might seem like an MLism, but isn't: explicitly writing out the types. While MLs vary in how much type inference they actually support, most will infer them. Rust is somewhat unusual in requiring function-level type annotations. In particular, a language with global inference should be able to infer this signature for the function (given in Rust syntax):
fn top_k<I: IntoIterator>(f: I, k: usize) -> Vec<(I::Item, usize)> where I::Item: Eq + Hash
As far as I can tell, these are the things that make it verbose than the Python that are related to ML:* Having to write "let" and "let mut". I don't think there is any ML language that doesn't differentiate between variable creation and variable assignment. Similarly, I can't think of a ML-family language that doesn't differentiate between mutable and immutable bindings (however, there was a quite serious proposal by one of the core Rust developers to do this: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focu..., and it could be made less boiler-platey for the common case, as Swift does, by having a separate var keyword, at the cost of losing power while pattern matching).
Thus, here is a hypothetical implementation in an MLish language that switches the non ML-ish Rust-isms to quite reasonable ML-ish alternatives (and has a garbage collector, presumably, but if you don't like that just pretend slice is a copying operation in this language):
fn top_k(f, k):
var counts = HashMap.new()
for line in f:
counts.find_or_insert(0) += 1
from_iter(counts).sort_by( |x, y| y.1.cmp(x.1) )[..k]
I guess you could argue that all I've done here is cherry-picked the most Python-like features from all the ML languages (not the shortest, obviously, as the Haskell example demonstrates :)), but my point is that most of the syntactic concerns people associate with ML have little to do with ML itself, but unrelated decisions of the languages in question. Personally, I would love to use a language like the above for similar tasks to what I use Python for, and maybe someday someone will create one :) (perhaps someone already has!)