It's important to note that this particular job is largely bound by (a) I/O and (b) format serialization. Both Python's BSON and JSON libraries are mature and have their critical sections written in C, so a speedup of 4x is still noteworthy. The Haskell version, on the other hand, is pure Haskell.
/still a Python fan
/also still a python fan :-)
I'd love to point people to this when trying to convey some advantages of Haskell. To make it more compelling, can you expand some on the downsides and maybe obstacles you encountered?
The thing I'm unsure about is how difficult it would be for (very) talented developers to just jump in. We have really talented developers, and everyone is super time-constrained, so many are wary of diving into a language as different as Haskell. Was it hard for your developers to figure Haskell out? Did your previous use of Scala help? How long did it take them to dive into Scala?
It's all much easier to digest, though, even for "really talented developers", if they have some experience with another functional language first. OCaml is a nice stepping stone before digging into the abstractions involved in understanding Haskell's powerful type system. Scala is good too, but having the object stuff mixed in there can lead you to rely on some patterns that aren't going to be available in a non-OOP language. I think the Scheme/Clojure path isn't bad either, but it's probably ideal to spend some time in the "statically typed" wing of the functional universe before going to Haskell.
I came to Haskell with no understanding of monads, started writing code, and eventually used my knowledge of Haskell to learn about monads. Not understanding monads just meant I was lacking a useful design pattern, and found certain API docs confusing, but it didn't stop me from writing reasonable code in most circumstances.
On the other hand what you describe in your (awesome) blog post is a more significant Haskell project than any I've worked on, so I'd be interested to hear your experience.
I've not really written my own monad, or properly looked into monad transformer stacks, and I'm aware that I could probably clean up a lot of code using them - is that the sort of thing you mean?
To learn purely functional programming, it's best to jump into Haskell cold turkey, since you will have to learn to think in FP.
When I was learning Haskell, optimization in a lazy world was the most difficult part. I still often have trouble predicting how efficient a particular piece of code will be. The complexity of monads is somewhat overstated, though it doesn't help that some tutorials make something big and esoteric out of them. A monad is nothing more than a type class that specifies how to combine computations that result in some 'boxed value'.
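To make the "boxed value" point concrete, here is a minimal sketch. The names Box, safeDiv and chain are made up for illustration (Box is just a hand-rolled copy of the standard Maybe type); the only thing the Monad instance does is say how to feed the contents of one box into the next computation.

```haskell
-- Hypothetical example type: a box that is either empty or holds one value.
-- It has the same shape as the standard Maybe type.
data Box a = Empty | Full a
  deriving (Show, Eq)

-- Apply a plain function to the value inside the box, if there is one.
instance Functor Box where
  fmap _ Empty    = Empty
  fmap f (Full x) = Full (f x)

-- Put a plain value in a box, and apply a boxed function to a boxed value.
instance Applicative Box where
  pure = Full
  Empty  <*> _ = Empty
  Full f <*> b = fmap f b

-- The monad part: how to combine a boxed value with a computation
-- that itself returns a box. An empty box short-circuits the rest.
instance Monad Box where
  Empty  >>= _ = Empty
  Full x >>= f = f x

-- A computation that can fail: integer division that refuses to divide by zero.
safeDiv :: Int -> Int -> Box Int
safeDiv _ 0 = Empty
safeDiv x y = Full (x `div` y)

-- Two boxed computations chained with do-notation, which is sugar for (>>=).
chain :: Int -> Int -> Int -> Box Int
chain a b c = do
  x <- safeDiv a b
  safeDiv x c
```

Loaded into GHCi, chain 100 5 2 gives a full box, while chain 100 0 2 comes back Empty without the second division ever being attempted. Nothing more esoteric than that is going on.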
Haskell as an EDSL for generating hard real-time code, however, is very viable: http://corp.galois.com/blog/2010/9/22/copilot-a-dsl-for-moni...
Now, if I had stated that all conceivable systems programming domains are addressable with Haskell, that would have indeed been foolish.
What you're probably observing is Python's slow code generation being masked by the inherent slowness of I/O.
Except, when Python's pants are on, it makes gold records.
I haven't looked to see if there are any explicit optimizations, but your statement is ridiculous; an effective I/O strategy can have an enormous effect on performance.
Any reason why you didn't use Hadoop for this, then run batch jobs to extract summaries?
http://tartarus.org/james/diary/2008/06/17/widefinder-final-...