2. Call-by-value gives us the ability to safely interleave effects. Now, I know you all think we should not be doing that at all, but I would say that this is only true in some cases. The point of reifying an effect in a monad is not that "effects are icky"; it is that we have written a non-extensional operation using effects, and we want it to be an extensional function. Wrapping the operation up in the monad "completes" it as such. However, there are plenty of extensional functions which may be written using computational effects (such as memoization): these should not be wrapped up in the monad. (FYI, it's the effects that preserve extensionality that Bob calls "benign effects", to the consternation of Haskell developers everywhere.) ML gives us the fine-grained control to make these choices, at the cost of some reasoning facilities: more proofs must be done on paper, or in an external logical framework. I tend to think that the latter is inevitable, but some disagree.
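To make the memoization example concrete, here is a minimal Python sketch (illustrative only; the `memoize` helper is my own, not drawn from any of the languages discussed). The function mutates a private cache, yet observably it is a pure, extensional function of its argument, so there is nothing to gain by sequestering it in a monad:

```python
def memoize(f):
    cache = {}
    def g(x):
        # Internal mutation: the cache is updated on a miss...
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    # ...but g is still extensional: g(x) == f(x) for every x.
    return g

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, and fast despite the naive recursion
```

A caller cannot distinguish `fib` from an effect-free implementation, which is exactly the sense in which the effect is benign.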
I am hoping for a middle ground, then: something like ML, in that effects are fundamental and not just bolted onto the language with a stack of monads; something like Haskell, in that we can tell what effects a piece of code uses.
The story hasn't been fully written on this, but I think that call-by-push-value can help us both recover the benefits of laziness and reason about effects.
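The kernel of call-by-push-value, the explicit split between values and suspended computations, can be mimicked even in Python (a loose illustration of the discipline, not a faithful CBPV calculus; `thunk` and `force` are just the standard names for suspend and run):

```python
def thunk(f):
    # Suspend: in CBPV, thunk turns a computation into a value.
    return f

def force(t):
    # Run: force turns that value back into a computation.
    return t()

effects = []

def comp():
    effects.append("ran")   # an observable effect
    return 42

t = thunk(comp)             # suspending the computation does nothing...
assert effects == []
assert force(t) == 42       # ...the effect fires only when forced
assert effects == ["ran"]
```

Making the thunk/force boundary explicit is what lets a strict language recover laziness where it wants it, while keeping the order of effects visible.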
3. Modularity is something which Haskell people simply do not take seriously, even with the new "Backpack" package-level module system they are building. One of the most-loved reasoning facilities in Haskell, the global uniqueness of type class instances, depends, believe it or not, on global scope, and is therefore inherently anti-modular. As a result, you can never add modularity to Haskell, but we may be able to back-port some of the more beloved aspects of Haskell to a new nephew in the ML family.
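To see what uniqueness buys and costs, here is a hedged Python sketch of the ML-ish alternative, dictionary passing (the `Ord` record and `insort` are hypothetical names of mine). Because the ordering is an ordinary value passed in explicitly, one type can be ordered two different ways within one program, which is precisely what Haskell's global instance uniqueness forbids; that prohibition, in turn, is what lets a library like `Data.Set` trust that any two sets it merges were built with the same comparison:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Ord:
    # An "instance" is just a record of operations, passed explicitly.
    leq: Callable[[int, int], bool]

ascending  = Ord(leq=lambda a, b: a <= b)
descending = Ord(leq=lambda a, b: a >= b)

def insort(ord_, xs):
    # Insertion sort parameterized by the ordering dictionary.
    out = []
    for x in xs:
        i = 0
        while i < len(out) and ord_.leq(out[i], x):
            i += 1
        out.insert(i, x)
    return out

print(insort(ascending,  [3, 1, 2]))   # [1, 2, 3]
print(insort(descending, [3, 1, 2]))   # [3, 2, 1]
```

The flexibility and the reasoning principle trade off directly: local choice of instances is modular, but it gives up the global invariant.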
(Confusingly, laziness advocates often say that their brand of functional programming has better "modularity" than strict languages, because of the way that you can compose lazy algorithms to get more lazy algorithms that don't totally blow up in complexity. I would say that lazy languages are more "compositional", not more "modular"; I prefer to reserve the latter term for modularity at the level of architecture and systems design, not algorithms.)
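The compositionality claim can be illustrated even in a strict language with explicit lazy streams; here is a toy Python generator pipeline of my own devising. Three lazy stages compose into one lazy algorithm, and slicing the result does only as much work as the slice demands, even though the source is infinite:

```python
import itertools

# Each stage is a generator, so the composed pipeline is itself lazy:
# nothing runs until elements are demanded.
naturals = itertools.count(0)
squares  = (n * n for n in naturals)
evens    = (s for s in squares if s % 2 == 0)

print(list(itertools.islice(evens, 3)))  # [0, 4, 16]
```

This is the "compositional" property the advocates are pointing at; whether it deserves the word "modularity" is the terminological quibble above.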