> Laziness is delayed execution. That's it. There's _nothing_ stopping you from delaying a side effect.
Laziness is about more than just side effects.
I think "evaluation" or "reduction" would be better words than "execution" here. Laziness (call by need) is an evaluation strategy for (beta-)reducing expressions, which has two nice properties:
- If an expression can be reduced without diverging by some evaluation strategy, then it can be reduced without diverging using call by need.
- Efficiency, in the sense that no work is duplicated: each argument is evaluated at most once, and its result is shared between uses.
The other common evaluation strategies are call by name and call by value. Call by name has the first property but not the second, so there are cases where it's exponentially slower than call by need. Call by value has the second property but not the first, so there are cases where it diverges unnecessarily.
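Both properties can be observed from inside Haskell, whose evaluation strategy is call by need. A small sketch (the names `loop` and `constFn` are mine, chosen for illustration):

```haskell
import Debug.Trace (trace)

-- Diverges if it is ever evaluated.
loop :: Int
loop = loop

-- Ignores its second argument entirely.
constFn :: Int -> Int -> Int
constFn x _ = x

main :: IO ()
main = do
  -- Property 1: the unused 'loop' is never forced, so this prints 42.
  -- Under call by value, evaluating the argument first would diverge.
  print (constFn 42 loop)
  -- Property 2: 'y' is a shared thunk, so "computed" is traced at most once.
  -- Under call by name, each use of 'y' would re-evaluate the expression.
  let y = trace "computed" (2 + 3 :: Int)
  print (y + y)
```

Under call by name the `y + y` example merely doubles the work; with recursive definitions the re-evaluation compounds, which is where the exponential slowdown comes from.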
This 'unnecessary divergence' is a major reason why most programming languages are harder to reason about mathematically than they need to be. For example, consider a pair `(cons x y)` and its projection functions `car` and `cdr`. We might want to describe their behaviour like this:
∀x. ∀y. (car (cons x y)) = x
∀x. ∀y. (cdr (cons x y)) = y
This is perfectly correct under call by name or call by need, but wrong under call by value. Why? Because under call by value, `(car (cons x y))` and `(cdr (cons x y))` each diverge if either `x` or `y` diverges; yet each right-hand side mentions only one of the variables, so the equations promise a defined result regardless of whether the other argument diverges.
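Haskell's pairs make this concrete: `fst` and `snd` play the roles of `car` and `cdr`, and the projection equations hold unconditionally because the unused component is never forced. A minimal sketch:

```haskell
main :: IO ()
main = do
  -- fst (x, y) = x holds even when y diverges: 'undefined' is never forced.
  -- A call-by-value language would evaluate both components when
  -- constructing the pair, and crash or diverge here.
  print (fst (1 :: Int, undefined :: Int))
```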
This is why Haskell programs can focus on constructing and deconstructing data, whilst most other languages must concern themselves with control flow at every point (branching, looping, divergence, delaying, forcing, etc.).
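One familiar consequence of this data-centric style: infinite structures are ordinary definitions, and the consumer, not the producer, decides how much gets evaluated. A small example (the name `powersOfTwo` is mine):

```haskell
-- An infinite list, defined purely as data; no loop or termination
-- condition appears in the definition itself.
powersOfTwo :: [Integer]
powersOfTwo = iterate (* 2) 1

main :: IO ()
main = print (take 5 powersOfTwo)  -- [1,2,4,8,16]
```

In a call-by-value language the same idea needs explicit control-flow machinery: generators, iterators, or hand-written delay/force thunks.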