> If our per-callstack frame is 10k bytes, and we can spend 1 megabyte of stack on recursion (both are pessimistic).. we can go 100 levels deep. Which means we can expect to run into problems when there are more than ~5E29 elements.
This is incorrect.
Your limit is only for fully balanced trees. On a fully unbalanced tree your solution will crash at about 100 elements.
Do we see a lot of unbalanced trees? Yes, most of the time in my experience. If there's a balanced tree, there's probably a data structure and the function is already supplied. Writing a tree traversal function come up when working with unbalanced things like program ASTs, JSON/HTML/XML parsed data, Lisp-style lists, filesystem traversal, etc.
> This assumes we don't write the recursion in a way that the compiler can elide it entirely with tail-call optimization.
The compiler cannot elide away tree traversal with tail-call optimisation. Only one branch.
A really smart compiler could transform it into a loop with an explicit node-stack using an array, or avoid the stack and use in-place pointer-reversal if concurrency rules permit and there's a spare bit. But I've never seem a really smart compiler do either of those things.