My take is for small n (say 5 experiments at once) with lots of subjects (>10k participants per branch) and a decent hashing algorithm, the risk of uneven bucketing remains negligible. Is my intuition off?
False positives for experiments is definitely something to keep an eye on. The question to ask is what is our comfort level for trading-off between false positives and velocity. This feels similar to the IRB debate to me, where being too restrictive hurts progress more than it prevents harm.