To give ballpark numbers: modern Intel processors can retire a few instructions per cycle in tight loops (4 is the theoretical maximum; > 2 is realistic in a lot of high-performance code). A branch misprediction wastes 10-15 cycles.
So getting rid of speculation entirely, and stalling on every branch, would cost the equivalent of dozens of instructions per branch (10-15 cycles at 2-4 instructions per cycle is roughly 20-60 instructions). On typical code that has a branch every few instructions, this could slow down execution by a factor of several.
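
As a rough back-of-envelope check of those figures, here is a small Python sketch. It assumes a deliberately simplified model rather than a real pipeline simulation: the speculating baseline never mispredicts, and without speculation every branch stalls for the full misprediction penalty. The function and parameter names are made up for illustration.

```python
def instructions_lost_per_stall(ipc, stall_cycles):
    """Instructions that could have retired during one branch stall."""
    return ipc * stall_cycles


def stall_slowdown(ipc, stall_cycles, instrs_per_branch):
    """Estimated slowdown factor if every branch stalls the pipeline.

    Simplifying assumptions (not measurements):
      - with speculation, code runs at a steady `ipc` and never mispredicts
      - without speculation, each branch adds `stall_cycles` of dead time
    """
    # Cycles to execute the instructions between two branches at full speed.
    base_cycles = instrs_per_branch / ipc
    # The same work plus one stall per branch.
    stalled_cycles = base_cycles + stall_cycles
    return stalled_cycles / base_cycles


if __name__ == "__main__":
    # Ballpark inputs taken from the text: 2-4 IPC, 10-15 cycle penalty,
    # and a branch every ~5 instructions (the "every few" is an assumption).
    for ipc in (2, 4):
        for stall in (10, 15):
            lost = instructions_lost_per_stall(ipc, stall)
            factor = stall_slowdown(ipc, stall, instrs_per_branch=5)
            print(f"IPC={ipc} stall={stall}: ~{lost} instructions lost per branch, "
                  f"~{factor:.1f}x slower")
```

For example, at 2 instructions per cycle, a 10-cycle stall, and a branch every 5 instructions, this model gives about 20 instructions lost per branch and about a 5x slowdown. Real hardware would behave differently (predictors do mispredict, and a stall need not cost exactly the misprediction penalty), so treat the output as an order-of-magnitude estimate only.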