Each new output value i can collide with output 1, 2, ..., i-1. So the collision probability of iteration i is (i-1)/2^256. Adding all of the iterations up you have 1/2^256 + 2/2^256 + ... + (i-1)/2^256 = 0.5 i (i-1)/2^256 which approaches 1/2 as you get to 2^128.
In reality you use some variant of Rho which does not store every previous value but uses something like Floyd's cycle detection or distinguished points and requires minimal storage, at the cost of some extra hash computations but still on the order of 2^128.