It does depend a little on what ratio of additions to multiplies you have. Haswell dropped down to one execution unit capable of floating-point addition, so for addition-heavy workloads you basically had to replace half the additions with FMA instructions just to keep your old performance from dropping by 2x.