No one writes a 10,000-line program, imperative or functional, without testing and running smaller parts of it. At least no one that I know of.
I worked with ML during my MSc in software engineering and wrote my dissertation on it.
I didn't see any improvement regarding bugs and state.
What I discovered is that while FP does not require you to keep track of mutable state, it does require you to keep track of the values passed as parameters.
In my humble opinion, those two things are equal in difficulty and consequences.
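A minimal sketch of that trade-off, in Python rather than ML for brevity. The account example and every name in it are hypothetical, chosen only to show the same bookkeeping in both styles and the kind of mistake each one invites:

```python
# Hypothetical illustration: names and amounts are made up for this sketch.

class Account:
    """Imperative style: the balance lives in mutable state."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        # Mutates the object; callers must remember this happened.
        self.balance += amount


def deposit(balance, amount):
    """Functional style: nothing is mutated, the new balance is returned."""
    return balance + amount


# Imperative: the current balance is tracked implicitly by the object.
acct = Account(100)
acct.deposit(50)
acct.deposit(25)
imperative_total = acct.balance

# Functional: the current balance is tracked explicitly by threading
# each result into the next call.
b0 = 100
b1 = deposit(b0, 50)
b2 = deposit(b1, 25)
functional_total = b2

# The analogous mistake: passing a stale value (b0 instead of b1),
# which silently drops the first deposit.
stale_total = deposit(b0, 25)
```

Both versions reach the same total when used correctly. The functional version has no hidden state, but the programmer still has to know which of `b0`, `b1`, and `b2` is current, which is the bookkeeping described above.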
I had almost the same number of bugs in my ML application as I would have had in an imperative program, but they manifested in different ways.
I don't think FP requires the programmer to keep fewer things in their head, provided the imperative program follows some good principles.