"Parallel" is about optimizing an otherwise linear, or atomic program. Like map-reduce, which can be run on a single CPU or on a distributed cluster. When SPJ is talking about parallelism he is most likely talking about nested data parallelism.
The distinction can be drawn unambiguously with the terms "task parallelism" (concurrency) and "data parallelism" (parallelism).
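To make the data-parallelism point concrete, here is a minimal sketch (the `mapReduce`/`parMapReduce` names are mine, not from the interview) using `par`/`pseq` from `GHC.Conc`. The sequential and parallel versions compute the same value; only the evaluation strategy differs, which is exactly what purity buys you:

```haskell
import GHC.Conc (par, pseq)

-- Sequential map-reduce: map f over the input, fold the results together.
mapReduce :: (a -> b) -> (b -> b -> b) -> b -> [a] -> b
mapReduce f combine z = foldr (combine . f) z

-- Data-parallel variant: split the input and spark one half onto another
-- core while this thread reduces the other half. Purity guarantees the
-- answer is identical to the sequential version (for an associative combine).
parMapReduce :: (a -> b) -> (b -> b -> b) -> b -> [a] -> b
parMapReduce f combine z xs =
  let (l, r) = splitAt (length xs `div` 2) xs
      lr     = mapReduce f combine z l
      rr     = mapReduce f combine z r
  in lr `par` (rr `pseq` combine lr rr)
```

Compiled with `-threaded` and run with `+RTS -N`, the sparked half may run on another core; on one core it simply runs sequentially.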
If you read the interview, he is talking about the kind of implicit concurrency FP advocates often suggest you get for free:
> I suppose Haskell initially wasn't a concurrent language at all. It was a purely functional language and we had the idea from the beginning that a purely functional language was a good substrate for doing concurrency on. But it turned out to be a lot harder to turn that into reality than I think we really expected because we thought "If you got e1+e2 then you can evaluate e1 at the same time as e2 and everything will be wonderful because they can't affect each other because it's pure."
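The "evaluate e1 at the same time as e2" idea from the quote translates almost literally into GHC's spark primitives. A sketch (the `parPlus` name is my own, hypothetical):

```haskell
import GHC.Conc (par, pseq)

-- The quote's idea, literally: for a pure e1 + e2, spark e1 on another
-- capability while this thread evaluates e2, then add. Because both
-- expressions are pure, the result cannot depend on the schedule.
parPlus :: Int -> Int -> Int
parPlus e1 e2 = e1 `par` (e2 `pseq` (e1 + e2))

-- A deliberately expensive pure function to give the spark real work.
fib :: Int -> Int
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)
```

The hard part SPJ alludes to is not correctness but profit: sparking every subexpression creates work units far too fine-grained to pay for their scheduling overhead, which is why GHC ended up with explicit annotations (`par`, strategies) rather than fully automatic parallelism.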
They had to add things like STM and explicit concurrency management (forkIO, MVar) because you don't get this for free just by doing FP.
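For contrast with the implicit-parallelism hope above, here is what the explicit machinery looks like. A minimal STM sketch (the `transfer`/`demo` names and the account scenario are my own illustration, not from the interview), using the STM primitives re-exported from `GHC.Conc`:

```haskell
import GHC.Conc (TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- Move an amount between two shared accounts. The whole block runs as one
-- atomic transaction: no other thread can observe the intermediate state
-- where the money has left one account but not arrived in the other.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  a <- readTVar from
  writeTVar from (a - amount)
  b <- readTVar to
  writeTVar to (b + amount)

-- Small demo: start with 100/0, transfer 30, read back both balances.
demo :: IO (Int, Int)
demo = do
  from <- newTVarIO 100
  to   <- newTVarIO 0
  transfer from to 30
  (,) <$> readTVarIO from <*> readTVarIO to
```

This is precisely the kind of explicit coordination purity alone never provided: the programmer, not the runtime, marks which effects must be atomic.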