Pro is approximately in the middle between GPT 3.5 and GPT 4 on four measures (MMLU, BIG-Bench-Hard, Natural2Cod, DROP), it is closer to 3.5 on two (MATH, Hellaswag), and closer to four on the remaining two (GSM8K, HumanEval). Two one way, two the other way, and four in the middle.
So it's a split almost right down the middle, if anything closer to 4, at least if you assume the benchmarks to be of equal significance.