Both, and it agrees with your prediction. Each KNL core is much faster than a KNC core. A KNC core was over 10X slower than a mainstream CPU core, while a KNL core seems to be only about 4.5X slower (on my particular code). I also get linear OpenMP scaling from 1 to 64 threads on KNL, so the parallelism is all there.
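For anyone who wants to check scaling the same way, here is a minimal sketch of the arithmetic behind "linear scaling": speedup is the 1-thread wall time divided by the N-thread wall time, and efficiency is speedup divided by N. The function name `scaling_table` and the timings below are hypothetical placeholders for illustration, not measurements from my code.

```python
def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows from a {threads: seconds} dict."""
    base = timings[1]  # single-thread wall time is the reference point
    rows = []
    for threads in sorted(timings):
        speedup = base / timings[threads]
        efficiency = speedup / threads  # 1.0 means perfectly linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


# Hypothetical wall times in seconds, made up purely for illustration.
example = {1: 64.0, 2: 32.0, 4: 16.0, 16: 4.0, 64: 1.0}

for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads:3d} threads: speedup {speedup:5.1f}x, efficiency {efficiency:.2f}")
```

An efficiency that stays near 1.0 all the way up to 64 threads is what I would call linear; a drop at higher thread counts usually points at a shared bottleneck such as memory bandwidth.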
Some questions out of curiosity:
Is your application bandwidth-bound, compute-bound, or something else? Also, which modes have you been running the KNL chip in?