The latter, yes. Interestingly, I'm not surprised at all — oh wait, no dashes — Interestingly, I'm not surprised at all; this is what many researchers themselves do lol. I never take a reference at face value from any human being, and I apply the same standard to GPT-4. But all its references are real. It's just that 20-40% of the time the reference might not say exactly what I asked for (though it's related, and mostly there).
I've been trying to get GPT-4 to give me accurate links to predictable websites. It gives me very plausible links that even have the right domain and path format, but often the plausible link is not the correct one, and GPT-4 seems to have no awareness of what the correct link actually is.