This was my guess as well. I've spent a lot of time around radiology and AI (I used to work at a company specializing in it), and we read through a lot of the failure cases. In one example, the model picked up on which hospital each patient came from. One hospital treated higher-risk patients, so the model learned to assign every patient from that hospital to the disease category simply because they were at that hospital.
There are a ton of cases like this out there, especially with public datasets (which in the medical field tend to be very unbalanced because it is so difficult to build a HIPAA-compliant public dataset).
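For what it's worth, here is a rough sketch of the kind of sanity check that can flag this sort of shortcut before training. It compares how well you can predict the label from the hospital alone against simply guessing the most common label. The function names and the toy data are made up for illustration and are not from any real dataset or from the case described above.

```python
from collections import Counter, defaultdict

def site_only_baseline(records):
    """Accuracy when each record is labeled with its hospital's majority label."""
    by_site = defaultdict(Counter)
    for site, label in records:
        by_site[site][label] += 1
    # Each site contributes the count of its most frequent label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_site.values())
    return correct / len(records)

def majority_baseline(records):
    """Accuracy when every record gets the overall most common label."""
    labels = Counter(label for _, label in records)
    return labels.most_common(1)[0][1] / len(records)

# Hypothetical toy data: (hospital, has_disease). Hospital "A" stands in
# for the higher-risk site, so most of its patients are positive.
records = (
    [("A", 1)] * 90 + [("A", 0)] * 10 +
    [("B", 1)] * 10 + [("B", 0)] * 90
)

site_acc = site_only_baseline(records)
base_acc = majority_baseline(records)
print(f"hospital-only accuracy: {site_acc:.2f}")
print(f"majority-label accuracy: {base_acc:.2f}")
```

If the hospital-only number is far above the majority-label number, the site carries a lot of label information by itself, and a model can score well without looking at the pathology at all. Holding out an entire hospital for evaluation is one common way to check whether that is what happened.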