> one hospital was for higher risk patients, so it learned to assign all patients from that hospital to the disease category simply because they were at that hospital.
That just sounds like poor feature selection/engineering. Garbage in, garbage out.
Yeah, there are definitely ways they could have avoided that, but it's just one example of many. The whole point of ML is that it picks up on patterns in the data. The problem is that it can be difficult to identify what it is learning from; this paper itself says they do not know what is causing the model to make these predictions. As a result, it is difficult to validate that the model is doing what people think it is.
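To make the hospital example concrete, here is a rough toy sketch in plain Python. Everything in it is made up for illustration: the 90%/10% disease rates, the `biomarker` feature, and the helper names are my assumptions, not anything from the paper. It trains a tiny logistic regression on data where the hospital flag is a proxy for the label, then scores it on data where the hospital carries no information.

```python
import math
import random

def make_patients(n, rng, hospital_matters):
    """Synthetic rows: ([hospital_flag, biomarker], has_disease). All values are invented."""
    rows = []
    for _ in range(n):
        hospital = rng.randint(0, 1)
        if hospital_matters:
            # Assumed rates: hospital 1 is the high-risk site (90% disease vs 10%).
            p = 0.9 if hospital == 1 else 0.1
        else:
            # Shifted setting: the hospital says nothing about disease.
            p = 0.5
        disease = 1 if rng.random() < p else 0
        # Weak but genuine signal: biomarker runs slightly higher with disease.
        biomarker = rng.gauss(0.5 * disease, 1.0)
        rows.append(([float(hospital), biomarker], disease))
    return rows

def predict(weights, bias, x):
    """Logistic regression probability of disease."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, epochs=200, lr=0.5):
    """Plain batch gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in rows:
            err = predict(weights, bias, x) - y
            grad_b += err
            for j in range(2):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def accuracy(weights, bias, rows):
    hits = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in rows)
    return hits / len(rows)

rng = random.Random(0)
train_rows = make_patients(2000, rng, hospital_matters=True)
same_dist = make_patients(2000, rng, hospital_matters=True)
shifted = make_patients(2000, rng, hospital_matters=False)

weights, bias = train(train_rows)
acc_same = accuracy(weights, bias, same_dist)
acc_shifted = accuracy(weights, bias, shifted)
print("weights [hospital, biomarker]:", weights)
print("accuracy, same hospital mix:", acc_same)
print("accuracy, hospital uninformative:", acc_shifted)
```

The held-out score on the same hospital mix looks fine, which is the trap: nothing in that number tells you the model is mostly reading the hospital flag. Here you can see it because there are two features and you can inspect the weights. With an image model and a site-specific artifact baked into the pixels, you don't get that luxury.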