Is that your deep multimodal models, or the MRI imaging itself?
What you are essentially saying is that the signal is so subtle that only a large NN can reliably extract it.
While that may well be the case, it would be better to have a scan/diagnostic that doesn't need that level of signal processing to interpret.
For example: you don't need a large generative deep multimodal model to read a COVID antigen or PCR test.