Ok, fair enough. I'd agree with the view that using MD5, presumably for speed, is probably not the best trade-off to be making here. Unless we're dealing with an NVMe drive (or something more exotic), you're likely to be I/O-bound even with more computationally intensive hash functions.
And if you are deduping on really fast storage, you'd get way better performance (with comparable safety) using something like xxHash64 (https://cyan4973.github.io/xxHash/).
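For illustration, here's a rough Python sketch of a whole-file dedup pass. It assumes you just have a list of file paths. The point is that the hash is only a parameter: switching `md5` to `sha256` or `blake2b` changes one word. On a spinning disk or SATA SSD, the `f.read` calls rather than `h.update` will probably dominate the runtime. `file_digest`, `find_duplicates` and the 1 MiB chunk size are hypothetical names and values I made up. The size prefilter is a common extra trick and wasn't mentioned above. xxHash64 isn't in the standard library, so you'd need a third-party binding for it, and this sketch doesn't cover that.

```python
import hashlib
import os
from collections import defaultdict

# Assumed read size; tune for your storage. Not a recommended value.
CHUNK_SIZE = 1 << 20  # 1 MiB


def file_digest(path, algorithm="sha256", chunk_size=CHUNK_SIZE):
    """Hash a file in fixed-size chunks so memory use stays flat for big files."""
    h = hashlib.new(algorithm)  # any hashlib name: "md5", "sha256", "blake2b", ...
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # on slow disks, this read dominates
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, algorithm="sha256"):
    """Return sorted groups of paths whose (size, digest) pairs match."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a file with a unique size can't have a duplicate; skip hashing it
        for p in same_size:
            groups[(size, file_digest(p, algorithm))].append(p)

    # A digest match is treated as "duplicate" here; a byte-for-byte compare
    # would be needed to rule out a hash collision entirely.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Something like `find_duplicates(paths, "blake2b")` lets you time different algorithms on your own hardware and see whether hashing or I/O is the real bottleneck.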