It's a good point that hashing is a better method when you have access to the original files.
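For the case where you do have both sides, hash comparison is easy to script. Here's a rough sketch in Python (the function names are my own, just for illustration, not from any particular tool):

```python
import hashlib
from pathlib import Path

def file_sha256(path):
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(source, backup):
    """Yield relative paths whose backup copy is missing or differs from the source."""
    source, backup = Path(source), Path(backup)
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file() or file_sha256(src) != file_sha256(dst):
            yield rel
```

Of course this only proves the backup matches the source *right now*; it says nothing if the source itself was already corrupted.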
Aren't all bets off at this point? I mean, validating the backup seems like skipping a step if you're not validating the source. Scrolling through thumbnails is better than nothing, sure, but it's really prone to false negatives: corrupted images can look fine in a thumbnail, and your eyes might miss even glaring corruption if you scroll too fast. And if it's not an image file, it gets more challenging still.
You seem to have one of those corner cases where basically no automated method can solve your problem, but the volume of data is just low enough that a bit of manual intervention can cover the gap.
As far as I can tell, the method described in the article doesn't really validate the backups in any way; it just produces some statistics that will fail in very plausible ways.
And of course, if the data is important to you and there are special circumstances that could affect the process, nothing beats an actual restore test.
You're correct that the methods I described are a far cry from actually guaranteeing that the backup has no errors. In the same way that a unit test doesn't prove code is error-free, but _can_ justify increased confidence in the code, I'm interested in techniques that can justify increased confidence in my backups. Particularly in cases where I don't have direct access to the original data, and where exhaustively checking the data manually is too time-consuming to be worth it.
Then I wrote software for backing up VMs automatically (disclaimer: this is a commercial product I sell).
There are options for getting an email on success, failure, or both. The VM files are all hashed.
VMs are easy to restore, so an actual restore test is pretty easy to do without risking overwriting the original. If a file hash doesn't match on restore, my software will complain but continue the restore anyway.
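That warn-but-continue behavior looks roughly like this (a generic sketch in Python with made-up names, not the actual product's code):

```python
import hashlib
import shutil
from pathlib import Path

def restore_with_check(backup_file, dest, expected_sha256):
    """Copy a backed-up file to dest; warn on hash mismatch but finish the restore."""
    actual = hashlib.sha256(Path(backup_file).read_bytes()).hexdigest()
    if actual != expected_sha256:
        # Complain loudly, but a partially-trusted restore is still more
        # useful than no restore at all.
        print(f"WARNING: hash mismatch for {backup_file}, restoring anyway")
    shutil.copy2(backup_file, dest)
    return actual == expected_sha256
```

The return value lets the caller decide whether the restored VM should be trusted or inspected further.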
FWIW, all my code etc. is also in source control, so I'm not relying on a single layer for that.