https://www.gwern.net/Archiving%20URLs was the initial idea and the base for my code. I see he's since updated his page considerably compared with what, IIRC, was there before.
My normal medium is clay, so I'd rely on his stuff rather than mine (linked below).
* https://gist.github.com/pbhj/6636d0908d0d11885809a2545b13869... main script
* https://gist.github.com/pbhj/4dedca1e980d6a102433403c0f43552... filter script