Code changes that altered the UI, even just some CSS, would cause a screenshot comparison failure on certain steps in the test. If the change was what we expected, we overwrote the old screenshot with the new one.
It isn't exactly the same as the TDD process because sometimes you write the code first (e.g. CSS), eyeball it, and then rebuild the screenshot if it looks correct.
I'd say it's close enough though.
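As a rough illustration of that loop (a minimal sketch, not the tooling we actually used; `check_screenshot`, the `update` flag and the file names are all made up for this example, and a plain byte comparison stands in for a real image diff):

```python
import shutil
import tempfile
from pathlib import Path


def check_screenshot(actual: Path, baseline: Path, update: bool = False) -> bool:
    """Return True if `actual` matches the stored `baseline` byte for byte.

    With update=True, or when no baseline exists yet, the new screenshot is
    recorded as the baseline and the check passes. (Auto-recording a missing
    baseline is an assumption of this sketch, not a rule.)
    """
    if update or not baseline.exists():
        baseline.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(actual, baseline)
        return True
    return actual.read_bytes() == baseline.read_bytes()


def demo() -> list:
    """Walk through: record, break with a UI change, accept, re-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        baseline = root / "baselines" / "login_step.png"  # hypothetical test step
        shot = root / "login_step.png"

        shot.write_bytes(b"old-ui")  # stand-in for real PNG data
        recorded = check_screenshot(shot, baseline)  # first run records the baseline

        shot.write_bytes(b"new-ui")  # e.g. a CSS change altered the rendering
        changed = check_screenshot(shot, baseline)  # comparison now fails

        # Eyeballed the new screenshot, it looks right: overwrite the baseline.
        accepted = check_screenshot(shot, baseline, update=True)
        rechecked = check_screenshot(shot, baseline)  # passes against the new baseline
        return [recorded, changed, accepted, rechecked]


if __name__ == "__main__":
    print(demo())
```

In practice you would swap the byte comparison for a pixel diff with some tolerance, since exact equality makes the comparison even more brittle than it needs to be.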
I won't pretend it worked perfectly. The screenshot comparison algorithms were sometimes flaky, UIs can be surprisingly non-deterministic, and you need a process to pull the database and update screenshots accordingly. However, it's the approach I'd prefer to take in the future (I haven't done UIs for about 3 years now).
I also wasn't religious about covering every single scenario with tests, but I could have been. The company moved fast and sometimes they just wanted quick rather than reliable.