Here's a comparison between the first second of the video, the last second (54'') of the video, and a composite of the two. I've adjusted the levels for clarity, because the original comparison is amateurish and more than a little disingenuous.
https://i.imgur.com/TVKFW4S.png
(1) It's pretty obvious there are 3 hands, which everyone seems to be conveniently ignoring (you know, like a watch).
(2) The composite frame clearly shows the hands do not move at all throughout the video, which suggests the watch is in fact not wound or not functional.