VACUUM by default runs with the TRUNCATE option enabled. When possible and worthwhile, this truncates empty pages from the end of the heap and returns them to the operating system. That step takes an ACCESS EXCLUSIVE lock, which blocks both reads and writes, so it is written to be very deferential and only proceeds if nothing else is trying to take a competing lock.
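For reference, on Postgres 12+ the truncation step can be turned off per statement or per table; a sketch (the table name `big_table` is a placeholder):

```sql
-- skip the truncation step for one manual vacuum
VACUUM (TRUNCATE false) big_table;

-- or disable it persistently via a storage parameter (PG 12+)
ALTER TABLE big_table SET (vacuum_truncate = false);
```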
ACCESS EXCLUSIVE is a unique lock in that it is replicated to standbys. While the writer holds the lock, reads on the standbys are blocked once they replay it. The writer, however, knows nothing about those standby readers, so its conservative approach of only acquiring the lock when there is no contention is completely thwarted.
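If you want to see the replicated lock land, a query along these lines run on the standby should show it while the truncation is in flight:

```sql
-- replicated heavyweight locks appear in pg_locks on the standby
SELECT locktype, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE mode = 'AccessExclusiveLock';
```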
Finally, if the vacuum truncation is interrupted by lock contention, it must re-scan the heap to verify that the trailing blocks are still empty before it can attempt the truncation again.
All of these details combined meant our writer got stuck in a "grab lock, scan, be interrupted, release lock, grab lock, restart scan, be interrupted, release lock... etc" loop. The replicated lock acquisitions seemed to arrive at the readers batched together, so the readers never got a chance to run between loop iterations. The vacuum never actually succeeded because it kept re-scanning. We had to intervene manually to disable autovacuum and cancel the in-progress jobs.
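For anyone curious, that kind of intervention can be done in stock Postgres roughly like this (`big_table` is a placeholder, and these are not necessarily the exact commands we ran):

```sql
-- stop autovacuum from retrying the truncation on this table
ALTER TABLE big_table SET (autovacuum_enabled = false);

-- ask any in-flight autovacuum workers to stop
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```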
We've hit plenty of weird postgres issues, but this was a new one for us.
Edit: of note is a somewhat rare lack of clarity in the postgres docs. "VACUUM (without FULL)" is documented as requiring only a SHARE UPDATE EXCLUSIVE lock, which is why autovacuum is considered safe to run. Turns out that's not the whole truth: the truncation step briefly takes ACCESS EXCLUSIVE.
How is contention on the standbys communicated back to the writer? I'm confused about how the writer is interrupted mid-cycle if only the readers are experiencing lock contention.
Contention from the standbys does not get communicated back to the writer.
Not sure "SERIAL is non-transactional" belongs at the bottom; I thought that was common knowledge... but that might be from years of doing Oracle, where sequences are more explicit.
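For anyone unfamiliar, it's easy to demonstrate in psql that sequence values consumed inside a rolled-back transaction are not returned:

```sql
CREATE TABLE t (id serial PRIMARY KEY);

BEGIN;
INSERT INTO t DEFAULT VALUES;  -- consumes nextval(), would be id = 1
ROLLBACK;                      -- the row is gone...

INSERT INTO t DEFAULT VALUES;  -- ...but the sequence didn't roll back:
                               -- this row gets id = 2, leaving a gap
```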
> This could even cause an infinite loop in some cases where updates continually place the updated record ahead of the scan performing the update operation.
This just isn't true for PostgreSQL, and they don't indicate anywhere that they've switched from talking about PostgreSQL to a different database. So either they believe it's a concern for PostgreSQL, or they aren't being clear about when they're talking about different databases.
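Concretely: in PostgreSQL a single UPDATE statement runs against one MVCC snapshot, so the new row versions it writes (which may land later in the heap) are invisible to that same statement's scan and can't be revisited. Something like this terminates after touching each original row exactly once (`accounts` is a made-up table):

```sql
-- the row versions created by this statement are not visible to its
-- own snapshot, so the scan cannot loop over its own updates
UPDATE accounts SET balance = balance * 1.01 WHERE balance > 0;
```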