How much change is a lot? 1% of the table + 50 rows (for small tables)? I would argue that sometimes it's better to use a fixed threshold, e.g. c = 1000, m = 0.
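To make the trade-off concrete, here is a minimal sketch of the standard autovacuum trigger formula (threshold + scale_factor * row count); the function name and the "100M-row table" numbers are mine, not from the thread:

```python
def autovacuum_threshold(reltuples, scale_factor=0.2, base_threshold=50):
    """Dead tuples needed before autovacuum kicks in:
    autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
    Defaults shown are PostgreSQL's (0.2 and 50)."""
    return base_threshold + scale_factor * reltuples

# "1% of the table + 50 rows" vs. a fixed threshold (c=1000, m=0)
# on a hypothetical 100M-row table:
big = 100_000_000
print(autovacuum_threshold(big, scale_factor=0.01))                    # 1000050.0
print(autovacuum_threshold(big, scale_factor=0, base_threshold=1000))  # 1000
```

The point of the fixed threshold is visible in the output: the proportional rule waits for a million dead rows on a big table, while c = 1000 keeps the trigger constant regardless of table size.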
All these approaches are hit or miss, and they differ per configuration. What I found useful is to choose the best parameters you can think of without forcing autovacuum to run every time, and have an external job run VACUUM manually to clean up whatever got missed... eventually you can figure out the right configuration.
The truth is that on most systems you have four types of tables - (large, small) x (frequently modified, static). And usually there are only a few large, frequently modified tables, while the rest is either small or static (or both). At least that's what I see on the multi-TB databases we manage.
There are two approaches:
1) tune the defaults to be aggressive enough even for the large + frequently updated tables (which works because the small / infrequently modified tables don't really require that much maintenance, compared to the first category)
2) keep defaults that are fine for majority of tables, and then use ALTER TABLE to override the parameters for the few tables that require more aggressive maintenance
Which is exactly what the blog post is about.
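For approach 2), the per-table override looks something like this (the table name is hypothetical; the storage parameters are the real autovacuum per-table settings):

```sql
-- Make autovacuum much more eager on one large, heavily updated table,
-- while leaving the global defaults alone.
ALTER TABLE big_busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger at 1% dead rows
    autovacuum_vacuum_threshold    = 1000,  -- plus a fixed base
    autovacuum_vacuum_cost_delay   = 5      -- throttle less aggressively
);
```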
If you have better idea, I'd like to hear it (and I mean that seriously).
Also a note regarding the delay - the resolution really depends on hardware. Some timers have only 10ms resolution, for example.
Are you sure that the limit is shared by all workers?
> When the accumulated cost reaches a limit (specified by vacuum_cost_limit), the process performing the operation will sleep for a short period of time, as specified by vacuum_cost_delay. Then it will reset the counter and continue execution.
-- https://www.postgresql.org/docs/9.6/static/runtime-config-re...
Since every worker is its own process, I assume that the limit applies to each worker, and overall vacuum throughput increases when using more workers.
That being said, your observation about the autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay being potentially too low may still apply.
Your throughput estimate of 8 MB/s (I think per worker) only applies to the vacuum_cost_page_hit (default 1) case. For vacuum_cost_page_miss (default 10), the throughput would only be 0.8 MB/s.
If my understanding of how a page miss is defined is correct, this will be the common case unless one has increased shared_buffers from its default of 128 MB to something much larger. That's because Postgres counts a "page miss" if the page is not in its shared buffers, even if it's in the host OS page cache.
I might be wrong about the caches, so I'd love for somebody with more insights to confirm/reject these assumptions.
Edit: Seems like your 8 MB/s estimate is based on the vacuum_cost_page_miss case. But still, that's pretty low :)
===========================================================
autovacuum_vacuum_cost_limit (integer)
Specifies the cost limit value that will be used in automatic VACUUM operations. If -1 is specified (which is the default), the regular vacuum_cost_limit value will be used. Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits for each worker does not exceed the value of this variable. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
===========================================================
So, that's pretty clear, I think. In practice the balancing happens in the autovac_balance_cost() function. The autovacuum workers communicate through a chunk of shared memory, and cost rebalancing is one of the things done through it.
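A minimal sketch of what that balancing means in the simple case - identical per-worker settings, so the shared limit is just split evenly (the real autovac_balance_cost() weights workers by their individual cost settings; this simplification and the function name balance_cost are mine):

```python
def balance_cost(total_limit, n_workers):
    """Split autovacuum_vacuum_cost_limit across workers so that the
    sum of per-worker limits never exceeds the configured value.
    Assumes all workers run with identical cost settings."""
    per_worker = max(1, total_limit // n_workers)
    return [per_worker] * n_workers

print(balance_cost(200, 3))  # [66, 66, 66] - sum stays under 200
```

So adding workers does not increase total vacuum throughput; each worker just gets a smaller token budget.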
Regarding the limits - yes, the 8MB/s is based on vacuum_cost_page_miss=10, which means a read from the OS. The worker process wakes up every 20ms, i.e. 50x per second. As each round has 200 tokens, that means 10,000 tokens per second. Assuming all of them are reads from disk/OS (cost 10 each), we can do 1,000 of them, which is 1,000 x 8kB = 8MB/s of reads. OTOH writes are about twice as expensive (vacuum_cost_page_dirty=20), leaving us only 500 writes, i.e. 4MB/s.
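The same back-of-the-envelope arithmetic, spelled out (using the defaults quoted above: 20ms delay, 200-token limit, 8kB pages):

```python
PAGE_KB = 8        # PostgreSQL page size in kB
delay_ms = 20      # autovacuum_vacuum_cost_delay default
cost_limit = 200   # vacuum_cost_limit default

# 50 wakeups per second, 200 tokens each = 10,000 tokens/s
tokens_per_sec = (1000 / delay_ms) * cost_limit

mb_miss = tokens_per_sec / 10 * PAGE_KB / 1000   # vacuum_cost_page_miss = 10
mb_dirty = tokens_per_sec / 20 * PAGE_KB / 1000  # vacuum_cost_page_dirty = 20

print(mb_miss, mb_dirty)  # 8.0 MB/s reads, 4.0 MB/s writes
```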
How does one even begin to think that blocking users from scrolling horizontally on your site is a good idea?
It's also using some kind of JavaScript callbacks for responsive placement of the header, rather than media queries.
The solution for me was to batch inserts and updates in one transaction.
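In other words, instead of one implicit transaction per statement, wrap the whole batch (table names here are made up for illustration):

```sql
BEGIN;
INSERT INTO events (payload) VALUES ('a'), ('b'), ('c');
UPDATE counters SET n = n + 3 WHERE id = 1;
COMMIT;
```

Fewer commits means less WAL flushing per row and fewer dead-tuple bursts for autovacuum to chase.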