EXPLAIN (with ANALYZE at the very least – which you should almost always use) is a lot less theoretical than you might assume. That can make it a bit more onerous to execute, but it ends up adding a lot of value to the statistics you gather.
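For anyone who hasn't used it, a minimal sketch (table and column names here are made up):

```sql
-- ANALYZE actually executes the query and reports real timings and row counts;
-- BUFFERS adds how many data pages were hit in cache vs. read from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Because ANALYZE really runs the statement, wrap writes in a transaction
-- you roll back if you're experimenting anywhere near real data:
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM orders WHERE customer_id = 42;
ROLLBACK;
```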
Oh, also – Postgres caches data pages (shared buffers plus the OS page cache), so if you're worried about performance from a cold state, don't forget to clear those caches before executing. And if anyone has good suggestions for tools to deliberately skew table statistics, I haven't found one I like yet.
Exactly. That's why my team and I (Postgres.ai) have developed Database Lab Engine [1] and a chatops tool for SQL optimization, Joe bot [2], both are open-source (AGPLv3).
EXPLAIN (ANALYZE, BUFFERS) has to be executed on a database of the same size, with a properly adjusted Postgres configuration.
Interestingly, the machine you use for query-plan troubleshooting can have less RAM and different hardware in general – it doesn't matter to the planner. Even shared_buffers doesn't matter – you can set effective_cache_size to match production (we use this trick in Database Lab when the hardware is weaker than production's).
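For the record, effective_cache_size is only a planner hint – no memory is actually allocated – so matching production is cheap. A sketch, with a made-up production-like value:

```sql
-- On the weaker experimentation machine, make the planner assume
-- production-like caching. '48GB' is an arbitrary example value;
-- use whatever production's effective_cache_size is set to.
SET effective_cache_size = '48GB';

-- Now plans should be chosen as they would be on production:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- made-up table
```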
As for cache state – a very good point as well. I advocate a buffers- or rows-centric approach: first, optimize to reduce the number of buffers touched – or rows, if you're working with a "logical" copy of the database (dump/restore) rather than a "physical" one (a PGDATA copy, which keeps the same data layout, including bloat, etc.). The fewer the numbers, the better. Only then pay attention to timing – and keep in mind what can happen under the worst conditions (everything is read from disk), if that matters for your case.
Perhaps you already know these, but just in case:
- https://github.com/ossc-db: pg_dbms_stats, pg_store_plans, pg_hint_plan
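pg_hint_plan is the one I reach for most; a quick sketch of how a hint looks (the extension must be installed, and the table/index names are made up):

```sql
-- Hints live in a special comment at the head of the statement.
LOAD 'pg_hint_plan';
/*+ IndexScan(orders orders_customer_id_idx) */
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```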
If you're a small enough shop to consider it, I highly recommend setting up something to automatically EXPLAIN queries that meet certain criteria on production, or using an analysis stack (like New Relic) to capture all query executions within certain time windows.
These tools all come with costs and should never just run continuously on production if you're getting no benefit from them, but the value can be quite significant.
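The built-in way to do this is the auto_explain contrib module. A per-session sketch (the thresholds are arbitrary; on production you'd load it via shared_preload_libraries or session_preload_libraries in postgresql.conf):

```sql
LOAD 'auto_explain';
-- Log the plan of every statement slower than 500 ms:
SET auto_explain.log_min_duration = '500ms';
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
-- Per-node timing instrumentation is the expensive part; turning it off
-- keeps most of the value at a fraction of the overhead:
SET auto_explain.log_timing = off;
```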
But auto_explain solves only part of the task – you can see what happened, but not the answers to "what if" questions ("What if I used this index?").
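For that specific "what if I used this index" question, the HypoPG extension is worth knowing about – it creates hypothetical indexes that cost nothing on disk and are visible only to plain EXPLAIN. A sketch with made-up names:

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Hypothetical index: no build time, no storage.
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (not ANALYZE) shows whether the planner would pick it;
-- EXPLAIN ANALYZE would actually run the query and ignore hypothetical indexes.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```

For cruder probing without any extension, you can also disable a plan node type per-session (`SET enable_seqscan = off;`) and see what the planner falls back to.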
[1] https://www.pgmustard.com/blog/auto-explain-overhead-with-ti...
The exact same configuration is not enough. You want shared buffers and the disk cache to look the same as they do on production, and you also want the same common queries running in the background.
I mean "need" in the case of a busy database, at a high level of optimization where small details matter. You can catch the more obvious stuff with much less care.
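One way to make the caches "look like production" before a hot-cache test is the pg_prewarm contrib module, which loads a relation into shared_buffers (or the OS cache). Relation names below are made up:

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('orders');                  -- default mode loads into shared_buffers
SELECT pg_prewarm('orders_customer_id_idx');  -- don't forget the indexes
```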
Both have pros and cons about how they visualize things.
> Find the lowest node where the estimated row count is significantly different from the actual row count.
> ...
> Under the heading “rows x” you see by what factor PostgreSQL overestimated or underestimated the row count. Bad estimates are highlighted with a red background.
Am I missing something? Everything actually shown has identical estimated and actual row counts, and red/yellow/orange is associated with accurate estimates. What am I not seeing?
For an intro to this that goes through several examples, I highly recommend a conference talk[1] by Josh Berkus in 2015/16 that he gave a few times. It has aged pretty well and I’ve not yet seen the basics covered better.
If I have time this weekend I’ll check out the video.
The table I was deleting from had 20 foreign key constraints referencing it, and a couple of them were big (a few million rows) and didn't have an index on the referencing column. I added indexes to all of them – they took a couple of minutes to build – and the DELETE then ran in a few seconds.
Sometimes the answer to a performance issue can't be found in EXPLAIN. And always remember to properly index your foreign key constraints.
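A rough catalog query to find the dangerous cases – foreign keys whose referencing column has no index with that column in the leading position. This is a simplified sketch that only handles single-column foreign keys:

```sql
SELECT c.conrelid::regclass AS referencing_table,
       a.attname            AS referencing_column,
       c.conname            AS constraint_name
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]
WHERE c.contype = 'f'
  AND array_length(c.conkey, 1) = 1
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey[0] = c.conkey[1]  -- int2vector is 0-indexed
      );

-- Then, for each hit (names are made up):
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
```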