Frequently a query plan turns out to perform nowhere near as well as the planner expects (for example, because the data distribution is skewed, or a key being filtered on doesn't exist). In those cases, flipping the query plan around could turn a one-hour query into a one-millisecond query. Yet Postgres doesn't have the ability to do that.
Sure, a human can sometimes use domain knowledge to rearrange the query and force a plan that works well, but in the general case the database shouldn't have performance drop by a factor of a million because of semi-arbitrary planner decisions. Being able to try multiple possible plans would be a massive start in solving the issue.
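For context, the closest thing Postgres offers today is coarse: the `enable_*` planner settings let you discourage whole node types for a session, not pick a specific plan. A minimal sketch (the `orders` table and filter are made up):

```sql
-- Discourage sequential scans for this session; the planner treats them
-- as very expensive rather than strictly forbidden.
SET enable_seqscan = off;

-- Check whether the planner now picks an index scan instead.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Restore the default behaviour.
RESET enable_seqscan;
```

It's a blunt instrument: it steers every query in the session, not just the one you care about.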
Postgres queries are streamed to the client, i.e. some results are delivered before the query is done running. That functionality is necessary for really big result sets.
That makes it difficult to change plans mid-query, because your new plan might return results in a different order, and you need to filter out any already-returned results, but you can't afford to keep all of the already-returned results in RAM. Even if that weren't an issue, I'm not sure it's always valid to do this.
To add even more complexity... The Postgres protocol allows the client to reverse the query (i.e. midway through getting the results, the client can say "yo, go back, and return results from earlier again"), and the server must return the same results in reverse order. That means if you do switch query plans, the server needs to un-switch back to the old plan whenever the client goes backwards.
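For reference, that backwards traversal is exposed at the SQL level through scrollable cursors; a minimal illustration (table and column names are placeholders):

```sql
BEGIN;

-- SCROLL guarantees the cursor can be read in both directions.
DECLARE c SCROLL CURSOR FOR SELECT id FROM big_table ORDER BY id;

FETCH FORWARD 10 FROM c;   -- rows 1..10
FETCH BACKWARD 5 FROM c;   -- rows 9..5: the same rows, in reverse order

COMMIT;
```

The server has to honor that `FETCH BACKWARD` with exactly the rows it already sent, which is what makes a mid-query plan switch so awkward.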
These issues are not insurmountable... But they certainly stopped me implementing it in the day I had set aside for the task...
That being said, this tool won't work well for anything but trivial queries. PostgreSQL query plans have many quirks around CTEs, Loops, etc. that cause problems when trying to determine the true inclusive/exclusive time for each node without forgetting stuff or counting it twice. The only tool that tackles them fairly in my experience is the good old https://explain.depesz.com/ using its own Pg--Explain library [1].
I'm currently working on my own version of a tool like the one presented by OP (called FlameExplain), and hope to release it soon.
[1] https://gitlab.com/depesz/Pg--Explain/blob/master/lib/Pg/Exp...
I’ll add a CTE demo with an explanation.
The psql client binary should include visualizations like this even in text mode, so admins don't have to follow a multitude of steps as described in the repo:
1) run query, store results in .json file
2) scp .json file to your dev machine
3) run visualization tool
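For what it's worth, step 1 can at least be done without leaving psql, using the `\o` meta-command to redirect output to a file (the filename and query here are just examples):

```sql
-- Redirect query output to a file from inside a psql session:
\o plan.json
EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM orders WHERE customer_id = 42;
\o
```

You still have to scp the file out and run the tool, which is exactly the friction being complained about.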
If I already have a psql client shell open, why can't this all be done in the background? I think there is still a lot of potential for improvement.
But in general I do agree that simplifying this type of tooling is a good thing and something to strive for.
What I'm seeing is we've got two big platforms (web, CLI) and they both have some distinct advantages and some distinct disadvantages, and it's not easy for either one to cross the chasm and compete with the other one directly. We desperately need a new platform which combines what we like about both of these, and discards what we don't.
Until then, we're just going to keep implementing all end-user functionality twice, because by historical accident developers do most of their work in a DEC VT100 emulator, and end-users won't tolerate that.
But IMHO the overall friction is still too large for such a central use case as query optimization done by a database admin.
The PostgreSQL team has been innovating and improving steadily, so I am confident these kinds of workflows will be heavily optimized within the next couple of years.
The information density the text rendering gives me is way better than the flamegraph's.
I find the text output much more useful and detailed than any graphical display, especially when generated with "explain (analyze, buffers, timing) ..." and "track_io_timing" turned on.
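For anyone unfamiliar, that combination looks roughly like this (the table and filter are made up):

```sql
-- Needed for the per-node I/O timing lines in the output; requires an
-- appropriately privileged role to change.
SET track_io_timing = on;

-- BUFFERS adds shared/local/temp block counts per node; TIMING (on by
-- default with ANALYZE) adds per-node wall-clock times.
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT * FROM events WHERE created_at > now() - interval '1 day';
```

The buffer counts are often the quickest way to spot which node is actually hitting disk.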
tl;dr: use both as needed
FYI, I once published a similar tool for Oracle, including an explanation of how to read FlameGraphs in SQL execution context.
https://blog.tanelpoder.com/posts/visualizing-sql-plan-execu...
edit: Googled it, looks like PostgreSQL maintainers just have an opinion that hints are bad and the planner is good enough:
https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion
There are "Explicit JOINs" but I'm not sure how useful they are.
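For the curious: the documented trick is that with `join_collapse_limit` set to 1, the planner keeps explicit JOINs in the order you wrote them, which acts as a crude join-order hint (table names here are placeholders):

```sql
-- With join_collapse_limit = 1, the planner will not reorder explicit
-- JOIN clauses, so the written order becomes the join order.
SET join_collapse_limit = 1;

SELECT *
FROM a
JOIN b ON b.a_id = a.id
JOIN c ON c.b_id = b.id;
```

It only constrains join order, though; it can't force a particular index or scan type.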
On the application I'm currently working on the difference between the two indexes is night and day, the gin index will respond in 100ms, whilst the btree index can be 15secs+. We've resorted to having two columns with the same content, one with a btree index and one with a gin index so that we can explicitly choose which index to hit.
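If you're wondering what that two-column setup looks like, something along these lines (table, column, and index names are made up; a GIN index over plain text typically goes through the pg_trgm extension):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Duplicate the content so queries can target a specific index by column:
ALTER TABLE docs ADD COLUMN body_trgm text;
UPDATE docs SET body_trgm = body;

CREATE INDEX docs_body_btree ON docs USING btree (body);
CREATE INDEX docs_body_gin   ON docs USING gin (body_trgm gin_trgm_ops);

-- Filtering on body_trgm can use the GIN index (good for substring search);
-- filtering on body can use the btree (good for equality and range lookups).
SELECT * FROM docs WHERE body_trgm LIKE '%needle%';
```

An ugly workaround, but it makes the index choice explicit instead of leaving it to the planner.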
It's unfortunate that it comes off as if maintainers think the planner is good enough, because that is demonstrably not true. And from my discussions, there is wide agreement that the planner will never be perfect. But agreeing that something needs work is not enough; someone actually has to do the work. The reality is that implementing any feature to PostgreSQL's quality standards is hard work.