Of course I expect to see the call tree, to flip between it and a flat view, to switch between different metrics in those views, and to see the results across processes/threads. I'm talking about graphical tools (perhaps coupled with external reduction of the data) like CUBE [1] and ParaProf [2], and those in toolsets like Open|SpeedShop [3] and HPCToolkit [4], with which I'm less familiar.
1. http://www.scalasca.org/software/cube-4.x/documentation.html
2. https://www.cs.uoregon.edu/research/tau/docs/newguide/bk01pt...
3. https://openspeedshop.org
4. https://hpctoolkit.org