The blog is packed with information, thanks!
Isn't it the case that stack traces alone make it nearly impossible to tell that function foo() is burning CPU cycles because it is memory-bound? And the cause could well lie somewhere else rather than in that particular function, e.g. multiple other threads creating contention on the memory bus?
If so, doesn't this make the profile a somewhat questionable input for PGO?
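To make the concern concrete, here is a minimal sketch of what I mean. It assumes a sampling profiler that records only call stacks; the sample data and the function names (`main`, `foo`, `bar`) are made up for illustration and do not come from any real tool.

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first, leaf frame last.
# A stack-only sampler records where the CPU was, not why it was there.
samples = [
    ("main", "foo"),
    ("main", "foo"),
    ("main", "foo"),
    ("main", "bar"),
]


def flat_profile(stacks):
    """Count how often each function appears as the leaf frame."""
    return Counter(stack[-1] for stack in stacks)


profile = flat_profile(samples)

# foo() shows up as hot, but nothing in the data says whether it was
# computing or stalled on memory (possibly because of other threads).
print(profile.most_common())
```

The resulting profile is just a count per function, so a compute-bound foo() and a foo() stalled on a contended memory bus would look identical here unless the sampler also captured something like hardware counter data.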