This article explains it very well, http://ihorbobak.com/index.php/2015/08/05/cluster-profiling/, so I am just going to copy it directly:
If we put all stacktraces vertically, we can visualize it like this:
The time goes from left to right column, and every stack trace is shown from bottom to top. Therefore, in the method D we spent 20 milliseconds, C was executing just 10 milliseconds on its own, and method B was executing 10 milliseconds. If we group the cells so that the same parent method call are joined, we get this:
· Method D was executing 50% of the whole CPU time;
· Method C was executing 75% of CPU time, among which 25% of time it was executing its own code, and the rest 50% it spent in the call of D;
· Method B was executing 100% of CPU time (same as A), but just 25% of time method B was executing itself, the rest of time it was calling other methods.
There is just one minor correction: Stacktrace gathered contains the line number of the method, so in the table, you actually see A:5, B:7 etc. The same method with different line numbers are counted separately.
The top bars are what are currently occupying CPU, but you should, generally speaking, look out for the horizontal bars underneath that are longest, these bars (methods) are invoked most often during a time period.
I find combining CPU flame graph with the memory graph is useful. For example, the following memory graph shows heap is consumed rapidly and periodically (to be more precise, 1GB memory is consumed in 10 seconds). By checking the CPU flame graph, I can get a quick idea what the system was doing during the time.