Tuesday, April 25, 2017

How to read a CPU Flame Graph


This article explains it very well, http://ihorbobak.com/index.php/2015/08/05/cluster-profiling/, so I am just going to copy it directly:

Imagine you have some java class with a method A that called method B, B called C, C called D. We’ll denote a call by an arrow, so A->B->C->D. Every 10 milliseconds a special javaagent (embedded into JVM process) gets all stacktraces of all the threads of a process. For example, it may “see” that the main thread of our app is calling A->B->C (method C is executing now), at another moment of time it observes that A->B->C->D is the current stacktrace (method D is executing now), at the next moment of time it sees that it is in A->B->C->D again, and at the next moment we are in A->B (so that the code of B is executing).

If we put all stacktraces vertically, we can visualize it like this:



The time goes from left to right column, and every stack trace is shown from bottom to top. Therefore, in the method D we spent 20 milliseconds, C was executing just 10 milliseconds on its own, and method B was executing 10 milliseconds. If we group the cells so that the same parent method call are joined, we get this:




To summarize, the flame graph is an approximate visualization of what is going on: which class/method is executing how much % of CPU time. From the diagram above we can conclude that:
·        Method D was executing 50% of the whole CPU time;
·        Method C was executing 75% of CPU time, among which 25% of time it was executing its own code, and the rest 50% it spent in the call of D;
·        Method B was executing 100% of CPU time (same as A), but just 25% of time method B was executing itself, the rest of time it was calling other methods.

Sampling has some sort of a bias: in the case if D called E and E finished execution before we captured stacktraces, we won’t see E’s execution. However, sampling doesn’t add much overhead: there are no additional java bytecode instructions added to all the rest threads, but instead there exists just one additional thread that gathers stacktraces.

There is just one minor correction: Stacktrace gathered contains the line number of the method, so in the table, you actually see A:5, B:7 etc. The same method with different line numbers are counted separately. 



The top bars are what are currently occupying CPU, but you should, generally speaking, look out for the horizontal bars underneath that are longest, these bars (methods) are invoked most often during a time period. 

I find combining CPU flame graph with the memory graph is useful. For example, the following memory graph shows heap is consumed rapidly and periodically (to be more precise, 1GB memory is consumed in 10 seconds). By checking the CPU flame graph,  I can get a quick idea what the system was doing during the time.