Tuesday, March 21, 2017

Candlestick chart using Bokeh without date gaps

The candlestick chart is an important tool for technical analysis of stocks, and Bokeh is the best Python charting tool I've found so far. Bokeh has a candlestick chart example at http://bokeh.pydata.org/en/latest/docs/gallery/candlestick.html, but the example has some problems:

  1. When there are gaps between dates (e.g. on weekends, when most stock exchanges are closed), there are gaps between neighbouring candlesticks.
  2. (This is a minor issue.) The upper and lower tails should be the same colour as the candle body; instead, they are black.

For the first problem, a Google search shows that other people have run into the same issue, without finding an answer. I decided not to tackle the problem head-on (because I am pragmatic), but to work around it as follows (tackling it head-on would mean writing some kind of DateGapTickFormatter):

  1. Do not use the date as the x axis; use a sequence number instead.
  2. Show a tooltip when hovering over a candlestick; the tooltip shows the candlestick's details (date, open, close, percentage change).

Here is the relevant code:

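from math import pi

import pandas as pd
from bokeh.models import ColumnDataSource, CrosshairTool, HoverTool
from bokeh.plotting import figure

# df contains stock information
# append a sequence to df; it replaces the date as the x coordinate
# (this line was lost from the original listing; reconstructed as an assumption)
df["seq"] = range(len(df))

# format the date and the percentage change for display in the tooltip
df["date"] = pd.to_datetime(df["date"])
df['date'] = df['date'].apply(lambda x: x.strftime('%m/%d'))
df['changepercent'] = df['changepercent'].apply(lambda x: str(x) + "%")

# centre and height of each candle body; a flat day gets a tiny height so it stays visible
df['mid'] = df.apply(lambda x: (x['open'] + x['close']) / 2, axis=1)
df['height'] = df.apply(lambda x: abs(x['close'] - x['open'] if x['close'] != x['open'] else 0.001), axis=1)

# note: days with close == open fall into neither group
inc = df.close > df.open
dec = df.open > df.close

# use ColumnDataSource to pass in data for tooltips
# (these two lines were lost from the original listing; reconstructed as an assumption)
sourceInc = ColumnDataSource(df[inc])
sourceDec = ColumnDataSource(df[dec])

# the values for the tooltip come from ColumnDataSource
hover = HoverTool(
    tooltips=[
        ("date", "@date"),
        ("open", "@open"),
        ("close", "@close"),
        ("percent", "@changepercent"),
    ]
)

TOOLS = [CrosshairTool(), hover]
# plot_width/plot_height are the 2017-era names; Bokeh 3.0 and later use width/height
p = figure(
    plot_width=700, plot_height=400, tools=TOOLS, title=df["code"][0] + " " + df["name"][0])
# the divisor was cut off in the original listing; pi/4 (as in the Bokeh gallery example) is an assumption
p.xaxis.major_label_orientation = pi/4

# candle width in sequence units; w was not defined in the original listing, 0.5 is an assumption
w = 0.5

# tails (high to low) for the rising days, same colour as the body
p.segment(df.seq[inc], df.high[inc], df.seq[inc], df.low[inc], color="red")
# tails (high to low) for the falling days, same colour as the body
p.segment(df.seq[dec], df.high[dec], df.seq[dec], df.low[dec], color="green")
# this is the candle body for the red dates
p.rect(x='seq', y='mid', width=w, height='height', fill_color="red", line_color="red", source=sourceInc)
# this is the candle body for the green dates
p.rect(x='seq', y='mid', width=w, height='height', fill_color="green", line_color="green", source=sourceDec)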


This is what the chart looks like:

If you use the date as the axis, you get this instead:

Notice how annoying the gaps are!
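
One side effect of the sequence axis is that the tick labels become plain integers. A possible middle ground, short of a real DateGapTickFormatter, is to map a few sequence positions back to their dates. The sketch below only builds that mapping; the helper name `build_label_overrides`, the `every` parameter and the sample dates are made up for illustration. As far as I know, recent Bokeh versions accept such a dict through the axis property `major_label_overrides`, but treat that part as an assumption to check against your Bokeh version.

```python
from datetime import date


def build_label_overrides(dates, every=5):
    """Map sequence positions to 'MM/DD' labels, one label every `every` candles.

    Hypothetical helper, not part of Bokeh or of the original code.
    """
    return {
        seq: d.strftime("%m/%d")
        for seq, d in enumerate(dates)
        if seq % every == 0
    }


# Made-up trading days around a weekend (March 18 and 19, 2017 are missing).
trading_days = [
    date(2017, 3, 15),
    date(2017, 3, 16),
    date(2017, 3, 17),
    date(2017, 3, 20),
    date(2017, 3, 21),
]

overrides = build_label_overrides(trading_days, every=2)
print(overrides)

# Assumed usage with a Bokeh figure p (not executed here):
#   p.xaxis.ticker = list(overrides)            # put ticks on the labelled positions
#   p.xaxis.major_label_overrides = overrides   # show the date instead of the integer
```

The candles stay evenly spaced because the x coordinate is still the sequence; only the labels change.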

Monday, March 20, 2017

Web request performance analysis charts

These are some charts built on top of the monitoring system described at http://perfspy.blogspot.com/2016/12/a-monitoring-system-for-java.html

Slowest requests 

This graph shows the processing time of the slowest request in each 10-second interval (10 seconds is the default interval at which I gather data).
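
To make the "slowest request per interval" idea concrete, here is a minimal sketch. It assumes the raw data is a list of (timestamp in seconds, url, processing time in ms) tuples; that record layout, the function name and the sample values are hypothetical and are not the monitoring system's actual format.

```python
WINDOW_SECONDS = 10  # the sampling interval mentioned above


def slowest_per_window(requests, window=WINDOW_SECONDS):
    """Return {window_start: (duration_ms, url)} for the slowest request in each window."""
    slowest = {}
    for ts, url, duration_ms in requests:
        start = int(ts // window) * window  # start of the window this request falls in
        if start not in slowest or duration_ms > slowest[start][0]:
            slowest[start] = (duration_ms, url)
    return dict(sorted(slowest.items()))


# Made-up samples: (timestamp in seconds, url, processing time in ms)
samples = [
    (0.5, "/login", 120),
    (3.2, "/search", 950),
    (9.9, "/home", 40),
    (10.1, "/search", 300),
    (17.0, "/report", 2100),
]

result = slowest_per_window(samples)
for start, (duration_ms, url) in result.items():
    print(start, url, duration_ms)
```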

Details of the slowest requests

This table shows detailed information about these requests. Getting this information requires some low-overhead code instrumentation.

To find out whether these requests are slow because the application is busy handling too many requests, we can look at the throughput chart.

Request throughput


The number of requests the application handles doesn't change much over time, so we can be fairly sure that the slowest requests are not slow because the application is under load. This, coupled with the detailed information in the table, provides clues on what to improve.
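
Throughput in this sense is simply the number of requests per interval. A minimal sketch, assuming all we have is a list of request timestamps in seconds (the function name and the sample values are made up):

```python
from collections import Counter


def throughput_per_window(timestamps, window=10):
    """Count how many requests fall into each window of `window` seconds."""
    counts = Counter(int(ts // window) * window for ts in timestamps)
    return dict(sorted(counts.items()))


# Made-up request timestamps, in seconds
timestamps = [0.5, 3.2, 9.9, 10.1, 17.0, 19.5, 21.0]

throughput = throughput_per_window(timestamps)
print(throughput)
```

If the counts stay roughly flat while the slowest requests spike, load alone is an unlikely explanation for the spikes.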

Average request processing time

This is the average processing time of all requests in each 10-second interval. Interestingly, the spikes and valleys of this graph correspond in many places to those of the first graph (slowest requests), which means the slowest requests have a lot of influence on the average.
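
A small sketch of why a single slow request can dominate the average of its interval. The record layout (timestamp in seconds, processing time in ms) and the numbers are made up for illustration:

```python
from statistics import mean


def average_per_window(requests, window=10):
    """Return {window_start: mean processing time in ms} for each window."""
    buckets = {}
    for ts, duration_ms in requests:
        buckets.setdefault(int(ts // window) * window, []).append(duration_ms)
    return {start: mean(durations) for start, durations in sorted(buckets.items())}


# Made-up samples: the first window contains one 1000 ms outlier
samples = [(1, 100), (4, 100), (8, 1000), (12, 100), (15, 120)]

averages = average_per_window(samples)
print(averages)
```

In the first window, two of the three requests take 100 ms, yet the average is 400 ms because of the single 1000 ms request.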

So the question is: do these slowest requests slow down other requests?

Average request processing time without the slowest requests 


This graph shows the average request processing time without the slowest requests. Some of its spikes and valleys correspond to those in the previous graph, which seems to suggest that the slowest requests are slowing down other requests.
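
One way to put a number on "the spikes correspond" is to drop the slowest request from each interval, average the rest, and compute the Pearson correlation between that series and the slowest-request series. This is only a sketch with made-up durations and hypothetical function names; it is not how the charts above were produced, and a high coefficient shows that the two series move together, not that one causes the other.

```python
from statistics import mean


def average_without_slowest(durations):
    """Mean of one window's durations after dropping its single slowest request."""
    if len(durations) < 2:
        return None  # nothing left to average
    return mean(sorted(durations)[:-1])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up processing times (ms), one list per 10-second window
windows = [
    [100, 110, 900],
    [100, 105, 300],
    [150, 160, 2000],
    [100, 100, 250],
]

slowest = [max(w) for w in windows]
trimmed = [average_without_slowest(w) for w in windows]
r = pearson(slowest, trimmed)
print(slowest, trimmed, round(r, 2))
```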

But at this point, I would caution against drawing such a conclusion. Many factors can influence performance. Analysis like the above can provide some clues, but data analysis can be like reading tea leaves: you can always find data points that prove your point. The important thing is to correlate with other information and design experiments to confirm.