Sunday, November 9, 2014

ElasticSearch Aggegration



The basic structure of an Aggregation query is:

"aggs" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        },
        ["aggs" : { [<sub_aggregation>]* } ]
    }
    [,"<aggregation_name_2>" : { ... } ]*

}


<aggretion_name> is the logical name given by the querier, ElasticSearch has two types of <aggregation_type>:

  • bucketing
  •  metrics


Let us use an example to get ourselves familiar with aggregation.

Data preparation

This is some simple data for car sales:
manufacture
model
price
date
Volkswagen
Jetta
13300
1/7/2007
Ford
Focus
13600
1/7/2007
Volkswagen
Passat
22200
2/5/2007
Volkswagen
Golf
16700
2/5/2007
Ford
Focus
13500
2/7/2007

To send the data to ElasticSearch, first create a mapping:

#create a mapping named carsalemap for index named carsales
curl -XPUT "http://localhost:9200/carsales/" -d'
{
   "mappings": {     
      "carsalemap": {
         "properties": {
            "date": {
               "type": "date",
               "format": "dateOptionalTime"
            },
            "manufacture": {
               "type": "string"
            },
            "model": {
               "type": "string"
            },
            "price": {
               "type": "float"
            }
         }
      }  
   }
}'
#you can check the mapping
curl -XGET localhost:9200/carsales/_mapping?pretty


Now send the data to ElasticSearch:

curl -XPOST "http://localhost:9200/carsales/_bulk" -d'
{"index":{"_index":"carsales","_type":"carsalemap"}}
{"manufacture":"Volkswagen", "model":"Jetta", "price":"13300", "date": "2007-01-07"}
{"index":{"_index":"carsales","_type":"carsalemap"}}
{"manufacture":"Ford", "model":"Focus", "price":"13600", "date": "2007-01-07"}
{"index":{"_index":"carsales","_type":"carsalemap"}}
{"manufacture":"Volkswagen", "model":"Passat", "price":"22200", "date": "2007-02-05"}
{"index":{"_index":"carsales","_type":"carsalemap"}}
{"manufacture":"Volkswagen", "model":"Golf", "price":"16700", "date": "2007-02-05"}
{"index":{"_index":"carsales","_type":"carsalemap"}}
{"manufacture":"Ford", "model":"Focus", "price":"13500", "date": "2007-02-07"}
'

Monthly sales for each manufacture


curl -XPOST "http://localhost:9200/carsales/_search?pretty" -d'
{
    "aggs": {
        "salesmonth": {
            "date_histogram": {
                "field": "date",
                "interval": "1M"
            },
            "aggs": {
                "manufacture": {
                    "terms": {                     
                        "field": "manufacture"
                    }
                }
            }
        }
    },
    "size": 0
}'

The result is:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "salesmonth": {
            "buckets": [
                {
                    "key_as_string": "2007-01-01T00:00:00.000Z",
                    "key": 1167609600000,
                    "doc_count": 2,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "ford",
                                "doc_count": 1
                            },
                            {
                                "key": "volkswagen",
                                "doc_count": 1
                            }
                        ]
                    }
                },
                {
                    "key_as_string": "2007-02-01T00:00:00.000Z",
                    "key": 1170288000000,
                    "doc_count": 3,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "volkswagen",
                                "doc_count": 2
                            },
                            {
                                "key": "ford",
                                "doc_count": 1
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Monthly sales for manufacture who has model "Focus"


{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "query_string": {
                                "query": "model:Focus"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "salesmonth": {
            "date_histogram": {
                "field": "date",
                "interval": "1M"
            },
            "aggs": {
                "manufacture": {
                    "terms": {                      
                        "field": "manufacture"
                    }
                }
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "salesmonth": {
            "buckets": [
                {
                    "key_as_string": "2007-01-01T00:00:00.000Z",
                    "key": 1167609600000,
                    "doc_count": 1,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "ford",
                                "doc_count": 1
                            }
                        ]
                    }
                },
                {
                    "key_as_string": "2007-02-01T00:00:00.000Z",
                    "key": 1170288000000,
                    "doc_count": 1,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "ford",
                                "doc_count": 1
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Monthly sales for each model of each manufacture

This involves a nested term aggregation inside a term aggregation.

{
    "aggs": {
        "salesmonth": {
            "date_histogram": {
                "field": "date",
                "interval": "1M"
            },
            "aggs": {
                "manufacture": {
                    "terms": {                      
                        "field": "manufacture"
                    },
                    "aggs": {
                        "model": {
                            "terms": {
                                "field": "model"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "salesmonth": {
            "buckets": [
                {
                    "key_as_string": "2007-01-01T00:00:00.000Z",
                    "key": 1167609600000,
                    "doc_count": 2,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "ford",
                                "doc_count": 1,
                                "model": {
                                    "buckets": [
                                        {
                                            "key": "focus",
                                            "doc_count": 1
                                        }
                                    ]
                                }
                            },
                            {
                                "key": "volkswagen",
                                "doc_count": 1,
                                "model": {
                                    "buckets": [
                                        {
                                            "key": "jetta",
                                            "doc_count": 1
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                },
                {
                    "key_as_string": "2007-02-01T00:00:00.000Z",
                    "key": 1170288000000,
                    "doc_count": 3,
                    "manufacture": {
                        "buckets": [
                            {
                                "key": "volkswagen",
                                "doc_count": 2,
                                "model": {
                                    "buckets": [
                                        {
                                            "key": "golf",
                                            "doc_count": 1
                                        },
                                        {
                                            "key": "passat",
                                            "doc_count": 1
                                        }
                                    ]
                                }
                            },
                            {
                                "key": "ford",
                                "doc_count": 1,
                                "model": {
                                    "buckets": [
                                        {
                                            "key": "focus",
                                            "doc_count": 1
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Average price for all car sales


{
    "aggs": {
        "avgprice": {
            "avg": {
                "field": "price"
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "avgprice": {
            "value": 15860
        }
    }
}

Average sale price for each manufacture

This involves nested aggregations.

{
    "aggs": {
        "manufacture": {
            "terms": {               
                "field": "manufacture"
            },
            "aggs": {
                "avgprice": {
                    "avg": {
                        "field": "price"
                    }
                }
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "manufacture": {
            "buckets": [
                {
                    "key": "volkswagen",
                    "doc_count": 3,
                    "avgprice": {
                        "value": 17400
                    }
                },
                {
                    "key": "ford",
                    "doc_count": 2,
                    "avgprice": {
                        "value": 13550
                    }
                }
            ]
        }
    }
}

Average sale price for each model of each manufacture

This involves nested aggregations.

{
    "aggs": {
        "manufacture": {
            "terms": {
                "field": "manufacture"
            },
            "aggs": {
                "model": {
                    "terms": {
                        "field": "model"
                    },
                    "aggs": {
                        "avgprice": {
                            "avg": {
                                "field": "price"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "manufacture": {
            "buckets": [
                {
                    "key": "volkswagen",
                    "doc_count": 3,
                    "model": {
                        "buckets": [
                            {
                                "key": "golf",
                                "doc_count": 1,
                                "avgprice": {
                                    "value": 16700
                                }
                            },
                            {
                                "key": "jetta",
                                "doc_count": 1,
                                "avgprice": {
                                    "value": 13300
                                }
                            },
                            {
                                "key": "passat",
                                "doc_count": 1,
                                "avgprice": {
                                    "value": 22200
                                }
                            }
                        ]
                    }
                },
                {
                    "key": "ford",
                    "doc_count": 2,
                    "model": {
                        "buckets": [
                            {
                                "key": "focus",
                                "doc_count": 2,
                                "avgprice": {
                                    "value": 13550
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Find sales price greater than 15000, and calculate the average price for each manufacture

There are two ways to do this, one way is to use a filter bucket:


{
    "aggs": {
        "expensive": {
            "filter": {
                "range": {
                    "price": {
                        "from": "15000"
                    }
                }
            },
            "aggs": {
                "manufacture": {
                    "terms": {
                        "field": "manufacture"
                    },
                    "aggs": {
                        "avgprice": {
                            "avg": {
                                "field": "price"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}


The result is:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "expensive": {
            "doc_count": 2,
            "manufacture": {
                "buckets": [
                    {
                        "key": "volkswagen",
                        "doc_count": 2,
                        "avgprice": {
                            "value": 19450
                        }
                    }
                ]
            }
        }
    }
}

Notice the hits total is 5, meaning ElasticSearch hits upon all 5 documents, and then filter them and aggregate them. Removing   "size": 0 from the query, you will see all 5 documents are returned.

Another way is to first filter out unqualified documents and then aggregate them:

{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "price": {
                        "gte": 15000
                    }
                }
            }
        }
    },
    "aggs": {
        "manufacture": {
            "terms": {
                "field": "manufacture"
            },
            "aggs": {
                "avgprice": {
                    "avg": {
                        "field": "price"
                    }
                }
            }
        }
    },
    "size": 0
}

The result is:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "manufacture": {
            "buckets": [
                {
                    "key": "volkswagen",
                    "doc_count": 2,
                    "avgprice": {
                        "value": 19450
                    }
                }
            ]
        }
    }
}

With this, the hits total is 2.

post_filter

post_filter narrows down the query scope, but doesn’t affect aggregation:

{
    "post_filter": {
        "term": {
            "model": "focus"
        }
    },
    "aggs": {
        "manufacture": {
            "terms": {
                "field": "manufacture"
            },
            "aggs": {
                "avgprice": {
                    "avg": {
                        "field": "price"
                    }
                }
            }
        }
    }
}

The result is:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "carsales",
                "_type": "carsalemap",
                "_id": "Q5SzIJTFQAen71pSr12K3Q",
                "_score": 1,
                "_source": {
                    "manufacture": "Ford",
                    "model": "Focus",
                    "price": "13600",
                    "date": "2007-01-07"
                }
            },
            {
                "_index": "carsales",
                "_type": "carsalemap",
                "_id": "5QkFSRPsR4-GO4L2l5hGiA",
                "_score": 1,
                "_source": {
                    "manufacture": "Ford",
                    "model": "Focus",
                    "price": "13500",
                    "date": "2007-02-07"
                }
            }
        ]
    },
    "aggregations": {
        "manufacture": {
            "buckets": [
                {
                    "key": "volkswagen",
                    "doc_count": 3,
                    "avgprice": {
                        "value": 17400
                    }
                },
                {
                    "key": "ford",
                    "doc_count": 2,
                    "avgprice": {
                        "value": 13550
                    }
                }
            ]
        }
    }
}