The basic structure of an Aggregation query is:
"aggs" : {"<aggregation_name>" : {"<aggregation_type>" : {<aggregation_body>},["aggs" : { [<sub_aggregation>]* } ]}[,"<aggregation_name_2>" : { ... } ]*}
<aggretion_name>
is the logical name given by the querier, ElasticSearch has two types of <aggregation_type>:
- bucketing
- metrics
You can find the list of <aggregation_type>here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html.
Let us use an example to get ourselves familiar with
aggregation.
Data preparation
This is some simple data for car sales:
manufacture
|
model
|
price
|
date
|
Volkswagen
|
Jetta
|
13300
|
1/7/2007
|
Ford
|
Focus
|
13600
|
1/7/2007
|
Volkswagen
|
Passat
|
22200
|
2/5/2007
|
Volkswagen
|
Golf
|
16700
|
2/5/2007
|
Ford
|
Focus
|
13500
|
2/7/2007
|
To send the data to ElasticSearch, first create a mapping:
#create a mapping named carsalemap for index named carsalescurl -XPUT "http://localhost:9200/carsales/" -d'{"mappings": {"carsalemap": {"properties": {"date": {"type": "date","format": "dateOptionalTime"},"manufacture": {"type": "string"},"model": {"type": "string"},"price": {"type": "float"}}}}}'#you can check the mappingcurl -XGET localhost:9200/carsales/_mapping?pretty
Now send the data to ElasticSearch:
curl -XPOST "http://localhost:9200/carsales/_bulk" -d'{"index":{"_index":"carsales","_type":"carsalemap"}}{"manufacture":"Volkswagen", "model":"Jetta", "price":"13300", "date": "2007-01-07"}{"index":{"_index":"carsales","_type":"carsalemap"}}{"manufacture":"Ford", "model":"Focus", "price":"13600", "date": "2007-01-07"}{"index":{"_index":"carsales","_type":"carsalemap"}}{"manufacture":"Volkswagen", "model":"Passat", "price":"22200", "date": "2007-02-05"}{"index":{"_index":"carsales","_type":"carsalemap"}}{"manufacture":"Volkswagen", "model":"Golf", "price":"16700", "date": "2007-02-05"}{"index":{"_index":"carsales","_type":"carsalemap"}}{"manufacture":"Ford", "model":"Focus", "price":"13500", "date": "2007-02-07"}'
Monthly sales for each manufacture
curl -XPOST "http://localhost:9200/carsales/_search?pretty" -d'{"aggs": {"salesmonth": {"date_histogram": {"field": "date","interval": "1M"},"aggs": {"manufacture": {"terms": {"field": "manufacture"}}}}},"size": 0}'
The result is:
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"salesmonth": {"buckets": [{"key_as_string": "2007-01-01T00:00:00.000Z","key": 1167609600000,"doc_count": 2,"manufacture": {"buckets": [{"key": "ford","doc_count": 1},{"key": "volkswagen","doc_count": 1}]}},{"key_as_string": "2007-02-01T00:00:00.000Z","key": 1170288000000,"doc_count": 3,"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 2},{"key": "ford","doc_count": 1}]}}]}}}
Monthly sales for manufacture who has model "Focus"
{"query": {"filtered": {"query": {"bool": {"should": [{"query_string": {"query": "model:Focus"}}]}}}},"aggs": {"salesmonth": {"date_histogram": {"field": "date","interval": "1M"},"aggs": {"manufacture": {"terms": {"field": "manufacture"}}}}},"size": 0}
The result is:
{"took": 6,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 2,"max_score": 0,"hits": []},"aggregations": {"salesmonth": {"buckets": [{"key_as_string": "2007-01-01T00:00:00.000Z","key": 1167609600000,"doc_count": 1,"manufacture": {"buckets": [{"key": "ford","doc_count": 1}]}},{"key_as_string": "2007-02-01T00:00:00.000Z","key": 1170288000000,"doc_count": 1,"manufacture": {"buckets": [{"key": "ford","doc_count": 1}]}}]}}}
Monthly sales for each model of each manufacture
This involves
a nested term
aggregation inside a term aggregation.
{"aggs": {"salesmonth": {"date_histogram": {"field": "date","interval": "1M"},"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"model": {"terms": {"field": "model"}}}}}}},"size": 0}
The result
is:
{"took": 2,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"salesmonth": {"buckets": [{"key_as_string": "2007-01-01T00:00:00.000Z","key": 1167609600000,"doc_count": 2,"manufacture": {"buckets": [{"key": "ford","doc_count": 1,"model": {"buckets": [{"key": "focus","doc_count": 1}]}},{"key": "volkswagen","doc_count": 1,"model": {"buckets": [{"key": "jetta","doc_count": 1}]}}]}},{"key_as_string": "2007-02-01T00:00:00.000Z","key": 1170288000000,"doc_count": 3,"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 2,"model": {"buckets": [{"key": "golf","doc_count": 1},{"key": "passat","doc_count": 1}]}},{"key": "ford","doc_count": 1,"model": {"buckets": [{"key": "focus","doc_count": 1}]}}]}}]}}}
Average price for all car sales
{"aggs": {"avgprice": {"avg": {"field": "price"}}},"size": 0}
The result is:
{"took": 1,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"avgprice": {"value": 15860}}}
Average sale price for each manufacture
This
involves nested aggregations.
{"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"avgprice": {"avg": {"field": "price"}}}}},"size": 0}
The result is:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 3,"avgprice": {"value": 17400}},{"key": "ford","doc_count": 2,"avgprice": {"value": 13550}}]}}}
Average sale price for each model of each manufacture
This
involves nested aggregations.
{"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"model": {"terms": {"field": "model"},"aggs": {"avgprice": {"avg": {"field": "price"}}}}}}},"size": 0}
The result is:
{"took": 2,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 3,"model": {"buckets": [{"key": "golf","doc_count": 1,"avgprice": {"value": 16700}},{"key": "jetta","doc_count": 1,"avgprice": {"value": 13300}},{"key": "passat","doc_count": 1,"avgprice": {"value": 22200}}]}},{"key": "ford","doc_count": 2,"model": {"buckets": [{"key": "focus","doc_count": 2,"avgprice": {"value": 13550}}]}}]}}}
Find sales price greater than 15000, and calculate the average price for each manufacture
There are
two ways to do this, one way is to use a filter bucket:
{"aggs": {"expensive": {"filter": {"range": {"price": {"from": "15000"}}},"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"avgprice": {"avg": {"field": "price"}}}}}}},"size": 0}
The result
is:
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 5,"max_score": 0,"hits": []},"aggregations": {"expensive": {"doc_count": 2,"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 2,"avgprice": {"value": 19450}}]}}}}
Notice the hits total is 5, meaning ElasticSearch hits upon all 5
documents, and then filter them and aggregate them. Removing "size": 0 from the query,
you will see all 5 documents are returned.
Another way
is to first filter out unqualified documents and then aggregate them:
{"query": {"filtered": {"filter": {"range": {"price": {"gte": 15000}}}}},"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"avgprice": {"avg": {"field": "price"}}}}},"size": 0}
The result
is:
{"took": 2,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 2,"max_score": 0,"hits": []},"aggregations": {"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 2,"avgprice": {"value": 19450}}]}}}
With this,
the hits total is 2.
post_filter
post_filter narrows down the query scope, but doesn’t affect
aggregation:
{"post_filter": {"term": {"model": "focus"}},"aggs": {"manufacture": {"terms": {"field": "manufacture"},"aggs": {"avgprice": {"avg": {"field": "price"}}}}}}
The result is:
{"took": 2,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 2,"max_score": 1,"hits": [{"_index": "carsales","_type": "carsalemap","_id": "Q5SzIJTFQAen71pSr12K3Q","_score": 1,"_source": {"manufacture": "Ford","model": "Focus","price": "13600","date": "2007-01-07"}},{"_index": "carsales","_type": "carsalemap","_id": "5QkFSRPsR4-GO4L2l5hGiA","_score": 1,"_source": {"manufacture": "Ford","model": "Focus","price": "13500","date": "2007-02-07"}}]},"aggregations": {"manufacture": {"buckets": [{"key": "volkswagen","doc_count": 3,"avgprice": {"value": 17400}},{"key": "ford","doc_count": 2,"avgprice": {"value": 13550}}]}}}
No comments:
Post a Comment