Wednesday, January 11, 2017

InfluxDB cookbook

How to connect to influxdb

vagrant@exp:~$ influx
Visit to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.1.0
InfluxDB shell version: 1.1.0

How to check DB objects?

> show databases
name: databases

> use test
Using database influxvm

> show measurements

How to check tags?

A Tag in InfluxDB is like an indexed field in a relational table. You typically make a field as a tag because you need to use it as a condition in the where clause to improve query performance.

If the name of a measurement contains dot, you need to quote the name. There seems to be a lot of issues with dot in measurement or field names, so avoid having dot in the names.

> show tag keys from "user.logins"
name: user.logins

> show tag values from "user.logins" with key=region
name: user.logins
key     value
---     -----
region    us-west

How to use tag in the where clause?

Influxdb support regex matching.

> select * from user_logins where region =~ /us.*/
name: user.logins
time                    region  service value
----                    ------  ------- -----
1478177907371000064     us-west payroll 1

Regex matching can also be used in other places:
name: measurements

How to define a retention policy?

The default retention policy is “autogent”, which is to keep data forever. The syntax for creating a retention policy is:
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

To keep data for 5 days and purge it automatically after 5 days, use:

To insert data using the default retention policy:
> insert user_logins,service=payroll,region=use-west value=1

To insert data using a specific retention policy:
>insert into fivedays user_logins,service=payroll,region=use-east value=5

To select the data point just inserted, you need to prefix the measurement name with the retention policy name:
> select * from user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1484127510987890238     use-west        payroll 1

> select * from fivedays.user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1484127751197523399     use-east        payroll 5

To make a retention policy default, create it with “default” clause:
> drop retention policy fivedays on test
(data rentented with this policy will be dropped too)


Or use alter command:
alter retention policy fivedays on test default

show shards command will show retention policies on each database:
name: test
id      database        retention_policy        shard_group     start_time              end_time                expiry_time             owners
--      --------        ----------------        -----------     ----------              --------                -----------             ------
125     test            autogen                 125             2016-09-19T00:00:00Z    2016-09-26T00:00:00Z    2016-09-26T00:00:00Z
695     test            fivedays                695             2017-01-11T00:00:00Z    2017-01-12T00:00:00Z    2017-01-17T00:00:00Z

How to insert a data point with a specific timestamp?

In the previous example, “time” field contains the value when the data point was inserted, how to insert a data point with a specific timestamp?

> insert user_logins,service=payroll,region=use-west value=1 1478177907371000064
> select * from user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1478177907371000064     use-west        payroll 1

Note, there is a whitespace between the last field and the time field, and the time field value should a Unix nanosecond. 

> insert user_logins,service=payroll,region=use-west value=1 2016-11-03T12:58:27.371000064Z
ERR: {"error":"unable to parse 'user_logins,service=payroll,region=use-west value=1 2016-11-03T12:58:27.371000064Z': bad timestamp"}

Any timestamp field should be in the format of Unix nanosecond:

> insert user_logins,service=payroll,region=use-west value=1,finishtime=1478177907371000064 1478177907371000064
> select * from user_logins
name: user_logins
time                            finishtime              region          service value
----                            ----------              ------          ------- -----
2016-11-03T12:58:27.371000064Z  1.478177907371e+18      use-west        payroll 1

Other timestamp fields seem to stored as integers, it is not difficult to convert them to timestamps after retrieving the data.

What is one-line format?

The example in the InfluxDB documentation makes it clear:

A one-line contains 3-parts, measurement, tags and fields. Tags are separated from the measurement with comma, fields are separated from tags with a whitespace. Tags in the tag set are separated with comma, and fields in the field set are separated with common.
insert user_logins,service=payroll,region=use-west times=9,errors=5,value=9

In this example, user_logins is the metric (a.k.a the table name), tags are service and region, Fields are times, errors and value.

Telegraf StatsD plugin supports multiple tags in a similar one-line format:

echo "user_logins,service=payroll,region=us-south:2|c" | nc -C -w 1 -u localhost 8125

However, it doesn’t support multiple fields, as stated here: Unfortunately, for unknown reasons, the feature was once included according to, then removed.

How to drop all measurements?

drop measurement #NAME# will drop one measurement, but it is not possible to drop all measurements at one time, but this command can achieve the effect:
drop series from /.*/

What is a series?

A series is a collection of data that have the same measurement, tags and retention policy.
> show series from user_logins
(the output doesn’t show the retention policy active on the series)

How to output timestamp in a human-readable way?

> precision rfc3339
> select * from user_logins
name: user_logins
time                            region          service value
----                            ------          ------- -----
2017-01-11T09:38:30.987890238Z  use-west        payroll 1

To reverse back:
> precision ns

What is a continuous query?

A continuous query creates a measurement, which InfluxDB will insert data into at the group by frequency.

>CREATE CONTINUOUS QUERY "hourly_schedule" on "test"
select count("value") into "hourly_schedule"
from "schedules"
group by time(1h)

The above statement creates a measurement "hourly_schedule", as data inserts into "schedules", InfluxDB will down-sample the data and insert it into "hourly_schedule".

This only works for new data inserted into "schedules". For existing data, you just write an ordinary select … into … statement:

select count("value") into "hourly_schedule" from "schedules" where time>0 group by time(1h)

where time>0 is useless here, but InfluxDB throws an error saying “ERR: error parsing query: aggregate functions with GROUP BY time require a WHERE time clause if you don’t provide a where time clause. 

Unlike a SQL in relation DB, InfluxDB “group by” doesn’t support having clause, but with the above created down-sampling measurements, you can achieve the same effect:

select * from  "hourly_schedule"  where count>5 limit 10

How to check the size of a DB?

> use _internal
Using database _internal
> select count(diskBytes) from "tsm1_filestore" where "database"='pcoe'
name: tsm1_filestore
time  count
----  -----
0     392666