Wednesday, January 11, 2017

InfluxDB cookbook



How to connect to influxdb

vagrant@exp:~$ influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.1.0
InfluxDB shell version: 1.1.0

How to check DB objects?

> show databases
name: databases
name
----
_internal
pcoe
test

> use test
Using database influxvm

> show measurements
name
----
cpu
mem
user.logins

How to check tags?

A Tag in InfluxDB is like an indexed field in a relational table. You typically make a field as a tag because you need to use it as a condition in the where clause to improve query performance.

If the name of a measurement contains dot, you need to quote the name. There seems to be a lot of issues with dot in measurement or field names, so avoid having dot in the names.

> show tag keys from "user.logins"
name: user.logins
tagKey
------
region
service

> show tag values from "user.logins" with key=region
name: user.logins
key     value
---     -----
region    us-west

How to use tag in the where clause?

Influxdb support regex matching.

> select * from user_logins where region =~ /us.*/
name: user.logins
time                    region  service value
----                    ------  ------- -----
1478177907371000064     us-west payroll 1

Regex matching can also be used in other places:
> SHOW MEASUREMENTS WITH MEASUREMENT =~ /.*login.*/
name: measurements
name
----
user_logins

How to define a retention policy?

The default retention policy is “autogent”, which is to keep data forever. The syntax for creating a retention policy is:
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

To keep data for 5 days and purge it automatically after 5 days, use:
CREATE RETENTION POLICY fivedays ON test DURATION 5d REPLICATION 1

To insert data using the default retention policy:
> insert user_logins,service=payroll,region=use-west value=1

To insert data using a specific retention policy:
>insert into fivedays user_logins,service=payroll,region=use-east value=5

To select the data point just inserted, you need to prefix the measurement name with the retention policy name:
> select * from user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1484127510987890238     use-west        payroll 1

> select * from fivedays.user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1484127751197523399     use-east        payroll 5

To make a retention policy default, create it with “default” clause:
 
> drop retention policy fivedays on test
(data rentented with this policy will be dropped too)

> CREATE RETENTION POLICY fivedays ON test DURATION 5d REPLICATION 1 default

Or use alter command:
alter retention policy fivedays on test default

show shards command will show retention policies on each database:
name: test
id      database        retention_policy        shard_group     start_time              end_time                expiry_time             owners
--      --------        ----------------        -----------     ----------              --------                -----------             ------
125     test            autogen                 125             2016-09-19T00:00:00Z    2016-09-26T00:00:00Z    2016-09-26T00:00:00Z
695     test            fivedays                695             2017-01-11T00:00:00Z    2017-01-12T00:00:00Z    2017-01-17T00:00:00Z


How to insert a data point with a specific timestamp?

In the previous example, “time” field contains the value when the data point was inserted, how to insert a data point with a specific timestamp?

> insert user_logins,service=payroll,region=use-west value=1 1478177907371000064
> select * from user_logins
name: user_logins
time                    region          service value
----                    ------          ------- -----
1478177907371000064     use-west        payroll 1

Note, there is a whitespace between the last field and the time field, and the time field value should a Unix nanosecond. 

> insert user_logins,service=payroll,region=use-west value=1 2016-11-03T12:58:27.371000064Z
ERR: {"error":"unable to parse 'user_logins,service=payroll,region=use-west value=1 2016-11-03T12:58:27.371000064Z': bad timestamp"}

Any timestamp field should be in the format of Unix nanosecond:

> insert user_logins,service=payroll,region=use-west value=1,finishtime=1478177907371000064 1478177907371000064
> select * from user_logins
name: user_logins
time                            finishtime              region          service value
----                            ----------              ------          ------- -----
2016-11-03T12:58:27.371000064Z  1.478177907371e+18      use-west        payroll 1

Other timestamp fields seem to stored as integers, it is not difficult to convert them to timestamps after retrieving the data.

What is one-line format?

The example in the InfluxDB documentation makes it clear:









Measurement and tag fields are separated with comma, non-tag fields are separated with whitespace.
It is a convenient way to send data to influxdb, e.g. if telegraf is installed, the following command will send a data point to inflxudb:

echo "user_logins,service=payroll,region=us-south:2|c" | nc -C -w 1 -u localhost 8125

How to drop all measurements?

drop measurement #NAME# will drop one measurement, but it is not possible to drop all measurements at one time, but this command can achieve the effect:
drop series from /.*/

What is a series?

A series is a collection of data that have the same measurement, tags and retention policy.
> show series from user_logins
key
---
user_logins,host=exp,metric_type=counter,region=us-south,service=payroll
user_logins,region=use-east,service=payroll
user_logins,region=use-west,service=payroll
(the output doesn’t show the retention policy active on the series)



How to output timestamp in a human-readable way?

> precision rfc3339
> select * from user_logins
name: user_logins
time                            region          service value
----                            ------          ------- -----
2017-01-11T09:38:30.987890238Z  use-west        payroll 1

To reverse back:
> precision ns
 


What is a continuous query?

A continuous query creates a measurement, which InfluxDB will insert data into at the group by frequency.

>CREATE CONTINUOUS QUERY "hourly_schedule" on "test"
begin
select count("value") into "hourly_schedule"
from "schedules"
group by time(1h)
end

The above statement creates a measurement "hourly_schedule", as data inserts into "schedules", InfluxDB will down-sample the data and insert it into "hourly_schedule".

This only works for new data inserted into "schedules". For existing data, you just write an ordinary select … into … statement:

select count("value") into "hourly_schedule" from "schedules" where time>0 group by time(1h)

where time>0 is useless here, but InfluxDB throws an error saying “ERR: error parsing query: aggregate functions with GROUP BY time require a WHERE time clause if you don’t provide a where time clause. 

Unlike a SQL in relation DB, InfluxDB “group by” doesn’t support having clause, but with the above created down-sampling measurements, you can achieve the same effect:

select * from  "hourly_schedule"  where count>5 limit 10

How to check the size of a DB?

> use _internal
Using database _internal
> select count(diskBytes) from "tsm1_filestore" where "database"='pcoe'
name: tsm1_filestore
time  count
----  -----
0     392666