These are questions I once had while working with Logstash to analyze my application's logs. I wish there had been a compiled list of common questions and answers; it would have saved me many hours of googling and experimenting. This cookbook applies to Logstash 1.4.2.
No such command "-e"
The HelloWorld example in http://logstash.net/docs/1.4.2/tutorials/getting-started-with-logstash
doesn't work in Windows:
c:\logstash-1.4.2\data>..\bin\logstash -e 'input { stdin { } } output { stdout {} }'
"LS_HOME=" c:\logstash-1.4.2
No such command "-e"
Usage: logstash <command> [command args]
Run a command with the --help flag to see the arguments.
-e works on Linux but not on Windows. HelloWorld is not that interesting after all, so there is no need to cry over it. To try out Logstash on Windows, write your configuration in a file and run it with logstash agent -f configfile.
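For example, the same HelloWorld pipeline that the tutorial passes via -e can be put into a configuration file (the file name helloworld.conf is only an illustration) and run like this:

# helloworld.conf
input { stdin { } }
output { stdout { } }

c:\logstash-1.4.2\data>..\bin\logstash agent -f helloworld.conf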
Logstash has a simple structure: it reads loggings from various inputs, passes them through filters for all kinds of processing, and sends them to different outputs. This page http://logstash.net/docs/1.4.2/ lists the inputs, filters and outputs currently supported.
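A configuration file accordingly has up to three sections. Here is a minimal sketch; the grok line is borrowed from a later example in this post and is only an illustration:

input {
  stdin { }
}
filter {
  grok { match => { "message" => "fail to hear from (?<nodelost>.*?) for over 3 minutes" } }
}
output {
  stdout { codec => rubydebug }
}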
How to drop a logging from the filter chain?
Use the drop filter, which removes the logging from the filter chain and from the outputs. For example, the following tries to match a logging against a grok pattern %{LOGPATTERN}; if it doesn't match, the logging is dropped:
grok {
  match => { "message" => "%{LOGPATTERN}" }
  tag_on_failure => ["logging not conform to %{LOGPATTERN}"]
}
if "logging not conform to %{LOGPATTERN}" in [tags] {
  drop { }
}
How to skip the rest of the filters and go directly to one output?
If there are multiple drops, I won't be able to tell which drop removed the logging. It would be nice if, instead of drop, there were a filter that skips the rest of the filters and sends the logging directly to the stdout output. To my knowledge, there is no such filter, and although I have little knowledge of the inner workings of Logstash, it does not look like an easy thing to add.
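A rough approximation, sketched here under the assumption that you are willing to wrap the remaining filters in a conditional, is to tag the logging instead of dropping it and then route tagged loggings to stdout in the output section (the tag name "unmatched" is only an illustration):

filter {
  grok {
    match => { "message" => "%{LOGPATTERN}" }
    tag_on_failure => ["unmatched"]
  }
  if "unmatched" not in [tags] {
    # the remaining filters go here and only run for matched loggings
  }
}
output {
  if "unmatched" in [tags] {
    # unmatched loggings go straight to stdout instead of being dropped
    stdout { codec => rubydebug }
  }
}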
How to debug Logstash itself?
I use three techniques.
- Use --vvv to output the most verbose Logstash debug information:
bin\logstash agent -f conf/test1.conf --vvv
- Use stdout to output results:
output {
  stdout { codec => rubydebug }
}
- Use -t to test the config file
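For instance, assuming the same layout as the --vvv example above, a config test looks like:

bin\logstash agent -f conf/test1.conf -t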
Why does Logstash sometimes seem to do nothing when processing a static file?
If you use the file input to process a static file, you might notice that the first time you run Logstash everything works: you see results in stdout (if you configured the stdout output). But the second time you run Logstash, it doesn't seem to be doing anything; it just sits there, waiting silently for something to happen.
This is because the file input records where it finished processing the file last time and starts from there; it behaves like "tail -0F". So if the file hasn't grown since the last run, Logstash does nothing: it is waiting for new lines to be written to the log file. The file input writes the last position to a sincedb file.
This is quite inconvenient in testing, where one usually experiments with different things on the same log file and wants to rerun Logstash to verify. To overcome this, specify the sincedb file explicitly and manually delete it before each Logstash run.
input {
  file {
    path => "C:/logstash-1.4.2/data/test.log"
    start_position => "beginning"
    sincedb_path => "C:/logstash-1.4.2/data/.test.log.sincedb"
  }
}
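Deleting the sincedb file before a rerun is then a one-liner; on Windows, for example:

del C:\logstash-1.4.2\data\.test.log.sincedb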
Logstash can't tail multiple files in Windows?
This is a bug in Logstash https://logstash.jira.com/browse/LOGSTASH-2229, which has been fixed in https://github.com/jordansissel/ruby-filewatch. To patch Logstash 1.4.2 with the fix:
- Download ruby-filewatch-master.zip from https://github.com/jordansissel/ruby-filewatch
- Unzip it
- Copy files under ruby-filewatch-master\lib\filewatch to logstash-1.4.2\vendor\bundle\jruby\1.9\gems\filewatch-0.5.1\lib
- Copy files under ruby-filewatch-master\bin to logstash-1.4.2\vendor\bundle\jruby\1.9\gems\filewatch-0.5.1\bin
- Copy ruby-filewatch-master\java\JRubyFileExtension.jar into logstash-1.4.2\lib
Logstash mixes loggings from different files in Windows?
This is another Logstash bug when running on the charming Windows. Some nice person has posted a temporary fix in https://logstash.jira.com/browse/LOGSTASH-2207; two files need to be changed:
- logstash-1.4.2\lib\logstash\inputs\file.rb
- logstash-1.4.2\lib\logstash\codecs\multiline.rb
How to check if a field exists or not?
# check if the field exists
if [event] {
}
# check if the field does not exist
if ![event] {
}
How to check if a tag exists?
Tags are stored in an array, so to check whether a specific tag exists, use in:
if "not a cluster lost event" in [tags] {mutate {add_field => { "event" => "other events" }}}
How to do case insensitive comparison?
Logstash is written in Ruby, and Ruby supports a case-insensitive regex mode in the form /regex/i, such as:

if "ABC" =~ /abc/i
  puts "true"
end

However, Logstash doesn't support this form; for example, Logstash will complain that the following configuration is illegal:

if [message] =~ /abc/i {
}

Instead, use the (?i) form to do a case-insensitive comparison:

if [message] =~ /(?i)abc/ {
}
How to capture named regexes without defining them in grok pattern files?
Sometimes you do not want to define a named regex in a grok pattern file, perhaps because it is inconvenient to change the pattern file every time the regex changes. In that case you can use (?<name>regex). For example, if you see a logging "fail to hear from k1 for over 3 minutes", you know a "cluster failed-lost k1" event has happened. To capture that:
grok {
  match => { "message" => "fail to hear from (?<nodelost>.*?) for over 3 minutes" }
  add_field => { "event" => "cluster failed-lost %{nodelost}" }
  tag_on_failure => ["not a cluster lost event"]
}
You might be tempted to use this technique directly in an if comparison. The comparison will be true, but you won't get the named capture:
if [message] =~ /fail to hear from (?<nodelost>.*?) for over 3 minutes/ {
  mutate {
    add_field => { "event" => "cluster failed-lost %{nodelost}" }
  }
}
The event field you get is
"event" => "cluster failed-lost %{nodelost}"
This is because the captured %{nodelost} is not passed into the mutate plugin. In contrast, the named captures from grok become fields, which are passed down the filter chain and to the outputs.
How to debug grok patterns?
http://grokdebug.herokuapp.com/ is a great tool for debugging grok patterns. However, it is not that easy to figure out how to use it at first, so I snapped a screenshot which should make it clear.
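In short, and using the example from earlier in this post as an illustration: paste a sample logging into the input box and the pattern into the pattern box, and the debugger shows the captured fields.

# input (a sample logging)
ERROR server:ClusterMonitor:2013/10/19-23:51:09.637 CDT: fail to hear from k2 for over 3 minutes
# pattern
fail to hear from (?<nodelost>.*?) for over 3 minutes
# result (roughly): nodelost => "k2"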
How to add fields?
Many filters can create fields. I usually use grok
and mutate.
- grok:
The named patterns in grok will automatically become fields, which are passed down the filter chain and can be used.
- mutate:
mutate has many operations that can transform a logging,
so I use it often.
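For instance, a mutate block combining a few of the operations used elsewhere in this post (the field names come from the earlier examples) might look like:

mutate {
  add_field => { "event" => "cluster failed-lost %{nodelost}" }
  lowercase => [ "node" ]
  gsub => [ "timestamp", "-(\d\d):", "T\1:" ]
}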
mutate gsub doesn’t evaluate fields?
This is an issue https://github.com/elasticsearch/logstash/issues/1529,
which has been solved in https://github.com/elasticsearch/logstash/pull/1552/files.
You can patch up logstash-1.4.2\lib\logstash\filters\mutate.rb with the
solution.
In my opinion, the solution is not complete though. With
this solution, gsub will evaluate fields in the replacement part, but not the
grok patterns in the matching part. For example:
# this doesn't work: gsub doesn't evaluate %{JAVAEXCEPTION}, the string-matching part
gsub => [ "msg", "%{JAVAEXCEPTION}", "%{javaexception0} %{frommyapp}" ]

# this works: gsub evaluates fields in the replacement
gsub => [ "msg", "(?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$_]+Exception", "%{javaexception0} %{frommyapp}" ]
How to reference captured groups in gsub?
Use \1, \2. For example, to change a timestamp from 2016-05-03-06:22:36.641Z to 2016-05-03T06:22:36.641Z, use:
gsub => [ "timestamp", "-(\d\d):", "T\1:" ]
How to deal with multiline loggings?
Loggings that span multiple lines are quite common; think of a stacktrace. You can use the multiline codec to deal with them. For example, a logging is typically logged with a timestamp that follows a specific pattern. If a line does not contain a timestamp matching that pattern, then the line is not the beginning of a new logging: it belongs to the previous logging.
# define the logging timestamp pattern
YEARMONDAY %{YEAR}[/-](%{MONTHNUM})[/-](%{MONTHDAY})
TZ0 (?:[a-zA-Z]{3,4})
LOGDATE %{YEARMONDAY}-%{TIME} %{TZ0}

# within input, use the multiline codec
input {
  file {
    path => "C:/logstash-1.4.2-0/data/test/**/serverLog.txt"
    start_position => "beginning"
    sincedb_path => "C:/logstash-1.4.2-0/data/.test.log.sincedb"
    codec => multiline {
      pattern => "%{LOGDATE}:"
      negate => true
      what => previous
    }
  }
}
Why does Logstash ElasticSearch output create indices with wrong dates?
By default, for each
logging passing from inputs, through filters and out to outputs, Logstash
creates a default @timestamp field, which is the time that Logstash processes
the logging. Logstash will also use the date part of the @timestamp
to create an ElasticSearch index. For example, the test.log file contains two
loggings:
ERROR server:ClusterMonitor:2013/10/19-23:51:09.637 CDT: fail to hear from k2 for over 3 minutes
ERROR server:ClusterMonitor:2013/10/19-23:51:39.792 CDT: fail to hear from k3 for over 3 minutes
Without any filters, Logstash will create these two
documents:
{"message" => "ERROR server:ClusterMonitor:2013/10/19-23:51:09.637 CDT: fail to hear from k2 for over 3 minutes\r","@version" => "1","@timestamp" => "2014-11-08T10:17:52.605Z","host" => "Apple","path" => "C:/logstash-1.4.2-0/data/test.log"}{"message" => "ERROR server:ClusterMonitor:2013/10/19-23:51:39.792 CDT: fail to hear from k3 for over 3 minutes\r","@version" => "1","@timestamp" => "2014-11-08T10:17:52.605Z","host" => "Apple","path" => "C:/logstash-1.4.2-0/data/test.log"}
Notice that the @timestamp field is the time Logstash processed the logging. Logstash will also create an ElasticSearch index named logstash-2014.11.08. This is most likely not what you want: you want an index "logstash-2013.10.19" containing the two loggings logged on 2013/10/19.
To accomplish this, you need to capture the timestamp from the loggings (this can be done via a grok pattern, or via the technique described in "How to capture named regexes"), and then use the date filter to make that timestamp the timestamp of the document and of the index name.
- Capture the timestamp from the loggings
Using the same example and the existing grok patterns, it is easy to create a pattern that matches logging timestamps such as "2013/10/19-23:51:09.637 CDT":
# define the pattern
YEARMONDAY %{YEAR}[/-](%{MONTHNUM})[/-](%{MONTHDAY})
TZ0 (?:[a-zA-Z]{3,4})
LOGDATE %{YEARMONDAY}-%{TIME} %{TZ0}

# use grok to capture the timestamp from loggings
grok {
  match => { "message" => "%{LOGDATE:timestamp}" }
  tag_on_failure => ["logging not conform to %{PPMLOG} pattern"]
}
- Use date filter
date {
  match => [ "timestamp", "yyyy/MM/dd-HH:mm:ss.SSS z" ]
}
The date match takes the "timestamp" field, which follows the format "yyyy/MM/dd-HH:mm:ss.SSS z". This is quite inconvenient: you just had to figure out the LOGDATE format using regex patterns, and now you have to figure out a date format following a different standard, joda.time.format.DateTimeFormat.
It seems Logstash is only able to recognize some North American timezones and fails to deal with Asian and European timezones. For example, "CST" can mean China Standard Time or Central Standard Time, and Logstash will treat it as Central Standard Time. To properly interpret a timestamp string with such a timezone, you need to:
- First, capture the timezone part using a grok pattern, for example:
LOGDATE %{YEARMONDAY}-%{TIME} %{TZ0:tz}
- Remove the timezone part from the timestamp string
mutate {
  gsub => [ "timestamp", "\s*[A-Za-z]+", "" ]
}
- Use the timezone option to specify the desired timezone, and add the timezone part back to the timestamp string. The supported timezones are listed at http://joda-time.sourceforge.net/timezones.html .
if [tz] =~ /CST/ {
  date {
    timezone => "Asia/Shanghai"
    match => [ "timestamp", "yyyy-MM-dd-HH:mm:ss.SSS" ]
  }
  mutate {
    gsub => [ "timestamp", "$", " %{tz}" ]
  }
}
Now the output becomes:
{"message" => "ERROR server:ClusterMonitor:2013/10/19-23:51:09.637 CDT: fail to hear from k2 for over 3 minutes\r","@version" => "1","@timestamp" => "2013-10-20T04:51:09.637Z","host" => "Apple","path" => "C:/logstash-1.4.2-0/data/test.log","timestamp" => "2013/10/19-23:51:09.637 CDT}{"message" => "ERROR server:ClusterMonitor:2013/10/19-23:51:39.792 CDT: fail to hear from k3 for over 3 minutes\r","@version" => "1","@timestamp" => "2013-10-20T04:51:39.792Z","host" => "CHENPING4","path" => "C:/logstash-1.4.2-0/data/test.log","timestamp" => "2013/10/19-23:51:39.792 CDT"}
Notice that @timestamp now represents the same instant as "timestamp", just in a different timezone: @timestamp is always in UTC.
The index created is named logstash-2013.10.20.
Note that ElasticSearch seems to be able to recognize only North American timezones as well. Worse still, when it cannot, ElasticSearch creates the indices but silently fails to create the documents, with no errors in the ElasticSearch log files and no error responses returned to Logstash. I used to have the timestamp field mapped to the date type; when the timestamp was in EDT, for example, no documents were created even though the indices were. In the end, I gave up on making the timestamp field a date and reverted it to the string type.
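A quick sanity check after a run (a sketch, assuming ElasticSearch listens on localhost:9200 as in the curl examples at the end of this post) is to count the documents in the newly created index:

curl -XGET http://localhost:9200/logstash-2013.10.20/_count?pretty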
How to process only a part of the log files?
Sometimes I get a huge log file spanning many months while I am only interested in the most recent month's loggings. I can handle this by dropping loggings that were logged before a certain time.
mutate {add_field => { "startdate" => "2014-07-01T00:00:00 GMT" }}date {//convert startdate from a string to a datematch => [ "startdate" , "yyyy-MM-dd'T'HH:mm:ss z" ]target => "startdate"}if [@timestamp] < [startdate] {//drop loggings if they are logged before the startdatedrop{}}
How to create ElasticSearch indices with different name patterns?
By default, Logstash creates indices with the name pattern logstash-%{+YYYY.MM.dd}, such as logstash-2013.10.20, but you can change it. For example, I analyze log files from different customers, and every customer has multiple nodes, so I change the name pattern to:
output {
  elasticsearch {
    host => localhost
    cluster => AppleElastic1
    index => "logstash-%{customer}-%{node}-%{+YYYY.MM.dd}"
  }
}
Note that the index name has to be lowercase, so the customer and node fields have to be lowercased. You can use mutate to do the lowercasing:
mutate {
  lowercase => [ "customer", "node" ]
}
How to overwrite ElasticSearch documents?
By default, ElasticSearch uses a random string as the id for
a document, so if you rerun Logstash to process the same log files, the
original documents won’t be overwritten.
To overcome this, you can specify the id yourself; typically, use the timestamp of the logging, for example:
elasticsearch {
  host => localhost
  cluster => AppleElastic1
  index => "logstash-%{customer}-%{node}-%{+YYYY.MM.dd}"
  document_id => "%{timestamp}"
}
How to change the default ElasticSearch mapping?
The Logstash ElasticSearch output uses a dynamic mapping defined in logstash-1.4.2\lib\logstash\outputs\elasticsearch\elasticsearch-template.json. This dynamic mapping is often not ideal. For example, although in "Why does Logstash ElasticSearch output create indices with wrong dates?" we managed to get logging timestamps that look like dates, they are actually stored as strings. Checking the mapping of the created indices, you will see:
"@timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"timestamp": {
"norms": {
"enabled": false
},
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"ignore_above": 256,
"type": "string"
}
}
}
The built-in @timestamp field is of the date type, while timestamp is of the string type.
Another thing I want to change: some fields should not be analyzed, for example the event and subevent fields. I often run aggregation searches on them, and if they are analyzed the results will be wrong. Although I can use the event.raw and subevent.raw fields (which are not analyzed, as the above example shows) as an alternative, that is inconvenient, and having the .raw fields wastes resources.
To overcome this, copy the default mapping file logstash-1.4.2\lib\logstash\outputs\elasticsearch\elasticsearch-template.json and make changes to it. In my case:
{"template" : "logstash-*","settings" : {"index.refresh_interval" : "5s"},"mappings" : {"_default_" : {"_all" : {"enabled" : true},"dynamic_templates" : [ {"string_fields" : {"match" : "*","match_mapping_type" : "string","mapping" : {"type" : "string", "index" : "analyzed", "omit_norms" : true,"fields" : {"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}}}}}],"properties" : {"@version": { "type": "string", "index": "not_analyzed" },"geoip" : {"type" : "object","dynamic": true,"path": "full","properties" : {"location" : { "type" : "geo_point" }}},"timestamp" : {"format" : "yyyy/MM/dd-HH:mm:ss.SSS z","type" : "date"},"event": {"type": "string", "index": "not_analyzed" },"subevent": {"type": "string", "index": "not_analyzed" }}}}}
And use this customized mapping in ElasticSearch output:
elasticsearch {
  host => localhost
  cluster => AppleElastic1
  document_id => "%{timestamp}"
  template => "../data/conf/logstash-template.json"
}
Before you run Logstash, make sure that you’ve deleted the
existing mapping.
# check the existing mapping:
curl -XGET http://localhost:9200/_template/logstash?pretty

# delete the existing mapping
curl -XDELETE http://localhost:9200/_template/logstash?pretty