Logstash is a real-time data processing engine built around pipelines that receive, transform, and send data to different destinations. For more details about Logstash and how it is used, check my previous article HERE.
In this one, we will go through some tips and good practices that make your life easier when using the tool, because on a big project things can quickly become messy and hard to manage.
Consider that you have a snippet of code that you need to reuse across multiple pipelines, so that you can follow the DRY principle and keep later modifications to your pipelines simple. In this case, one option is to create a Logstash plugin and host it on a local gems registry (Doc).
An easier method, especially if you need the script in a single Logstash project rather than across multiple ones, is to use Ruby scripts.
Let’s take an example:
Let’s imagine a Logstash server that receives logs from other servers through Filebeat on port 5044, checks the status of the service running on those servers, and sends the result to Elasticsearch.
So, a basic example of such a pipeline could look like the following.
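This sketch is illustrative: the service_status field, the tag, the Elasticsearch host, and the index name are assumptions, not taken from a real setup.

```conf
input {
  # Receive events shipped by the Filebeat agents
  beats {
    port => 5044
  }
}

filter {
  # Hypothetical check: keep only events reporting a service that is not running
  if [service_status] == "running" {
    drop { }
  } else {
    mutate {
      add_tag => ["service_down"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "services-status-%{+YYYY.MM.dd}"
  }
}
```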
Now, if we want to reuse the code in the filter part, we can create a Ruby script file, check_status_name.rb, and load it from a ruby filter.
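A minimal sketch of what the script could look like, reusing the hypothetical service_status field from the example above:

```ruby
# check_status_name.rb
# Optional hook, called once when the pipeline starts
def register(params)
end

# Called for every event; must return an array of events
def filter(event)
  if event.get("service_status") == "running"
    # Equivalent of the drop plugin: discard this event
    event.cancel
    return []
  end
  event.tag("service_down")
  [event]
end
```

The pipeline then loads the script through the ruby filter (the path is just an example location):

```conf
filter {
  ruby {
    path => "/etc/logstash/scripts/check_status_name.rb"
  }
}
```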
Note that we reimplemented the logic purely in Ruby. For example, instead of the “drop” plugin we call event.cancel, because inside a ruby filter script we only manipulate the event object.
Another interesting approach is to break your pipeline into multiple files. Each file contains a part of the pipeline; for example, one file for input, N files for the filter, and a last one for the output.
To do this, name your files so that they sort in the order you want Logstash to assemble them (Logstash concatenates the files in lexicographical order), and point Logstash to the directory that contains them.
Example:
📁/etc/logstash/conf.d/pipeline-date-parser
├── 1_Input.conf
├── 21_Filter_null_events.conf
├── 22_Format_load_date.conf
└── 3_Save_into_file.conf
Lastly, we need to declare the pipeline in the pipelines.yml file.
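Something like the entry below, assuming the example directory above (the pipeline id is arbitrary):

```yaml
- pipeline.id: pipeline-date-parser
  path.config: "/etc/logstash/conf.d/pipeline-date-parser/*.conf"
```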
The benefit is that we can easily reuse code by sharing these files across multiple pipelines without copy-pasting, and we can separate the standard parts of the configuration from the custom parts by putting the custom code in its own file and limiting modifications to the standard blocks.
Let’s take the latest example and say that we also want to send events to Elasticsearch: we just need to add a new file, “4_save_to_elasticsearch.conf”, put our output in it, and add it to the directory.
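For example, the new file could contain nothing but the extra output (the host and index name are placeholders):

```conf
# 4_save_to_elasticsearch.conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "parsed-dates-%{+YYYY.MM.dd}"
  }
}
```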
Splitting complex pipelines this way also simplifies their maintenance.
A word about logs: what I mean here are Logstash’s own technical logs, which by default the running application writes to a single file.
By default the path could be:
/var/log/logstash/logstash-plain.log or <LOGSTASH_HOME>/logs/logstash-plain.log depending on the version and where you installed it.
We can change this behavior and get one log file per pipeline with the pipeline.separate_logs parameter, either in the logstash.yml file or via the command-line argument --pipeline.separate_logs.
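In logstash.yml, the setting looks like this:

```yaml
# Write one log file per pipeline instead of a single shared file
pipeline.separate_logs: true
```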
Take, for example, a pipelines.yml file with two pipelines.
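Something like this (the configuration paths are illustrative):

```yaml
- pipeline.id: pipeline1
  path.config: "/etc/logstash/conf.d/pipeline1/*.conf"
- pipeline.id: pipeline2
  path.config: "/etc/logstash/conf.d/pipeline2/*.conf"
```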
The log files will be like this:
/var/log/logstash/pipeline_pipeline1.log
/var/log/logstash/pipeline_pipeline2.log
So separating logs can be useful when you have many pipelines and you are looking for the messages or errors of one specific pipeline.
Another tip when exploring logs: to know whether a pipeline has started or not, search for the message “Pipeline started” together with the pipeline id in the logs (log level: info).
Looking for an error or a message in the Logstash log files can be very tedious, especially if you are searching through a big log file.
You can instead install a Filebeat agent on your Logstash server and configure it to ship these logs to Elasticsearch, so that you can benefit from the power of Kibana to run queries and build dashboards.
In the filebeat.yml file, you need to specify the path of the Logstash log files.
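A sketch of the input section, using the default log locations mentioned above (older Filebeat versions use the log input type instead of filestream):

```yaml
filebeat.inputs:
  - type: filestream
    id: logstash-logs
    paths:
      - /var/log/logstash/logstash-plain.log
      - /var/log/logstash/pipeline_*.log
```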
For the output part, Filebeat can send the logs directly to Elasticsearch.
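A minimal example, with a placeholder host and credentials:

```yaml
output.elasticsearch:
  hosts: ["http://localhost:9200"]
  # If security is enabled, add credentials, for example:
  # username: "elastic"
  # password: "changeme"
```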
The Logstash API is another way to monitor your Logstash service and make sure everything is OK.
It exposes multiple endpoints such as:
The health report API (Health report API) can be used to get indicators about how the Logstash pipelines are running.
The stats API (Node stats API) gives more detailed information about the pipelines: the number of events received and sent, plugin errors, and so on.
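Both are served by the monitoring API, which listens on port 9600 by default; note that the health report endpoint is only available in recent Logstash versions:

```bash
# Detailed per-pipeline statistics (events in/out, plugin errors, queue usage, ...)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'

# Overall health indicators for the instance and its pipelines
curl -s 'http://localhost:9600/_health_report?pretty'
```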
As a Logstash user, you also need to think about the resiliency aspect of the service. Queues can be used for this need.
A Logstash persistent queue is a mechanism that temporarily holds data on the local file system when Logstash cannot deliver it, and resumes sending that data once the issue is resolved. The issue can be on the filter or output side, for example a lost connection to Elasticsearch on the output.
In simple setups, it can even replace external buffering tools used for the same need, such as Apache Kafka.
To enable the persistent queue, update the settings in the logstash.yml file.
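The key setting is queue.type; the size cap is optional (1024mb per queue by default):

```yaml
queue.type: persisted
# Optional: limit the on-disk size of each pipeline's queue
queue.max_bytes: 2gb
```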
Those were some of the tips to keep in mind when using Logstash; I hope you learned something new.
Don’t hesitate to check the other articles of my Blog ;)
Stay safe and Keep learning..