
Data visualization under 15 minutes with the ELK stack


May 2, 2017

Following my previous series of blog posts on data visualization with Highcharts and AngularJS, this time I’ll show you how easily the same outcome can be achieved with another software stack: ELK (Elasticsearch, Logstash and Kibana).

Since I love sporting events, the dataset we are going to use is the Premier League 2011/2012 season statistics.

Prerequisites

  1. Download Elasticsearch
  2. Download Logstash
  3. Download Kibana
  4. Git clone https://github.com/boykodimitroff/ELK-blogpost.git

Let’s start

In Business Intelligence terms, Logstash is the ETL (extract, transform, load) tool of the ELK stack. It needs to be configured to read our dataset, convert the different fields to the proper data types and finally load everything into Elasticsearch (our data warehouse instance).

Let’s check the csv.conf file from the cloned project.

 

Since our dataset is in csv format, we need to configure a csv filter so that Logstash knows how to read the data. The columns property lists the column names of the csv, the so-called header. Later these field names will be used in our queries. The mutate filter tells Logstash how to treat the values in each column. Finally, the output configuration ensures that the data is loaded into the Elasticsearch instance on localhost, under an index called csv_index.
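To make the two filter steps more concrete, here is a minimal Python sketch of what they conceptually do to a single line of the file. It is not how Logstash is implemented, and the column list, the date format and the sample row are my assumptions for illustration only; the real header is the one listed in csv.conf.

```python
import csv
import io

# Hypothetical header: the real column list lives in csv.conf in the cloned project.
COLUMNS = ["Date", "Player Surname", "Team", "Goals", "Yellow cards", "Red cards"]

# Columns that should become integers (the role of the mutate filter).
INTEGER_COLUMNS = ["Goals", "Yellow cards", "Red cards"]


def parse_line(line):
    """Turn one raw csv line into a typed document, ready to be indexed."""
    values = next(csv.reader(io.StringIO(line)))
    document = dict(zip(COLUMNS, values))  # csv filter: give each value a name
    for column in INTEGER_COLUMNS:         # mutate filter: fix the data types
        document[column] = int(document[column])
    return document


# Placeholder row, not taken from the real dataset.
sample = parse_line("2011-08-13,Doe,Example FC,2,1,0")
print(sample)
```

Without the conversion step every value would stay a string, and a Sum aggregation on Goals would not be possible later on.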

Let’s run Elasticsearch and then Logstash with the created configuration file:

  • ./elasticsearch
  • ./logstash -f /path/to/csv.conf

If everything is okay, Logstash will start extracting the data from the csv file, transform the values according to the specified schema and then load everything into Elasticsearch.
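If you want to double-check that the documents really arrived, one option is to ask Elasticsearch for the document count of the index. The sketch below assumes Elasticsearch is listening on its default port 9200 on localhost; the helper names count_url and fetch_count are mine and not part of any library.

```python
import json
import urllib.request

# Assumptions: Elasticsearch runs locally on its default port 9200, and the
# index name matches the one used in the Logstash output section.
ES_URL = "http://localhost:9200"
INDEX = "csv_index"


def count_url(base_url, index):
    """Build the URL of the _count endpoint for one index."""
    return "{}/{}/_count".format(base_url.rstrip("/"), index)


def fetch_count(url):
    """Ask Elasticsearch how many documents the index holds."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["count"]
```

With Elasticsearch running, print(fetch_count(count_url(ES_URL, INDEX))) should print a number greater than zero once Logstash has finished loading.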

Alright, now that the data is loaded, let’s visualize it. Kibana is the component that does this in the ELK stack. I assume you have already started it, so go to http://localhost:5601

Our data has a “Date” property, which represents the date of a particular match. Every value in this field is a date between 2011 and 2012, because we have the statistics for the Premier League 2011/2012 season. A quick search on Google tells me that the season started on August 13, 2011 and ended on May 13, 2012.

By default Kibana shows records for the “Last 15 minutes”, so we need to change the time period.

Click on “Last 15 minutes” in the upper right corner of Kibana’s dashboard. Then select the “Absolute” time range and set the calendar to the dates mentioned above.

We are going to prepare charts with the following three indicators:

  • Goals per team with Pie chart
  • Top 10 scorers with Bar chart
  • Comparison between red and yellow cards per footballer with Line chart

Goals per team

Select Visualize -> Create a visualization -> Pie chart. Click on our newly created index, csv_index. You should see something like this:

kibana pie blue

Since we want to visualize goals per team, we need to select Sum from the Aggregation drop-down. After that, select Goals as the Field. Click on Split Slices and select Terms for the Aggregation. A terms aggregation enables you to specify the top or bottom n elements of a given field to display, ordered by count or a custom metric. Our terms field will be Team. We want to show every team in the league, so we need to increase the default size from 5 to 20. Finally, click on the play button in the header of the input controls section. The result should be this:

kibana pie color

Manchester City scored the most goals in the 2011/2012 season of the Premier League.
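For the curious, the pie chart settings above correspond roughly to an Elasticsearch search with a terms aggregation and a nested sum. The following sketch builds such a request body in Python. I am not claiming this is exactly what Kibana sends; the aggregation names teams and total_goals are labels I made up, and depending on how the index was mapped the team field may need to be Team.keyword instead of Team.

```python
import json


def goals_per_team_body(team_field="Team", goals_field="Goals", size=20):
    """Build a search body: one bucket per team, each with the sum of goals."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "aggs": {
            "teams": {
                "terms": {
                    "field": team_field,
                    "size": size,  # 20 buckets, one per team in the league
                    "order": {"total_goals": "desc"},
                },
                "aggs": {"total_goals": {"sum": {"field": goals_field}}},
            }
        },
    }


body = goals_per_team_body()
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to the _search endpoint of csv_index if you want to see the raw numbers behind the chart.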

Top 10 scorers

Select Visualize -> Create a visualization -> Vertical bar chart. Click on our csv_index. You should see something like this:

kibana blue

Expand the Y-axis and select Sum for the Aggregation. The selected Field must be Goals. Add an X-axis with Terms for the Aggregation, based on the Player Surname field. We need the top 10 scorers, so the default size should be increased to 10. Click on the play button. The result should be this:

kibana charts

Looks like Van Persie is the top scorer of the Premier League 11/12 with 28 goals.
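If the combination of Terms and Sum feels abstract, this is roughly the calculation behind the bar chart, written as plain Python over a handful of rows. It only illustrates the idea and is not how Elasticsearch computes it; the rows are placeholders and the function name top_scorers is hypothetical.

```python
from collections import Counter


def top_scorers(rows, n=10):
    """Sum goals per player and return the n highest totals, best first."""
    totals = Counter()
    for row in rows:
        totals[row["Player Surname"]] += row["Goals"]
    return totals.most_common(n)


# Placeholder rows, not taken from the real dataset.
rows = [
    {"Player Surname": "Player A", "Goals": 2},
    {"Player Surname": "Player B", "Goals": 1},
    {"Player Surname": "Player A", "Goals": 1},
    {"Player Surname": "Player C", "Goals": 0},
]
print(top_scorers(rows, n=2))  # [('Player A', 3), ('Player B', 1)]
```

The size setting in Kibana plays the role of n here: it limits how many of the highest buckets end up on the X-axis.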

Comparison between red and yellow cards per footballer – easy with the ELK stack

Select Visualize -> Create a visualization -> Line chart. Click on csv_index. You should see a blank chart with the already familiar input controls.

Expand the Y-axis and select Sum for the Aggregation, based on the Red cards field. Click on the Add metrics button and add another Y-axis with Sum for the Aggregation, based on the Yellow cards field. Add an X-axis with Terms for the Aggregation, based on Player Surname. Increase the default size to 11 in order to see a whole team. Since we know that Manchester City scored the most goals in the league, let’s narrow down our results to just this team. In the search header replace the * with Team:"Manchester City" (the quotes keep the two words together as one phrase).

kibana graph
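As a rough equivalent of this last chart, the sketch below builds a search body that filters on one team and sums both card types per player. The field names come from the steps above and are assumptions about the index; the aggregation names players, red_cards and yellow_cards are labels I chose, and Player Surname may need a .keyword suffix depending on the mapping.

```python
import json


def cards_per_player_body(team="Manchester City", size=11):
    """Build a search body comparing red and yellow cards per player of one team."""
    return {
        "size": 0,  # aggregation results only
        # Quoting the team name keeps both words together as one phrase.
        "query": {"query_string": {"query": 'Team:"{}"'.format(team)}},
        "aggs": {
            "players": {
                "terms": {"field": "Player Surname", "size": size},
                "aggs": {
                    "red_cards": {"sum": {"field": "Red cards"}},
                    "yellow_cards": {"sum": {"field": "Yellow cards"}},
                },
            }
        },
    }


body = cards_per_player_body()
print(json.dumps(body, indent=2))
```

Each of the two nested sums corresponds to one of the two lines on the chart.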

 

Conclusion

It’s amazing what can be achieved with the ELK stack in such a short period of time. There is no need to install heavy reporting platforms or database instances for data warehousing. Data modeling and loading are accomplished with such ease. I love it.

Boyko Dimitrov

Java Developer at Dreamix
