As we discussed in the previous blog 2016 markеd the 6th year of Dreamix successful partnership with one of our major clients – a leading international full-service provider of payment solutions and mobile value added services. Due to our common efforts and productive collaboration Dreamix is now widely involved in the development of new analytics and reporting features, the deploying system releases as well as supporting and customizing the existing functionality. The solution continues to processes data from over 10 app store markets (including Google, Facebook, Microsoft, Sony) from over 20 end client telecoms around the world, in more than 15 countries.
Over the past years, a custom hybrid platform solution has been implemented using Pentaho Data Integration (Kettle) for the ETL tasks, Pentaho Online Analytical Processing (Mondrian OLAP server) and Jaspersoft Open Source Business Intelligence software for building a web portal and parameterized reports. As thе work progressed the team faced series of challenges. Apart from collecting and analyzing large amounts of current and history payment messages, we need to deal with the strict company application standards and the creation of multiple roles, supported with different levels of data access and functionality. So, find here our approach and used technology stack in pursuit of creating reliable BI solution for payment data and mobile value added services.
Data warehouse (DW) is designed and implemented as a central repository of integrated data. Several disparate sources including databases, CSV files, call detail records (CDRs), transaction server log files, configuration files and others are being extracted, transformed and loaded into the DW each night as part of ETL scheduled jobs. Once loaded into the DW, the information is clean and unified. Data from each telecom is organized in separate sets of tables partitioned upon yearmonth dimension.
- Data marts
Specialized ETL jobs are involved in processing subsets of data from the DW and loading the result into several data marts following the star schema design. These data marts are used mainly to serve as a data source of Pentaho OLAP cubes, each per telecom, including a fact table, private and shared dimension tables. Additionally, for faster query processing, subsets of each fact table data are preprocessed and loaded into more than 80 aggregation tables per OLAP cube.
- Regional overnight processing
While each telecom data is collected and processed in Germany, different telecoms are headquartered and operate in different geographic regions with different time zones. To provide the telecom users around the world an updated analytic data every morning, telecom data is processed at different server time of the day (regional overnight processing). All data processing takes place on a separate, staging database environment. To minimize processing time, cubes’ fact tables, dimension tables, and supporting aggregation tables are updated incrementally by massive parallel query execution. Data is first copied from DW to the staging area servers, where it is processed, and then copied to Cube database servers into temporary “to swap” tables. As soon as processing is finished and all new/updated tables are prepared, the current production cube tables are replaced with the new tables all at once, avoiding data inconsistency.
- OLAP and in-memory cachingMondrian
OLAP Cubes are developed per each telecom to allow multidimensional data queries (MDX) used in Pivot tables and MDX-based charts and reports. Additionally, for better performance, Mondrian OLAP in memory caching has been enabled and an automatic caching job has been implemented and executed automatically when the cube data is updated. The automatic caching job first flushes the existing cached data and then makes rest API calls executing hundreds of queries responsible for cache different sections of each cube.
- Realtime monitoring
Apart from business data analytics, that can be updated not more than just a few times per day, a separate module for “realtime” monitoring has been implemented. It monitors the systems on the transport layer, ensuring all services are operating as normal and data is being generated.This information is updated every few minutes and indicates the incoming and outgoing transaction rate (transactions per second) for each telecom and/or app store, the number of errors, the response time delay and other SLA based KPIs.
The frontend part of the solution is implemented using the community version of the Jasper Open Source Business Intelligence server. Custom dashboard views including charts, tables and several types of gadgets are designed and implemented for the internal and the external users of the system. The users of each telecom are provided with customizable OLAP views (Pivot tables) allowing them to sort, slice & dice, filter and export the data. Parametrized Jasper reports are implemented allowing on demand execution as well as scheduled automatic execution.
Technologies and tools used
- Servers OS: Red Hat Linux;
- Database: PostgreSQL;
- ETL: Pentaho Data Integration (Kettle);
- OLAP: Pentaho OLAP (Mondrian OLAP server), Schema Workbench, Jaspersoft OLAP, Custom Java-based aggregation schema generator, Custom Python-based caching script;
- Dashboard and Reporting: Jaspersoft Open Source Business Intelligence, iReport Designer;
- BI Server: Jaspersoft Open Source Business Intelligence;
- Web portal: Jaspersoft Open Source Business Intelligence.