Elasticsearch, Spring Boot and Angular.JS in action for building B2C web applications

by Cvetelin Andreev
June 9, 2015

Java technology stack for Silicon Valley web apps.

How does the story begin? In brief: we were facing a Java web application with 5000 active users that was unavailable most of the time, most probably because of multiple memory leaks. A script restarted the Tomcat server whenever it became unavailable. The functionality was broken: users could not log in or properly use the rest of the application. More about the story here.

Technology stack and tools

After the inventory phase of the project we found quite an impressive list:

Pic 1

Backend

  • Java 8
  • Spring (MVC, Boot) – used for dependency injection. Spring MVC exposes REST services to Angular.JS, and Spring Boot is used to set up the application rapidly
  • Elasticsearch – used for storing some searchable data.
  • MySQL
  • Stripe – Online payment provider with a great API. You can easily switch to test mode and make JSON test queries through curl.
  • Open Graph – The Facebook protocol (a set of meta tags on an HTML page) that enables any web page to become a rich object in a social graph
  • Twitter API
  • Redis – Key-value cache and store that can be clustered

Frontend

  • Angular.js – JavaScript MVW Framework
  • Bootstrap – CSS for mobile-first front-end web development
  • Bower – Dependency manager for JavaScript apps
  • Wro – Web Resource Optimizer for Java web applications
  • Thymeleaf – Java XML/XHTML/HTML5 template engine

System, OS, Infrastructure

  • Rackspace – Cloud servers where the application is actually deployed
  • Cloudflare – Brings security and performance optimization to the web application
  • Linux CentOS
  • Apache Tomcat

Dev tools

  • Maven
  • Vagrant – Virtualization of server environment for developers
  • IntelliJ IDEA

QA and monitoring

  • Uptimerobot – monitoring the service availability
  • JUnit
  • Selenium
  • Logstash – collects and processes log messages and stores them in Elasticsearch.
  • Kibana – interface for Logstash/Elasticsearch. Easily search and filter logs.
  • JAMon – Monitor for Java performance

User engagement, tracking, support

  • Optimizely – A/B testing framework for web and mobile apps
  • Uservoice – feedback and support system
  • Twitter – Used for providing support
  • Mixpanel – Web analytics. A ‘must have’ tool for every startup
  • Mandrill – Managing mail/news lists, sending mails

Collaboration

  • Meldium – Online password manager. Easy to share credentials with the team
  • Bitbucket – Git repository and basic PM functionality
  • Google Hangouts – for everyday communication

System architecture

When we started looking for the problems, we found the following system architecture. We had:

  • 3 servers running Apache Tomcat. One of them hosted the web application and the other 2 were clustered for the Spring Quartz jobs, which were what actually loaded the system.
  • 4 servers for the Elasticsearch cluster
  • 1 server for Logstash
  • 1 server for Redis

pic 2

The problem

The most important issue to fix was that users were unable to log in. After some debugging and digging into the code and the systems, we found the problem. It was Elasticsearch itself.

Elasticsearch and ACID support

Elasticsearch was used as a database, but it does not support ACID transactions. This means that when you write something (like a Twitter authentication token), there is no guarantee that it is available for reading the next moment. In our case, the login token was saved through a callback handled by one node, while the authentication system tried to read it through another node. The token had not yet been replicated to all the nodes, so the authentication mechanism could not find it and the user could not log in.
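
The failure mode can be illustrated with a toy model. This is only a sketch and does not use the Elasticsearch API: `EventuallyVisibleStore`, `LoginFlowDemo` and the `token:` key prefix are hypothetical names, and the explicit `refresh()` call stands in for whatever delay separates an acknowledged write from the moment a reader can see it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of a store where writes become readable only after a refresh. */
class EventuallyVisibleStore {
    private final Map<String, String> pending = new HashMap<>(); // written, not yet visible
    private final Map<String, String> visible = new HashMap<>(); // what readers can see

    /** Accepts a write; it is acknowledged but not yet visible to readers. */
    void put(String key, String value) {
        pending.put(key, value);
    }

    /** Returns the value only if a refresh has already published it. */
    Optional<String> find(String key) {
        return Optional.ofNullable(visible.get(key));
    }

    /** Publishes all pending writes, as a delayed propagation step would. */
    void refresh() {
        visible.putAll(pending);
        pending.clear();
    }
}

class LoginFlowDemo {
    /** Hypothetical login check: succeeds only if the token can be read back. */
    static boolean canLogIn(EventuallyVisibleStore store, String user) {
        return store.find("token:" + user).isPresent();
    }

    public static void main(String[] args) {
        EventuallyVisibleStore store = new EventuallyVisibleStore();
        store.put("token:alice", "abc123");           // the callback saves the token
        System.out.println(canLogIn(store, "alice")); // false: read right after the write
        store.refresh();
        System.out.println(canLogIn(store, "alice")); // true: read after propagation
    }
}
```

A login flow that writes and immediately reads back needs read-after-write guarantees, which is what a transactional database provides and what this toy store lacks.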

Memory issues

Most of the memory issues were related to the Spring message queue. Its messages triggered saving data to Elasticsearch or calling external services (GeoIP, etc.). Those operations became very slow as the system load increased. The queue filled up with messages that could not be dispatched and started to take too much memory. Restarting the server cleared the queue, but the queued data was lost.
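
To make the backlog effect concrete, here is a small simulation using only `java.util.concurrent`. It is a sketch under assumptions, not the application's actual queue: the rates (10 messages arriving and 2 dispatched per tick) and the class name `QueueBacklogDemo` are made up for illustration. It compares an unbounded queue with a bounded one. The bounded queue is shown only as one possible way to cap memory, not as what we did, and it trades memory for rejected messages.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueBacklogDemo {
    /**
     * Simulates `ticks` rounds in which `produced` messages arrive and at most
     * `consumed` messages are dispatched, then returns the remaining backlog.
     */
    static int backlogAfter(BlockingQueue<String> queue, int ticks, int produced, int consumed) {
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < produced; i++) {
                // offer() returns false instead of blocking when a bounded queue is full,
                // so in that case the message is simply dropped here
                queue.offer("msg-" + t + "-" + i);
            }
            for (int i = 0; i < consumed; i++) {
                queue.poll(); // slow consumer; poll() returns null when the queue is empty
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        // Unbounded: the backlog grows by 8 messages per tick and never shrinks
        System.out.println(backlogAfter(new LinkedBlockingQueue<String>(), 1000, 10, 2)); // 8000
        // Bounded at 100: the backlog is capped, the excess messages are rejected
        System.out.println(backlogAfter(new ArrayBlockingQueue<String>(100), 1000, 10, 2)); // 98
    }
}
```

With an unbounded queue and a consumer slower than the producer, memory use grows until something gives, which matches what we observed before each restart.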

The solution

We decided to store the data in a MySQL server and keep only the statistical data in a single-node Elasticsearch server. One more reason for this decision was that the entire persistence model and ORM was implemented in custom application code, which was harder to maintain than a JPA/Hibernate model.

The migration was not an easy process since we had to reimplement all the queries.

After the migration we have 1 server running Tomcat, Elasticsearch and Redis and 1 more for Logstash. With the same user load we now have 30% resource usage and 75% fewer servers, which reduced the monthly expenses by 50%.
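
As a quick sanity check of the server numbers, the counts listed in the system architecture section can be plugged into a few lines of Java. This is only a worked-arithmetic sketch. `ServerCountCheck` is a hypothetical name, and it assumes the servers listed in this post are the complete inventory.

```java
class ServerCountCheck {
    /** Percentage reduction from `before` to `after`, rounded to one decimal place. */
    static double reductionPercent(int before, int after) {
        return Math.round((before - after) * 1000.0 / before) / 10.0;
    }

    public static void main(String[] args) {
        int before = 3 + 4 + 1 + 1; // Tomcat + Elasticsearch + Logstash + Redis servers
        int after = 1 + 1;          // combined application server + Logstash server
        System.out.println(before + " -> " + after + " servers");
        System.out.println(reductionPercent(before, after) + "% fewer"); // 77.8% fewer
    }
}
```

Going from 9 servers to 2 is a reduction of about 78%, which is close to the rounded 75% figure.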

Lessons learned

  • Consider the technology stack. Using popular tools and frameworks is cool, but developers have to be careful when making them work together. Elasticsearch is a great tool, but it is designed for search, not to serve as a database. We have also had some problems with the integration of Angular.JS and WRO on the front-end side.
  • The rule of 3. At least 3 engineers should estimate a task. We made the mistake of thinking that the migration from Elasticsearch to MySQL would be an easy process. If we had spent more time on estimating, this might not have been the case.
  • Set up a process. Thinking that it was a short task, we did not set up a proper process. This led to unknown delivery terms, missed deadlines, communication overload and more.

Next steps

  • Implement stress tests with Gatling so we’ll be able to predict when the system will crash under user load.
  • Spring cache and Redis for session management. Store the session objects in Redis so we can add or remove Tomcat nodes on demand.
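
The idea behind the second step can be sketched in plain Java. This is not Spring or Redis code: `SessionStore`, `SharedSessionStore` and `AppNode` are hypothetical names, and the in-memory map merely stands in for an external store such as Redis. The point is that a node holding no session state of its own can be added or removed without logging users out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over wherever session data lives. */
interface SessionStore {
    void save(String sessionId, String userName);
    Optional<String> findUser(String sessionId);
}

/** In-memory stand-in for an external store such as Redis. */
class SharedSessionStore implements SessionStore {
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    @Override
    public void save(String sessionId, String userName) {
        sessions.put(sessionId, userName);
    }

    @Override
    public Optional<String> findUser(String sessionId) {
        return Optional.ofNullable(sessions.get(sessionId));
    }
}

/** Hypothetical application node that keeps no session state of its own. */
class AppNode {
    private final SessionStore store;

    AppNode(SessionStore store) {
        this.store = store;
    }

    void logIn(String sessionId, String userName) {
        store.save(sessionId, userName);
    }

    boolean isLoggedIn(String sessionId) {
        return store.findUser(sessionId).isPresent();
    }
}

class SharedSessionDemo {
    public static void main(String[] args) {
        SessionStore shared = new SharedSessionStore();
        AppNode nodeA = new AppNode(shared);
        AppNode nodeB = new AppNode(shared);
        nodeA.logIn("s-1", "alice");                 // the user logs in through node A
        System.out.println(nodeB.isLoggedIn("s-1")); // true: node B sees the same session
    }
}
```

In a real setup the shared store would be a network service, so its latency and availability become part of every request. That is one more thing the planned stress tests would need to cover.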

 

Do you have experience with the technologies listed above? Have you ever used Elasticsearch as a database?

 

Cvetelin Andreev

Cvetelin has been involved in startups (mostly tech) since 2003 as a (co-)founder, partner and occasionally Java full stack developer. Currently Full Stack Soldier, Startup activist and active blogger @ Dreamix. Plays, teaches and manages @ www.kabagaida.com. Founder of OfficeInTheWoods.com, fan of #futureofwork. Practices sustainable gardening and lifestyle. Runs a forest kindergarten near Sofia. Father of two.

  • Anonymous

    Try Gradle instead of Maven – it’s working much faster..

    • Thanks for the advice! Will give it a try in my next Java adventure!

  • Very nice adventure indeed. I was just wondering if you ever tried using an nginx cluster in front of Tomcat?

    A few years ago we had a similar structure: Cloudflare => Haproxy => Varnish => nginx => tomcat

    Thanks for Sharing

    • Hi Gabriel,
      It was a nice adventure and it is still ongoing. You can check twibble.io for the result.
      We are on the road of clustering Tomcat and nginx will be one of the options we’ll consider. Would you recommend it? How was your experience with it?

  • Adriano Rodrigues

    Such a useful post. I really liked your honesty about the whole process. I’m researching Elasticsearch and, after reading your post, I’m discarding the idea of using it as a database. Thanks for sharing your experience.

    Also, I’m amazed at the number of technologies your team managed to make work in harmony together.

    • StoyanMit

      Hi Rodrigo,

      Thank you for the time reading our article. Sometimes a product needs the mix of many different technologies to achieve what’s desirable.
      For sure don’t use elasticsearch as database.

      Best,
      Stoyan