Elasticsearch, Spring Boot and Angular.JS in action for building B2C web applications
Java technology stack for Silicon Valley web apps.
How does the story begin? In brief, we were facing a Java web application with 5000 active users that was unavailable most of the time, most probably because of multiple memory leaks. A script restarted the Tomcat server whenever it became unavailable. The functionality was broken: users could not log in or properly use the rest of the application. More about the story here.
Technology stack and tools
After the inventory phase of the project we found quite an impressive list:
Backend
Java 8
Spring (MVC, Boot). Used for DI. Spring MVC exposes REST services to Angular.JS, and Spring Boot is used to set up the application rapidly
Elasticsearch - used for storing some searchable data.
MySQL
Stripe - Online payment provider with a great API. You can easily switch to test mode and make JSON test queries through curl.
Open Graph - The Facebook protocol (a set of meta tags on an HTML page) that enables any web page to become a rich object in a social graph
Meldium - Online password manager. Easy to share credentials with the team
Bitbucket - Git repository and basic PM functionality
Google Hangouts - for everyday communication
System architecture
When we started looking for the problems, we found the following system architecture. We had:
3 servers running Apache Tomcat. One of them hosted the web application and the other 2 were clustered for the Spring Quartz jobs, which were what actually put load on the system.
4 servers for the Elasticsearch cluster
1 server for Logstash
1 server for Redis
The problem
The most important issue to fix was that users were unable to log in. After some debugging and digging into the code and the systems, we found the problem: it was Elasticsearch itself.
Elasticsearch and ACID support
Elasticsearch was used as the database, but it does not support ACID transactions. This means that when you write something (like a Twitter authentication token), there is no guarantee that it is available for reading the next moment. In our case the authentication system tried to read the login token, which had been saved through a callback handled by another node. The token had not yet been replicated to all the nodes, so the authentication mechanism could not find it and the user could not log in.
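To make the symptom concrete, here is a minimal sketch in plain Java. It is a toy model of the behavior described above, not Elasticsearch's API: the `Node` class, its `pending` buffer and the `login` helper are all hypothetical names invented for illustration. It assumes only that a write accepted by one node becomes readable on another node some time later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model only: two in-memory "nodes" with explicit, delayed replication.
// None of these names come from Elasticsearch; they are hypothetical.
class StaleReadDemo {

    static class Node {
        private final Map<String, String> visible = new HashMap<>();
        private final Map<String, String> pending = new HashMap<>();

        // A local write is readable on this node immediately.
        void write(String key, String value) { visible.put(key, value); }

        // A replicated write is buffered until applyPending() is called.
        void receive(String key, String value) { pending.put(key, value); }

        // Makes all buffered replicated writes readable.
        void applyPending() { visible.putAll(pending); pending.clear(); }

        Optional<String> read(String key) {
            return Optional.ofNullable(visible.get(key));
        }
    }

    // Login succeeds only if the token is readable on the node that serves the request.
    static boolean login(Node readNode, String userId) {
        return readNode.read(userId).isPresent();
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();

        // The authentication callback lands on node A, which forwards the token to B.
        a.write("user-42", "token-abc");
        b.receive("user-42", "token-abc");

        // The login request is served by node B before the token is visible there.
        System.out.println("login before sync: " + login(b, "user-42")); // false

        b.applyPending();
        System.out.println("login after sync: " + login(b, "user-42"));  // true
    }
}
```

A transactional store avoids this class of problem because a committed write is visible to every subsequent read, whichever application node issues it.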
Memory issues
Most of the memory issues were related to the Spring message queue. It was used for saving data to Elasticsearch and for calling external services (GeoIP, etc.). Those operations became very slow as the system load increased. The queue filled up with messages that could not be dispatched and started to take too much memory. Restarting the server cleared the queue, but the data was lost.
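The failure mode above is typical of an in-memory queue that is allowed to grow without limit. One common mitigation, which is not necessarily what we applied in this project, is to bound the queue so that overload shows up as explicit rejections instead of memory exhaustion. The sketch below uses the standard `java.util.concurrent.ArrayBlockingQueue`; the class name, method name and the numbers are made up for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: a bounded queue with a stalled consumer.
class BoundedQueueDemo {

    // Offers `total` messages to a queue of size `capacity` that nobody drains,
    // and returns how many of them were rejected.
    static int rejectedCount(int capacity, int total) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < total; i++) {
            // offer() returns false immediately when the queue is full,
            // so memory use stays capped at `capacity` messages.
            if (!queue.offer("message-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 250 messages against a capacity of 100: 150 are rejected.
        System.out.println("rejected: " + rejectedCount(100, 250));
    }
}
```

The producer then has to decide what to do with a rejected message (retry later, log it, or return an error to the caller), which is usually a better trade-off than losing the whole queue on a restart.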
The solution
We decided to store the data in a MySQL server and leave only the statistical data in a single-node Elasticsearch server. One more reason for this decision was that the whole persistence model and ORM had been custom-implemented in the application code, which was harder to maintain than a JPA/Hibernate model.
The migration was not an easy process since we had to reimplement all the queries.
After the migration we have 1 server running Tomcat, Elasticsearch and Redis, and 1 more for Logstash. With the same user load we now have 30% resource usage and 75% fewer servers, which reduced the monthly expenses by 50%.
Lessons learned
Consider the technology stack. Using popular tools and frameworks is cool, but developers have to be careful when putting them to work together. Elasticsearch is a great tool, but it is designed for search, not to serve as a database. We also had some problems with the integration of Angular.JS and WRO on the front-end side.
The rule of 3. At least 3 engineers should estimate a task. We made the mistake of thinking that the migration from Elasticsearch to MySQL would be an easy process. If we had spent more time on estimating, this might not have been the case.
Set up a process. Thinking that it was a short task, we did not set up a proper process. This led to unknown delivery dates, missed deadlines, communication overload and more.
Next steps
Implement stress tests with Gatling so we’ll be able to predict when the system will crash under user load.
Spring cache and Redis for session management. Store the session objects in Redis so we can add and remove Tomcat nodes on demand.
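The idea behind the second item can be sketched in plain Java: if no application node keeps session state in its own memory, any node can serve any request. This is a toy illustration only. The `SessionStore` interface and `AppNode` class are hypothetical, and the in-memory map merely stands in for Redis; a real setup would rely on an existing Spring and Redis integration rather than hand-written code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of externalized sessions; all names are hypothetical.
class SharedSessionDemo {

    // The role Redis would play: a session store shared by all nodes.
    interface SessionStore {
        void put(String sessionId, String user);
        String get(String sessionId);
    }

    // In-memory stand-in for the shared store, used only for this example.
    static class InMemoryStore implements SessionStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public void put(String sessionId, String user) { data.put(sessionId, user); }
        public String get(String sessionId) { return data.get(sessionId); }
    }

    // An application node that holds no session state of its own.
    static class AppNode {
        private final SessionStore store;
        AppNode(SessionStore store) { this.store = store; }
        void login(String sessionId, String user) { store.put(sessionId, user); }
        String currentUser(String sessionId) { return store.get(sessionId); }
    }

    public static void main(String[] args) {
        SessionStore shared = new InMemoryStore();

        AppNode node1 = new AppNode(shared);
        node1.login("s1", "alice");

        // A node added later still sees the session created through node1.
        AppNode node2 = new AppNode(shared);
        System.out.println(node2.currentUser("s1")); // prints "alice"
    }
}
```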
Do you have experience with the technologies listed above? Have you ever used Elasticsearch as a database?
Cvetelin has been involved in startups (mostly tech) since 2003 as (co-)founder, partner and occasionally Java full stack developer. Currently Full Stack Soldier, Startup activist and active blogger @ Dreamix. Plays, teaches and manages @ www.kabagaida.com. Founder of OfficeInTheWoods.com, fan of #futureofwork. Practices sustainable gardening and lifestyle. Runs a forest kindergarten near Sofia. Father of two.