Java technology stack for Silicon Valley web apps.
How does the story begin? In brief: we were facing a Java web application with 5000 active users that was unavailable most of the time, most probably because of multiple memory leaks. A script restarted the Tomcat server whenever it became unavailable. The functionality was broken: users could not log in or properly use the rest of the application. More about the story here.
Technology stack and tools
After the inventory phase of the project we found quite an impressive list:
- Java 8
- Spring (MVC, Boot) – used for dependency injection. Spring MVC exposes REST services to AngularJS, and Spring Boot is used to set up the application rapidly
- Elasticsearch – used for storing some searchable data.
- Stripe – online payment provider with a great API. You can easily switch to test mode and make JSON test requests with curl.
- Open Graph – the Facebook protocol (a set of meta tags on an HTML page) that enables any web page to become a rich object in a social graph
- Twitter API
- Redis – clusterable key-value cache and store
- Bootstrap – CSS for mobile-first front-end web development
- Wro – Web Resource Optimizer for Java web applications
- Thymeleaf – Java XML/XHTML/HTML5 template engine
System, OS, Infrastructure
- Rackspace – Cloud servers where the application is actually deployed
- Cloudflare – Brings security and performance optimization to the web application
- CentOS Linux
- Apache Tomcat
- Vagrant – Virtualization of server environment for developers
- IntelliJ IDEA
QA and monitoring
- Uptimerobot – monitoring the service availability
- Logstash – collects log messages and stores them in Elasticsearch
- Kibana – interface for Logstash/Elasticsearch that makes it easy to search and filter logs
- JAMon – Java performance monitor
User engagement, tracking, support
- Optimizely – A/B testing framework for web and mobile apps
- Uservoice – feedback and support system
- Twitter – Used for providing support
- Mixpanel – web analytics. A ‘must have’ tool for every startup
- Mandrill – Managing mail/news lists, sending mails
- Meldium – Online password manager. Easy to share credentials with the team
- Bitbucket – Git repository and basic PM functionality
- Google Hangouts – for everyday communication
When we started looking for the problems we found the following system architecture. We had:
- 3 servers running Apache Tomcat. One of them hosted the web application and the other 2 formed a cluster for the Spring Quartz jobs, which generated most of the load on the system.
- 4 servers for the Elasticsearch cluster
- 1 server for Logstash
- 1 server for Redis
The most important issue to fix was that users were unable to log in. After some debugging and digging into the code and the systems we found the problem: it was Elasticsearch itself.
Elasticsearch and ACID support
Elasticsearch was used as a database, but it does not support ACID transactions. This means that when you write something (such as a Twitter authentication token), there is no guarantee that it can be read the next moment. In our case the authentication system tried to read a login token that had been saved through a callback handled by another node. The token had not yet been replicated to all the nodes, so the authentication mechanism could not find it and the user could not log in.
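The failure mode described above can be illustrated with a toy model. This is only a sketch: it uses two plain in-memory maps as stand-ins for two nodes and does not call the Elasticsearch API. The class name `StaleReadDemo` and its methods are hypothetical names invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of two store nodes with delayed replication (not the Elasticsearch API). */
public class StaleReadDemo {
    // Hypothetical stand-ins for the node that receives the write and the node that serves the read.
    private final Map<String, String> writeNode = new HashMap<>();
    private final Map<String, String> readNode = new HashMap<>();

    /** The login callback saves the token on one node only. */
    public void save(String key, String token) {
        writeNode.put(key, token);
    }

    /** Replication happens later, outside the caller's control. */
    public void replicate() {
        readNode.putAll(writeNode);
    }

    /** The authentication check reads from the other node. */
    public Optional<String> read(String key) {
        return Optional.ofNullable(readNode.get(key));
    }

    public static void main(String[] args) {
        StaleReadDemo store = new StaleReadDemo();
        store.save("user-42", "token-abc");
        // The token was written, but the reading node has not seen it yet.
        System.out.println("found before replication: " + store.read("user-42").isPresent());
        store.replicate();
        System.out.println("found after replication: " + store.read("user-42").isPresent());
    }
}
```

A transactional database avoids this window for committed data, which is why login tokens are a poor fit for a store that offers no such guarantee.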
Most of the memory issues were related to the Spring message queue. Processing a message meant saving data to Elasticsearch or calling an external service (GeoIP, etc.). Those operations became very slow as the system load increased. The queue filled up with messages that could not be dispatched and started to take too much memory. Restarting the server cleared the queue, but the queued data was lost.
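One general way to keep a backlog like this from eating the heap is to put a hard limit on the queue. The sketch below is not the application's actual Spring queue; it uses the standard `java.util.concurrent` classes, and the class and method names (`BoundedQueueDemo`, `newBoundedExecutor`, `submitSlowTasks`) are hypothetical. The latch simulates a slow downstream call such as a write to Elasticsearch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    /** Builds an executor whose backlog can never exceed `capacity` waiting tasks. */
    public static ThreadPoolExecutor newBoundedExecutor(int workers, int capacity) {
        return new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of growing without limit
    }

    /** Submits `tasks` jobs that block until `release` is counted down; returns how many were rejected. */
    public static int submitSlowTasks(ThreadPoolExecutor executor, int tasks, CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                executor.execute(() -> {
                    try {
                        release.await(); // simulated slow external call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // the caller learns about the overload immediately
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = newBoundedExecutor(1, 2);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = submitSlowTasks(executor, 10, release);
        System.out.println("rejected: " + rejected + ", queued: " + executor.getQueue().size());
        release.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With one worker and a queue of two, one task runs, two wait, and the remaining seven are rejected. Rejected work still has to be retried, throttled, or persisted by the caller, but the overload becomes visible instead of silently consuming memory until a restart.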
We decided to store the data in a MySQL server and leave only the statistical data in a single-node Elasticsearch server. One more reason for this decision was that the whole persistence model and ORM had been implemented by hand in the application code, which was harder to maintain than a JPA/Hibernate model.
The migration was not an easy process since we had to reimplement all the queries.
After the migration we have 1 server running Tomcat, Elasticsearch and Redis, and 1 more for Logstash. With the same user load we now have 30% resource usage and about 75% fewer servers (2 instead of 9), which reduced the monthly expenses by 50%.
- Consider the technology stack. Using popular tools and frameworks is cool, but developers have to be careful when making them work together. Elasticsearch is a great tool, but it is designed for search, not to serve as a database. We also had some problems with the integration of AngularJS and WRO on the front-end side.
- The rule of 3. At least 3 engineers should estimate a task. We made the mistake of thinking that the migration from Elasticsearch to MySQL would be an easy process. If we had spent more time on estimating, this might not have been the case.
- Set up a process. Thinking that it was a short task, we did not set up a proper process. This led to unknown delivery terms, missed deadlines, communication overload and more.
- Implement stress tests with Gatling so we’ll be able to predict when the system will crash under user load.
- Spring cache and Redis for session management. Store the session objects in Redis so we can add and remove Tomcat nodes on demand.
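The idea behind the last item can be sketched in a few lines. This is only an illustration of keeping session state outside the application nodes: an in-memory map stands in for Redis so the example runs on its own, and `SessionStore`, `InMemoryStore` and `AppNode` are hypothetical names, not part of Spring or Tomcat.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class SharedSessionDemo {
    /** Hypothetical abstraction over an external session store such as Redis. */
    public interface SessionStore {
        void put(String sessionId, String userName);
        Optional<String> get(String sessionId);
    }

    /** In-memory stand-in so the sketch runs without a Redis server. */
    public static class InMemoryStore implements SessionStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();

        public void put(String sessionId, String userName) {
            data.put(sessionId, userName);
        }

        public Optional<String> get(String sessionId) {
            return Optional.ofNullable(data.get(sessionId));
        }
    }

    /** Hypothetical application node that keeps no session state of its own. */
    public static class AppNode {
        private final SessionStore store;

        public AppNode(SessionStore store) {
            this.store = store;
        }

        public void login(String sessionId, String userName) {
            store.put(sessionId, userName);
        }

        public String whoAmI(String sessionId) {
            return store.get(sessionId).orElse("anonymous");
        }
    }

    public static void main(String[] args) {
        SessionStore shared = new InMemoryStore();
        AppNode nodeA = new AppNode(shared);
        nodeA.login("s1", "alice");
        // A node added later sees the same session because the state lives in the shared store.
        AppNode nodeB = new AppNode(shared);
        System.out.println(nodeB.whoAmI("s1"));
    }
}
```

Because the nodes hold no session data themselves, any of them can be removed or replaced without logging users out.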
Do you have experience with the technologies listed above? Have you ever used Elasticsearch as a database?