Learn from Java Champion Gunnar Morling: Performance unit testing with Java Flight Recorder (JFR) and JfrUnit


by Dreamix Team

September 30, 2021



In our new Java Monthly edition, we’d like to introduce you to Gunnar Morling. He was kind enough to share his experience on 13 Java-related questions.

Gunnar Morling is a software engineer and open-source enthusiast at heart. He leads the Debezium project, a distributed platform for change data capture. He is a Java Champion, the spec lead for Bean Validation 2.0 (JSR 380), and has founded multiple open-source projects such as JfrUnit, kcctl, Layrry, and MapStruct. Gunnar is an avid blogger and has spoken at a wide range of conferences like QCon, JavaOne, Devoxx, JavaZone, and many others. He’s based in Hamburg, Germany.

Java Monthly: How important is testing of non-functional requirements throughout the development workflow of an enterprise project? What is the cost of not doing so?

Gunnar Morling: Testing non-functional requirements (NFRs) is of the greatest importance for delivering applications that are performant, reliable, and secure. Think for instance of the famous quote from Amazon, stating that every 100 ms of additional latency costs them 1% in sales. Or consider security: losing sensitive user data due to insufficiently secured applications is not only bad for your users, but also for your own reputation and business. Luckily, more and more developers realise that it’s not only critical to consider these aspects when designing and building an application, but that it is also necessary to identify and prevent regressions related to NFRs early on.

Java Monthly: With the JfrUnit library, you have developed an approach for testing non-functional requirements using JDK Flight Recorder via so-called “proxy metrics”. Are there other important proxy metrics apart from garbage collection, memory allocation and database I/O that can be obtained by JDK Flight Recorder and be asserted via JfrUnit?

Gunnar Morling: JDK Flight Recorder (JFR) lets you gain deep insights into all kinds of performance-related metrics of the JVM and the applications running on top of it; besides the things you’ve mentioned, this includes information about classloading, locking behaviour, file and socket I/O, JIT compilation, and much more. JDK 17 comes with 167 built-in event types, and you can create your own too, specific to your domain and carrying domain-specific information. Plus, JFR is also a method profiler.
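
To illustrate what such a custom event can look like, here is a minimal sketch using the jdk.jfr API. The order-processing domain, the event name demo.OrderPlaced, its fields, and the placeOrder method are all invented for this example.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

final class CustomEventSketch {

    // Hypothetical domain event; the name and fields are illustrative only.
    @Name("demo.OrderPlaced")
    @Label("Order Placed")
    @Category("Demo")
    static class OrderPlacedEvent extends Event {
        @Label("Order ID")
        long orderId;

        @Label("Item Count")
        int itemCount;
    }

    static void placeOrder(long orderId, int itemCount) {
        OrderPlacedEvent event = new OrderPlacedEvent();
        event.begin();                  // start measuring the duration
        try {
            // ... the actual business logic would run here ...
        } finally {
            event.orderId = orderId;
            event.itemCount = itemCount;
            event.commit();             // only written while a recording has the event enabled
        }
    }

    public static void main(String[] args) {
        placeOrder(42L, 3);
    }
}
```

Run on its own, this emits nothing visible; the event only ends up in a recording if one is active, for example when the JVM is started with -XX:StartFlightRecording.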

Now when it comes to assertions on proxy metrics via JfrUnit for the purpose of identifying potential performance regressions in an application, the event types you mentioned for tracking GC, allocations, and I/O are a great starting point.

Java Monthly: In one of your blog posts about Flight Recorder you mentioned that it is a good practice to keep the creation of more “technical” events separate from the business logic. Can you elaborate further on this?

Gunnar Morling: From a maintainability perspective, it is generally a good idea to separate purely technical aspects – like emitting JFR events – from your business logic. It makes it easier to focus on the different parts and implement the cross-cutting concerns in one central place (such as a method interceptor), without adding lots of “noise” to the actual business code. It also makes it easier to achieve consistency, for instance when it comes to aspects like the format and structure of emitted events.
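
One way to picture this separation is a small interceptor built on a JDK dynamic proxy, so that the business code never touches the JFR API. This is only a sketch under assumptions: the UserService interface, the event name demo.ServiceInvocation, and the withJfrEvents helper are hypothetical, and a real application would more likely use the interceptor mechanism of its framework.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

final class JfrInterceptorSketch {

    @Name("demo.ServiceInvocation")
    @Label("Service Invocation")
    static class ServiceInvocationEvent extends Event {
        @Label("Method")
        String method;
    }

    // Hypothetical business interface; its implementation knows nothing about JFR.
    interface UserService {
        String createUser(String name);
    }

    // Wraps any interface implementation so that each call emits one event.
    static <T> T withJfrEvents(Class<T> type, T target) {
        InvocationHandler handler = (proxy, invoked, arguments) -> {
            ServiceInvocationEvent event = new ServiceInvocationEvent();
            event.begin();
            try {
                return invoked.invoke(target, arguments);
            } catch (InvocationTargetException e) {
                throw e.getCause();     // surface the original exception to the caller
            } finally {
                event.method = type.getSimpleName() + "." + invoked.getName();
                event.commit();
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        UserService plain = name -> "user:" + name;
        UserService instrumented = withJfrEvents(UserService.class, plain);
        System.out.println(instrumented.createUser("alice"));
    }
}
```

The event format lives in one place here, which is the consistency benefit described above: changing the event structure means touching the interceptor only.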

Java Monthly: Can you suggest some scenarios where it is desirable to configure detailed behaviour for a given event type?

Gunnar Morling: There’s always a trade-off between the expressiveness of an event (what kind of information does it provide at which level of detail?) and the space requirements for persisting it. While JFR uses a very efficient binary file format, the size of overly verbose events emitted at a high frequency can quickly add up, potentially even causing events to be dropped. So if, for example, you are using custom JFR events to log information about REST API invocations, you might want to only store some key metadata like timestamps, request types, or headers by default. But then, in specific situations, you could reconfigure that event type to also emit the full payload of the invocation.

Another interesting example is the jdk.ObjectAllocationSample event type, added in Java 16, superseding the earlier event types for object allocations inside and outside of TLABs (thread-local allocation buffers). It lets you configure the sampling interval, providing you with fine-grained control over the data volume produced when tracking object allocations.
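
As a rough sketch of this kind of per-event configuration, the following code enables jdk.ObjectAllocationSample programmatically and caps the rate of emitted samples through the event's throttle setting. The class and method names, the allocation loop, and the value "50/s" are illustrative assumptions; the code needs JDK 16 or later to produce any samples, and how many samples are actually recorded depends on the JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

final class AllocationSamplingSketch {

    static final String EVENT = "jdk.ObjectAllocationSample";
    static volatile Object sink;        // keeps the allocations from being optimised away

    // Creates a recording with allocation sampling limited to the given rate, e.g. "50/s".
    static Recording newRecording(String throttle) {
        Recording recording = new Recording();
        recording.enable(EVENT)
                 .with("throttle", throttle)
                 .withStackTrace();
        return recording;
    }

    // Runs an allocation-heavy loop and returns the number of samples recorded for it.
    static long countSamples(String throttle) {
        try (Recording recording = newRecording(throttle)) {
            recording.start();
            for (int i = 0; i < 20_000; i++) {
                sink = new byte[4096];  // roughly 80 MB in total
            }
            recording.stop();
            Path file = Files.createTempFile("alloc-sketch", ".jfr");
            try {
                recording.dump(file);
                return RecordingFile.readAllEvents(file).stream()
                        .filter(e -> e.getEventType().getName().equals(EVENT))
                        .count();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("samples at 50/s: " + countSamples("50/s"));
    }
}
```

The same setting can also be changed in a .jfc settings file instead of in code, which is usually the more practical route for production recordings.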

Java Monthly: Some events such as one representing a REST API invocation might not contain a stacktrace. However, are there any event types where it is a good idea to set the stacktrace annotation to true when asserting JFR events with JfrUnit?

Gunnar Morling: You probably would want to have the stacktrace for most event types; naturally, for method profiling samples, but also for event types representing I/O or allocations. Which part of a codebase triggered these events is a vital piece of information for identifying allocation-heavy locations for instance. Whether you’d enable stacktraces for JFR-based assertions via JfrUnit depends a bit on your requirements. For example, you’d need the stacktrace if you wanted to identify only the subset of events triggered by a specific method of the application under test.
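
The last point, narrowing events down to those triggered by a specific method, can be sketched with the plain jdk.jfr consumer API. This is not JfrUnit's own API; the event demo.Query and the methods loadUser and loadOrders are hypothetical stand-ins for application code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

final class StackTraceFilterSketch {

    @Name("demo.Query")
    static class QueryEvent extends Event {
    }

    // Hypothetical application methods that both trigger the same event type.
    static void loadUser()   { new QueryEvent().commit(); }
    static void loadOrders() { new QueryEvent().commit(); }

    // Records the workload with stack traces enabled and returns its demo.Query events.
    static List<RecordedEvent> record(Runnable workload) {
        try (Recording recording = new Recording()) {
            recording.enable(QueryEvent.class).withStackTrace();
            recording.start();
            workload.run();
            recording.stop();
            Path file = Files.createTempFile("stacktrace-sketch", ".jfr");
            try {
                recording.dump(file);
                return RecordingFile.readAllEvents(file).stream()
                        .filter(e -> e.getEventType().getName().equals("demo.Query"))
                        .collect(Collectors.toList());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if any frame of the event's stack trace belongs to a method with the given name.
    static boolean triggeredBy(RecordedEvent event, String methodName) {
        RecordedStackTrace stackTrace = event.getStackTrace();
        return stackTrace != null && stackTrace.getFrames().stream()
                .anyMatch(frame -> frame.getMethod().getName().equals(methodName));
    }

    public static void main(String[] args) {
        List<RecordedEvent> events = record(() -> { loadUser(); loadOrders(); loadOrders(); });
        long fromLoadOrders = events.stream().filter(e -> triggeredBy(e, "loadOrders")).count();
        System.out.println(fromLoadOrders + " of " + events.size() + " events came from loadOrders()");
    }
}
```

Without the stack trace, the two call sites would be indistinguishable, since both emit the same event type with the same attributes.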

Java Monthly: Let us say we find out there is a memory leak in our system. Is it possible to track the specific objects that could not get garbage collected with the help of Flight Recorder and JDK Mission Control?

Gunnar Morling: Indeed this is possible. JDK Mission Control not only allows you to trigger and analyse heap dumps, but you can also use the jdk.OldObjectSample event type for identifying and examining (sampled) objects which couldn’t get garbage collected over a longer period of time. This can be an interesting alternative if creating a full heap dump isn’t practical, e.g. due to size constraints.
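
Outside of JDK Mission Control, the same event type can also be inspected programmatically. The following is a minimal sketch, assuming a deliberately leaky static list as the stand-in for a real leak; class and method names are invented, and since the JVM decides which objects get sampled, the resulting map can differ between runs or even be empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedObject;
import jdk.jfr.consumer.RecordingFile;

final class OldObjectSampleSketch {

    // Simulated leak: objects added here are never released.
    static final List<byte[]> LEAK = new ArrayList<>();

    // Returns the number of sampled, still-live objects per class name.
    static Map<String, Long> sampleLeakCandidates() {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.OldObjectSample");
            recording.start();
            for (int i = 0; i < 2_000; i++) {
                LEAK.add(new byte[16 * 1024]);      // roughly 32 MB in total
            }
            recording.stop();
            Path file = Files.createTempFile("old-object-sketch", ".jfr");
            try {
                recording.dump(file);
                Map<String, Long> byType = new TreeMap<>();
                for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                    if (!event.getEventType().getName().equals("jdk.OldObjectSample")
                            || !event.hasField("object")) {
                        continue;
                    }
                    RecordedObject object = event.getValue("object");
                    if (object == null || !object.hasField("type")) {
                        continue;
                    }
                    RecordedClass type = object.getClass("type");
                    if (type != null) {
                        byType.merge(type.getName(), 1L, Long::sum);
                    }
                }
                return byType;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        sampleLeakCandidates().forEach((type, count) -> System.out.println(type + ": " + count));
    }
}
```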

Java Monthly: Since neither the Postgres JDBC driver nor Hibernate emits JFR events, can you shed some light on ways of obtaining events like this?

Gunnar Morling: This is where JMC Agent shines; JMC Agent is a sub-project of JDK Mission Control, providing a configurable Java agent which can be used to inject code for emitting JFR events into existing Java libraries. By means of an XML descriptor, you instruct JMC Agent which classes and methods it should instrument, and how the emitted events should be structured. Optionally, you can capture fields, or parameter and return values of invoked methods, and add them as attributes to the produced events. That way, you can emit events from existing third-party libraries which have no awareness of JFR at all. In the JfrUnit examples repository there’s one demo which shows how to use this approach for producing an event for each SQL query triggered by Hibernate ORM and using these events to identify performance problems, like the notorious N+1 SELECT issue.

Java Monthly: In a blog of yours about JfrUnit you mentioned that test results are independent from wall clock time. Can you explain the reason for this?

Gunnar Morling: The problem with any kind of assertions based on metrics like application throughput or request latency is that these are heavily dependent on the execution environment. Numbers will typically differ a lot depending on where tests are run, e.g. your development laptop, your beefy production hardware, or a container with tight resource limits. Or when thinking of tests running within a CI environment, other concurrently running jobs may use lots of CPU capacity and thus impact your tests.

That’s why JfrUnit relies on the notion of “proxy metrics”, like object allocations or I/O between the application and the database; for a given use case, these metrics shouldn’t change between execution environments. A functionality like “Create User” should do a rather specific amount of allocations or I/O, no matter on which kind of machine it runs, or how much concurrent load is present there. If the application suddenly starts to do much more I/O or allocations than it used to do, this may be an indicator that there is a regression in the metrics we’re actually after, like throughput or latency. So the idea is to determine a baseline for these proxy metrics and then define assertions based on them. This is done via plain JUnit tests, i.e. developers can run these tests for instance in their IDE, getting feedback about any assertion failures very early in the development cycle.
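
The baseline idea can be sketched with nothing but the JDK's own JFR API. To be clear, this is not JfrUnit's API: the event demo.DatabaseIo, the createUser stand-in, the baseline of 133 bytes, and the 10% tolerance are all assumptions made up for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

final class ProxyMetricSketch {

    @Name("demo.DatabaseIo")
    static class DatabaseIoEvent extends Event {
        @Label("Bytes Transferred")
        long bytes;
    }

    // Stand-in for the "Create User" use case; a real application would talk to a database here.
    static void createUser(String name) {
        DatabaseIoEvent event = new DatabaseIoEvent();
        event.bytes = 128 + name.length();  // pretend: fixed row overhead plus the name
        event.commit();
    }

    // Runs the use case under a recording and sums up the database bytes it reported.
    static long databaseBytes(Runnable useCase) {
        try (Recording recording = new Recording()) {
            recording.enable(DatabaseIoEvent.class);
            recording.start();
            useCase.run();
            recording.stop();
            Path file = Files.createTempFile("proxy-metric-sketch", ".jfr");
            try {
                recording.dump(file);
                return RecordingFile.readAllEvents(file).stream()
                        .filter(e -> e.getEventType().getName().equals("demo.DatabaseIo"))
                        .mapToLong(e -> e.getLong("bytes"))
                        .sum();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long baseline = 133;                // determined once from a known-good run
        long actual = databaseBytes(() -> createUser("alice"));
        if (actual > baseline * 1.1) {      // allow 10% headroom before failing
            throw new AssertionError("Database I/O grew from " + baseline + " to " + actual + " bytes");
        }
        System.out.println("Database I/O within baseline: " + actual + " bytes");
    }
}
```

The measured value depends only on what the use case does, not on how fast the machine is, which is what makes it usable as an assertion in a plain unit test.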

Each test also produces and persists a JFR recording file, which you can load into JDK Mission Control to analyse and fix any potential regressions indicated by a failing JfrUnit test. There’s a demo of the general approach in this talk.

One important point is that an assertion failure here indicates a potential performance regression. If for instance more objects are allocated for a given use case, it may also be justified, because more complex business logic needs to be executed as per updated requirements. That’s why a proper understanding of the assertions, their semantics and implications of any failures is required.

Also, not all kinds of performance problems can be identified that way; for instance, there may be cases where an increased number of threads waiting on locks will only show up under actual high concurrent load. Furthermore, the user-perceived performance isn’t determined by the application alone; for instance, a changed execution plan of a database query will only show up when testing with a realistic database size. In other words, JfrUnit can be a very valuable addition to the performance testing toolbox, but it should be considered a complement to other tools, not necessarily a replacement.

Java Monthly: You mentioned that a future area of improvement for JfrUnit will be analysis of historical event data from multiple test runs. Can you expand a little bit more on this topic? Is it going to affect the performance of the extension? 

Gunnar Morling: Ideas around this are still pretty rough at this point. One possible approach could be to have JfrUnit tests log specific data points of a test, like “this test did XYZ KB database I/O”. These data points would be loaded into a time series database or a benchmark repository system like Horreum. This would allow you to understand and analyse trends that build up slowly over time. If for instance, your test does one KB more I/O with every test run, you might miss this signal. But after half a year this may represent an actually impactful regression. Being able to look into such historical data and to draw these conclusions would be a very useful addition to JfrUnit.

This also would be interesting for cases where your system actually improves its behaviour (e.g. it does less I/O because of some code change) but you’re not aware of this and don’t adjust the corresponding assertions accordingly. You’d then miss a future regression from that new, better state, as long as you stay below the originally defined threshold. Having the result values from historical test runs in some sort of queryable data store would help you to prevent this situation.
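
Since these ideas are still early, the following is only a speculative sketch of the two analyses described above, working on an in-memory array of per-run values rather than a real time series database. The class and method names, the sample numbers, and the 10% tolerance are invented for illustration.

```java
import java.util.Arrays;

final class TrendSketch {

    // Least-squares slope of the measurements over the run index (units per test run).
    static double slopePerRun(double[] values) {
        int n = values.length;
        double meanX = (n - 1) / 2.0;
        double meanY = Arrays.stream(values).average().orElse(0);
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < n; i++) {
            numerator += (i - meanX) * (values[i] - meanY);
            denominator += (i - meanX) * (i - meanX);
        }
        return denominator == 0 ? 0 : numerator / denominator;
    }

    // Proposes a tightened threshold: the best (lowest) observed value plus a tolerance.
    static double ratchetedThreshold(double[] values, double tolerance) {
        double best = Arrays.stream(values).min().orElseThrow();
        return best * (1 + tolerance);
    }

    public static void main(String[] args) {
        // Hypothetical database I/O in KB from consecutive test runs.
        double[] creeping = {100, 101, 102, 103, 104};
        System.out.println("drift per run (KB): " + slopePerRun(creeping));

        // A code change improved I/O from about 120 KB to about 90 KB.
        double[] improved = {120, 90, 95};
        System.out.println("suggested threshold (KB): " + ratchetedThreshold(improved, 0.10));
    }
}
```

A small positive slope would never trip a fixed threshold in a single run, yet it adds up over months; the ratcheted threshold addresses the opposite case, where an unnoticed improvement leaves the old assertion too lenient.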

As said, that’s all rather early thinking and I’d be very happy about any feedback on this.

Java Monthly: Is it possible to integrate JfrUnit in Gradle projects today and if not, will it be considered in the future?

Gunnar Morling: Being solely based on JUnit, JfrUnit can be used with any kind of Java build tool, be it Apache Maven, Gradle, or something else.

Java Monthly: Since the JfrUnit project requires OpenJDK 16 or later, are there any plans for support for JDK 11 or even JDK 8?

Gunnar Morling: With Java 17 as a new LTS having been released earlier this month, there shouldn’t be a need for that any longer, right 😉

More seriously, JfrUnit currently relies on the notion of JFR Event Streaming, which was added to Java via JEP 349 in version 14. It shouldn’t be too difficult to provide an implementation that loads and examines recording files as supported by earlier JFR versions. This should work with JDK 11 and even 8, since JFR was backported to that version, too. If someone is looking for ways to contribute to JfrUnit, this would be a great first task to pick up.
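
For anyone curious what the file-based route involves, here is a minimal sketch that examines a recording file with jdk.jfr.consumer.RecordingFile instead of event streaming. It is not taken from JfrUnit; the event demo.Ping and the helper names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class RecordingFileSketch {

    @Name("demo.Ping")
    static class PingEvent extends Event {
    }

    // Produces a small recording file containing the given number of demo.Ping events.
    static Path record(int pings) {
        try (Recording recording = new Recording()) {
            recording.enable(PingEvent.class);
            recording.start();
            for (int i = 0; i < pings; i++) {
                new PingEvent().commit();
            }
            recording.stop();
            Path file = Files.createTempFile("file-sketch", ".jfr");
            recording.dump(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file event by event, so large recordings need not fit into memory at once.
    static Map<String, Long> countByEventType(Path file) {
        Map<String, Long> counts = new TreeMap<>();
        try (RecordingFile recordingFile = new RecordingFile(file)) {
            while (recordingFile.hasMoreEvents()) {
                RecordedEvent event = recordingFile.readEvent();
                counts.merge(event.getEventType().getName(), 1L, Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByEventType(record(3)).forEach((type, count) -> System.out.println(type + ": " + count));
    }
}
```

The trade-off compared to streaming is that assertions can only run after the recording has been written, rather than while the test is still executing.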

Java Monthly: Recommend a good source to follow for updates for Java related topics.

Gunnar Morling: Oh, where to start, there’s so many useful resources. Here’s a few ones I find myself coming back to regularly:

Java Monthly: Can you recommend a good book (can be both technical and non-technical)?

Gunnar Morling: Working on Debezium – an open-source platform for change data capture for different databases – at my day job, data engineering is close to my heart. There’s many great books in that area, but two I’d recommend in particular would be Designing Data-Intensive Applications by Martin Kleppmann and 97 Things Every Data Engineer Should Know by Tobias Macey (I happen to have contributed one section to that one :).

Is there anything else you would like to ask Gunnar Morling? What is your opinion on the questions asked? Who would you like to see featured next? Let’s give back to the Java community together!
