Learn from Java Champion Mary Grygleski: Event Streaming in A Nutshell


by Dreamix Team

December 19, 2022



In our new Java Monthly edition, we’d like to introduce you to Mary Grygleski. She was kind enough to share her experience on 12 more Java-related questions.

Mary is a Java Champion and a passionate Streaming Developer Advocate at DataStax. Previously she was with IBM as a Java and Open Source Developer Advocate. She transitioned from Unix/C to Java around 2000 and has never looked back. She is an active tech community builder outside of her day job, and is currently the President of the Chicago Java Users Group (CJUG).

The term “event” has become overloaded in modern computing: event streaming, event processing, event messaging, event sourcing, event storming, event-driven architecture, and so on. Each represents a different aspect of eventful computing. As with any approach, there are challenges, but the benefits of the event-based approach should outweigh the obstacles in the long run, and it should prove to be a viable and dynamic solution for modern systems that are “hungry” for data.

Dreamix: Event Streaming – can you explain the concept as if we were 5 years old?

Mary Grygleski: Event streaming describes the ongoing delivery of events. So you may now be asking what exactly events are. First of all, events are everywhere in the general sense: we live in a world that is largely driven by events, such as the birth of a baby, a concert by a rock star, a World Cup soccer game, and so on. Let’s bring that into the context of computing. An event is an occurrence that carries data with it in space and time. Each event is immutable, and mathematically it would be a point in time. An ongoing delivery of events is called event streaming.
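The definition above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Event` record, the `EventStreamSketch` class, and the `deliver` method are hypothetical names, not part of any streaming library, and a real stream is unbounded rather than a finite list. The record models an immutable occurrence with a timestamp, and `deliver` hands the events to a consumer in time order.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model: an immutable occurrence that carries data and a point in time.
record Event(String type, String payload, Instant occurredAt) {}

class EventStreamSketch {
    // Hands each event to the consumer in the order the events occurred.
    // A finite list stands in for what would be an unbounded stream.
    static void deliver(List<Event> source, Consumer<Event> consumer) {
        source.stream()
              .sorted(Comparator.comparing(Event::occurredAt))
              .forEachOrdered(consumer);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2022-12-19T10:00:00Z");
        List<Event> events = List.of(
                new Event("order-paid", "order=42", start.plusSeconds(5)),
                new Event("order-placed", "order=42", start));
        deliver(events, e ->
                System.out.println(e.occurredAt() + " " + e.type() + " " + e.payload()));
    }
}
```

Because a record's fields are final, an `Event` cannot change after it is created, which matches the idea that an event is a fixed point in time.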

Dreamix: What are the main differences between event-driven and message-driven approaches?

Mary Grygleski: Based on the definitions from Lightbend, the company that makes Akka and also led the working group that defined the Reactive Manifesto, both event-driven and message-driven approaches are fundamental in enabling messaging between senders and receivers. The difference between them is that, in the event-driven case, the message sender does not have to worry about the address(es) of the receivers, whereas in the message-driven case, the sender needs to know the address of the recipient in order to deliver its message. The event-driven approach is very often used in a publish/subscribe pattern, in which the sender emits messages to a well-known location (also known as a topic or, in some cases, a channel), from which the messages are delivered (usually by a broker) to the recipients that are listening to or have subscribed to that location. The message-driven approach is about delivering messages to a known recipient, and often a queue is used for storing incoming messages in case there is a load spike.
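A minimal in-memory sketch may make the contrast concrete. All names here (`TopicBroker`, `Mailbox`, `MessagingStylesSketch`) are hypothetical and stand in for a real broker or actor system. In the first class the sender knows only a topic name and the broker fans the event out to whoever subscribed; in the second the sender holds a reference to one specific recipient, whose queue buffers messages until they are read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Event-driven style: the sender knows the topic, not the receivers.
class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> listener) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    // Delivers the event to every listener currently subscribed to the topic.
    void publish(String topic, String event) {
        for (Consumer<String> listener : subscribers.getOrDefault(topic, List.of())) {
            listener.accept(event);
        }
    }
}

// Message-driven style: the sender addresses one known recipient,
// and a queue holds messages until the recipient is ready.
class Mailbox {
    private final Queue<String> inbox = new ArrayDeque<>();

    void send(String message) { inbox.add(message); }

    String receive() { return inbox.poll(); }

    int pending() { return inbox.size(); }
}

class MessagingStylesSketch {
    public static void main(String[] args) {
        TopicBroker broker = new TopicBroker();
        broker.subscribe("orders", e -> System.out.println("billing saw " + e));
        broker.subscribe("orders", e -> System.out.println("shipping saw " + e));
        broker.publish("orders", "order-placed");

        Mailbox billing = new Mailbox();
        billing.send("charge order 42");
        System.out.println("billing received: " + billing.receive());
    }
}
```

Adding a third subscriber to `orders` needs no change on the publishing side, whereas reaching a new recipient in the mailbox style means the sender must obtain that recipient's `Mailbox`.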

Dreamix: What are the biggest positives and negatives of event streaming from a developer’s point of view?

Mary Grygleski: Event streaming is not a solution for all things. Its biggest positive is the speed with which it ingests data in high volumes and in near real-time, which means that as soon as incoming data arrives, it can be processed right away. It is also highly scalable, and it is flexible in allowing the construction of pipelines through which data can flow efficiently. Example use cases include fraud detection, order processing, IoT systems, MLOps, and so on. On the other hand, because of the unstructured data that streaming systems handle, it is not always easy to debug and monitor them. In addition, AI and ML systems have yet to fully utilize the event streaming approach due to the sheer volume of data that these systems deal with every day. The data sets keep getting larger day by day, but this will become a reality in the near future.

Dreamix: Are there any security considerations specific to event-driven systems that we must take into account?

Mary Grygleski: Yes. While event-driven systems bring lots of benefits in our current computing era, their flexibility, such as their loosely coupled design, can be a source of weaknesses for attackers to exploit. Fortunately, security concerns can be mitigated by careful planning and proper management of all access points to data, and by having a very thoughtful design and implementation plan for how the components ought to interact.

Dreamix: With Java evolving at a constantly increasing speed, are there parts of the language that you feel are an exact fit for supporting event-driven systems?

Mary Grygleski: Yes, Project Loom! Project Loom is currently under development and recently made its debut as a preview feature in the September 2022 release of JDK 19. It has undergone many revisions since its inception in 2017. I find it a fascinating topic because it introduces the concept of virtual threads, which essentially opens the “floodgates” of the JVM and allows us to create threads almost without limit and at a very small cost. Project Loom solves problems at the level of lower-level thread programming, so it will be up to the event-driven frameworks and libraries to leverage it. Both are targeting concurrency issues, though, so when an event streaming platform such as Apache Pulsar switches to using virtual threads, the potential gain will be huge.
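As a rough illustration of what “threads almost without limit” looks like in code, here is a minimal sketch that runs one virtual thread per incoming event. It assumes JDK 21 or later, where virtual threads are a final feature; on JDK 19, which the interview refers to, the same API was a preview and needed the `--enable-preview` flag. The class name `VirtualThreadsSketch`, the method `handleAll`, and the simulated 10 ms of blocking work are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class VirtualThreadsSketch {
    // Handles each simulated event on its own virtual thread
    // and returns how many handlers completed.
    static int handleAll(int eventCount) {
        AtomicInteger handled = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < eventCount; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // stands in for blocking I/O per event
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    handled.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled " + handleAll(10_000) + " events");
    }
}
```

Starting ten thousand platform threads this way would be costly, while virtual threads are cheap enough that a thread per event becomes a reasonable design.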

Dreamix: Are both monolithic and microservice applications suited for event streaming usage?

Mary Grygleski: Without some example use cases for these two types of applications, I think the answer is “it depends”. Microservices applications are decoupled by nature, so using the event streaming approach to handle data messaging would most likely be appropriate. Monolithic or legacy applications can also potentially make use of event streaming for certain use cases. For example, when they need to integrate with other systems or pass data between them, constructing and deploying streaming data pipelines to enable the flow of data is a very viable solution.

Dreamix: Are there common pitfalls that we should consider when using/integrating event streaming solutions?

Mary Grygleski: Event streaming solutions, frameworks, and libraries are not completely new, but this is still an emerging area, and vendors are still trying to define the product space. As with anything else that is new to the market, we need to be careful with the solutions being presented. We need to understand the strengths and weaknesses of each, as well as the business and solution requirements, before selecting the right tool for the job. For example, do we expect to operate in a cloud environment, and if so, is it a private, public, hybrid, or multi-cloud setup? Another question to consider is the size of the messages, and whether the event streaming platform can handle message chunking. Event streaming involves distributed messages going over the wire, or data in motion, so we need to consider how data is being transferred and whether the streaming platform takes care of the lower-level concerns well.

Dreamix: Is the event streaming portion of our application something we should pay special attention to when we are planning our testing strategy?

Mary Grygleski: Absolutely. We need to test at the unit, functional, configuration, integration, and system levels, and cover performance testing and all other aspects of testing, just as we would for any non-streaming part of the application.

Dreamix: What are the biggest strengths/differences of Apache Pulsar when compared to other distributed messaging systems (e.g., Kafka)?

Mary Grygleski: Apache Pulsar is an event streaming and distributed messaging platform that is built with cloud infrastructure in mind. I have described it as having been born with the “Cloud Native DNA”. It is very much an infrastructure-aware event streaming platform, which means that it has built-in awareness of the operating infrastructure, so that, as developers, we do not have to worry as much about managing some of the changes in the operating environment. This is a huge benefit for us, because we can spend more time on solving application-level problems instead of on time-consuming tasks such as rebalancing the topics and brokers when new nodes or servers are added to the cluster; Apache Pulsar takes care of that for us. In addition, it has a very powerful geo-replication feature and is capable of replicating across data centers in different modes, such as active-active, active-passive, and selective message replication. Another big advantage is that it is designed to support multi-tenancy, so the data it manages is well organized into independently operating units within the Pulsar cluster. There are many other benefits that make Apache Pulsar a next-generation cloud-based event streaming platform.

Dreamix: You have really extensive experience in presenting in front of dev communities across the globe. How did you get into this field, and which is your favorite conference that every developer should try to make time to attend or watch?

Mary Grygleski: Believe it or not, I used to not like public speaking at all. In my early days as a software engineer, I recall telling myself that I would be happy to be left “alone” doing my fun programming work. However, as I grew older – and wiser 🙂 – I started feeling that something was missing. Then eventually, when my daughter was in college, I had more time and decided to start attending some tech meetups. One of the first few groups I joined was the Chicago Java Users Group (CJUG) and, being a Java and Open Source engineer, I felt immediately that I had landed in the right community. I was going to every single meetup that CJUG had at the time (sometimes there were 2 in a month), and eventually the organizers there invited me to become a volunteer, and they also encouraged me to start speaking. I have to admit that I was very nervous at first, but the organization was – and still is – very welcoming, so I became more and more comfortable with public speaking. Then IBM started to build its Java advocacy team and the opportunity presented itself to me, and the rest, as they say, is history.

As for my favorite conference, this is indeed a very tough question because I have more than one favorite, so I’ll mention a few: Devnexus, the Devoxx family of conferences, JavaZone, JFokus, and recently Build Stuff Vilnius, among many others.

Dreamix: How do you update yourself about the latest trends in Java?

Mary Grygleski: I follow a lot of the great speakers in the Java community and some of my fellow Java Champions, read their blog posts, and try to keep up with the different OpenJDK publications/sites. One of the best resources, without a doubt, is Foojay.

Dreamix: Can you recommend a favorite book about programming? What about a favorite book in general?

Mary Grygleski: Favorite programming book: Java Concurrency in Practice by Brian Goetz
Favorite book in general: He Leadeth Me by Father Walter J. Ciszek, S.J.

Is there anything else you would like to ask Mary Grygleski? What is your opinion on the questions asked? Who would you like to see featured next? Let’s give back to the Java community together!

Innovators by heart. Developers by passion. We’re Dreamix Team - a group of trailblazing techies trying to make the world a better place through technology. We provide custom software development, keep you updated on market and industry trends, and have a great time doing it.