Saturday, December 26, 2015

Introduction to event stream processing

Today, most businesses actively monitor data streams and application messages in order to detect business events or situations and take time-critical actions. Even though businesses run on plans, events are the real drivers of the enterprise today because they represent changes in the state of the business.

Unfortunately, as happened with data management in the pre-database days, every usage area of business events today tends to build its own custom infrastructure to filter, process, aggregate and propagate events.

Building efficient, scalable systems for monitoring and processing events has been a major research interest in recent years. As new technologies emerge and existing ones expand, the number of sources of relevant events is growing rapidly. Many technologies have been proposed, including data stream management, complex event processing and asynchronous messaging.

One can observe that all these systems share a common processing model but differ in query language features. Besides, applications might have different requirements for the consistency of the data, which translate into tradeoffs between insensitivity to event arrival order and system performance. Some applications require that events are processed in the order in which they arrive or were created, while others are more concerned with high throughput. If these requirements are exposed to the user and handled in the system, the user can specify the consistency requirements per query, and the system can adjust itself at runtime to guarantee consistency and manage system resources.
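To make the tradeoff between ordering and performance more concrete, here is a minimal sketch in Python. It is not taken from any particular product; the class name ReorderBuffer and the max_delay parameter are my own assumptions for illustration. The buffer holds events back until no older event can still be accepted, so a larger max_delay tolerates more disorder at the cost of latency and memory, while max_delay=0 releases events immediately and treats anything older as late.

```python
import heapq


class ReorderBuffer:
    """Hypothetical per-query consistency knob (illustrative only).

    Events are released in timestamp order once they are at least
    `max_delay` time units older than the newest timestamp seen so far.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.heap = []                    # pending events, ordered by timestamp
        self.max_seen = float("-inf")     # newest timestamp observed so far
        self.seq = 0                      # tie-breaker for equal timestamps

    def push(self, timestamp, payload):
        """Accept one event and return (released, late) lists of (timestamp, payload)."""
        watermark = self.max_seen - self.max_delay
        if timestamp < watermark:
            # Too old to be placed in order: report it instead of emitting it.
            return [], [(timestamp, payload)]

        heapq.heappush(self.heap, (timestamp, self.seq, payload))
        self.seq += 1
        self.max_seen = max(self.max_seen, timestamp)
        watermark = self.max_seen - self.max_delay

        # Release everything that can no longer be overtaken by an accepted event.
        released = []
        while self.heap and self.heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released, []
```

A query that cares about order would use a generous max_delay and accept the extra latency. A query that cares about throughput would use a small one and handle the late events separately, for example by issuing a correction.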

Event stream processing use case

As an example, let us consider a financial services company that actively monitors financial markets, individual trader activity and customer accounts. Using a desktop application, a trader can track a moving average of the value of an investment portfolio. From the business perspective, the average must be updated continuously as stock updates arrive and trades are confirmed. A second application running on the trading floor extracts events from live news feeds and correlates these events with market indicators to infer market sentiment, which in turn influences automated stock trading programs.
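The first application can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name PortfolioMovingAverage, the on_price method and the count-based window are hypothetical, and a real system would more likely use a time-based window and also react to trade confirmations that change the holdings.

```python
from collections import deque


class PortfolioMovingAverage:
    """Hypothetical sketch: moving average of portfolio value over the last N updates."""

    def __init__(self, holdings, window):
        self.holdings = dict(holdings)       # symbol -> quantity held
        self.prices = {}                     # symbol -> latest known price
        self.values = deque(maxlen=window)   # most recent portfolio valuations

    def on_price(self, symbol, price):
        """Process one stock update; return the new average, or None if nothing changed."""
        if symbol not in self.holdings:
            return None                      # update for a stock we do not hold
        self.prices[symbol] = price
        if len(self.prices) < len(self.holdings):
            return None                      # not every holding has a price yet
        value = sum(qty * self.prices[sym] for sym, qty in self.holdings.items())
        self.values.append(value)            # the oldest valuation drops out automatically
        return sum(self.values) / len(self.values)
```

Each incoming update produces a fresh average right away, which is the continuous behaviour the trader expects, in contrast to a query that is run on demand against stored data.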

The query behind this second application filters for patterns of events, correlated across time and data values. In order to bring value to the business, the application needs to provide the information as soon as possible, even though late events might result in a retraction. Meanwhile, a third application running in the compliance office monitors trader activity and customer accounts to watch for violations of the law or of institution guidelines and for ill-intentioned actions. These queries might run until the end of the trading day, or even until they have finished processing all the events from that day. The three applications carry out similar computations but differ significantly in their workload, requirements for consistency guarantees and response time.

The example illustrates that most real-world enterprise applications are complex in functionality, might incorporate different technologies that need to be integrated, and are required to achieve high accuracy and consistency. In the following posts I will write about a solution that can be used to build event stream processing applications. The solution is a platform developed by Microsoft named StreamInsight.