Saturday, January 30, 2016

StreamInsight components

Streams of data


Just as Microsoft SQL Server was designed to let developers manage static data, StreamInsight was designed to work with streams of data. But what does a stream of data mean? A stream of data is a sequence of pieces of information, each of which has a time associated with it. Usually the associated time is the date and time of creation.

Such streams of data can be produced by countless devices and applications, ranging from smoke and temperature sensors to smartphones, robots, web applications, hosting servers and trading applications.

Event


An event is the basic unit of data processed by the StreamInsight server. Each event encapsulates a piece of information, so we can say that a stream contains a sequence of events. Each event consists of two parts: the header and the payload.

The header defines the event kind and the temporal properties of the event. All temporal properties are application-based: they are supplied by the data source rather than taken from the system time of the StreamInsight server. All timestamps use the .NET DateTimeOffset data type, and StreamInsight automatically normalizes all times to UTC.

The payload is a .NET data structure which holds the data associated with the event. The fields of the data structure are defined by the developer, and each field has a .NET data type, e.g. int, float, string etc.
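To make the header and payload split concrete, here is a minimal Python sketch. It is not StreamInsight code (that would be written in C#), and the Event class, its field names and the "insert" kind value are assumptions made up for illustration. It only shows the idea of an application-supplied timestamp being normalized to UTC next to developer-defined payload fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event: header fields (kind, start_time) plus a payload."""
    kind: str              # header: event kind (illustrative value)
    start_time: datetime   # header: application time supplied by the source
    payload: dict          # developer-defined fields

    def __post_init__(self):
        if self.start_time.tzinfo is None:
            raise ValueError("start_time must carry a UTC offset")
        # Normalize the application timestamp to UTC.
        object.__setattr__(
            self, "start_time", self.start_time.astimezone(timezone.utc)
        )


# A temperature reading stamped by the sensor at 10:00 in UTC+02:00.
reading = Event(
    kind="insert",
    start_time=datetime(2016, 1, 30, 10, 0, tzinfo=timezone(timedelta(hours=2))),
    payload={"sensor_id": "t-17", "celsius": 21.5},
)
print(reading.start_time.isoformat())  # 2016-01-30T08:00:00+00:00
```

The timestamp comes from the sensor, not from the machine that processes the event, which is what "application-based" means above.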

Query


The same way as a fisherman uses a fishing net to catch fish from a river, we can use a StreamInsight query to retrieve relevant information from a stream of data. The results of the query are received incrementally for as long as we need.

One can define a myriad of queries, from simple ones such as selecting all events which fulfill a certain condition, to more complex ones such as selecting certain events which appear within a window of 3 minutes.
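The two kinds of query mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not the LINQ syntax StreamInsight uses; the function names, the temperature threshold and the choice of fixed, non-overlapping 3-minute windows are all assumptions.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=3)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts):
    """Return the start of the 3-minute window containing ts."""
    return ts - (ts - EPOCH) % WINDOW


def hot_readings_per_window(events, threshold):
    """Filter events by a condition, then count the matches in each window."""
    counts = {}
    for ts, celsius in events:
        if celsius > threshold:          # simple query: a condition
            start = window_start(ts)     # windowed query: group by time
            counts[start] = counts.get(start, 0) + 1
    return counts


base = datetime(2016, 1, 30, 8, 0, tzinfo=timezone.utc)
events = [
    (base + timedelta(seconds=10), 31.0),
    (base + timedelta(seconds=100), 24.0),   # filtered out by the condition
    (base + timedelta(seconds=170), 35.5),
    (base + timedelta(seconds=200), 33.0),   # falls in the next window
]
result = hot_readings_per_window(events, threshold=30.0)
for start, count in sorted(result.items()):
    print(start.time(), count)
# 08:00:00 2
# 08:03:00 1
```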

The main difference between StreamInsight and a database is that StreamInsight does not store the incoming data; instead, a query stays active for as long as the server is running. Every time a new event arrives it triggers a new computation and generates a new result. Of course, if we are interested, we can store the results of the queries.
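A small Python sketch of this "compute on arrival, store nothing" behavior, assuming a running average as the query. The generator below is a stand-in for a continuous query, not StreamInsight code, and the names running_average and sensor are invented for the example. Note that it keeps only a count and a sum, never the events themselves.

```python
def running_average(stream):
    """Yield an updated average after every event, keeping only two numbers."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def sensor():
    """Stand-in for a live source; a real one would never finish."""
    yield from [20.0, 22.0, 27.0]


results = list(running_average(sensor()))
print(results)  # [20.0, 21.0, 23.0]
```

Each incoming value produces exactly one new result, which is the pattern described above.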

Source



Devices which are data producers become sources of data for the StreamInsight application. One can define multiple sources of data which go into the StreamInsight server and against which queries are executed.

Sink


We have sources of data which become streams of data, and queries which are executed against them, but how can we get our hands on the query results? We can define a sink to which StreamInsight will send the results of the defined queries. Here too we can define multiple custom sinks: some might represent a conventional database, others a user interface in which users can see the information immediately.

StreamInsight components working together




As seen in the architecture diagram above, one can define sources of events such as smartphones, fire sensors, smoke sensors, temperature sensors, server logs or even historical data. The platform allows developers to aggregate all these events by defining LINQ queries; the results of these queries are then passed to any developer-defined sink, such as monitoring devices, monitoring applications or even data warehouses.
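The flow from sources through a query to sinks can be sketched as follows. This is a plain Python illustration under my own assumptions, not the StreamInsight API: the sensor lists, the 0.8 limit and the two sinks (an in-memory list standing in for a database and a print call standing in for a dashboard) are all hypothetical.

```python
import heapq


def merge_sources(*sources):
    """Combine time-ordered sources into one stream ordered by timestamp."""
    return heapq.merge(*sources, key=lambda event: event[0])


def alerts(stream, limit):
    """Query: keep only readings above the limit."""
    for timestamp, device, value in stream:
        if value > limit:
            yield timestamp, device, value


# Two hypothetical sources of (timestamp, device, value) events.
smoke_sensor = [(1, "smoke", 0.2), (4, "smoke", 0.9)]
temperature_sensor = [(2, "temp", 0.95), (3, "temp", 0.4)]

# Two hypothetical sinks: an in-memory "database" and a console "dashboard".
database = []
sinks = [database.append, lambda event: print("ALERT", event)]

for event in alerts(merge_sources(smoke_sensor, temperature_sensor), limit=0.8):
    for sink in sinks:
        sink(event)
```

Adding another source or sink means adding one more item to a list; the query in the middle does not change.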

Saturday, January 2, 2016

StreamInsight

StreamInsight is a platform developed by Microsoft which allows developers to create and deploy complex event processing (CEP) applications. It is built on the Microsoft .NET platform and enables developers to implement robust and highly efficient CEP applications. There are many possible event sources; some of the most relevant are:

  • Financial trading applications
  • Web analytics
  • Manufacturing applications
  • Server monitoring applications

One can use this platform to easily create tools which monitor data from multiple sources for meaningful patterns, trends, exceptions and opportunities. Analysis and correlation can be done incrementally while the data is produced (in real time) without storing it first, which translates into a low-latency application. Historical data can also be used as a source of events.

Key Benefits


Below I will describe the most important features and advantages offered by this platform.

Highly optimized performance and data throughput


StreamInsight supports highly parallel execution of continuous queries over high-speed data because it implements a lightweight streaming architecture. The use of an in-memory cache and incremental result computation provides excellent performance, with high data throughput and low latency. In StreamInsight all processing is triggered automatically by incoming events, based on the defined queries. The platform also handles out-of-order events, and static reference or historical data can be accessed and included in the low-latency analysis.
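To give an idea of what handling out-of-order events involves, here is a generic Python sketch of a reordering buffer. It is not how the StreamInsight engine does it internally; the reorder function and the max_delay parameter are assumptions chosen for the example. Events are held back until a later event shows that enough time has passed for stragglers to arrive.

```python
import heapq


def reorder(events, max_delay):
    """Yield (timestamp, value) events in timestamp order.

    An event is held back until a later event shows that at least
    max_delay time units have passed, so late arrivals can slot in.
    """
    pending = []          # min-heap ordered by timestamp
    newest = None         # largest timestamp seen so far
    for timestamp, value in events:
        heapq.heappush(pending, (timestamp, value))
        newest = timestamp if newest is None else max(newest, timestamp)
        while pending and pending[0][0] <= newest - max_delay:
            yield heapq.heappop(pending)
    while pending:        # input ended: flush whatever is left
        yield heapq.heappop(pending)


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "e"), (5, "d")]
ordered = list(reorder(arrivals, max_delay=2))
print(ordered)  # [(1, 'a'), (2, 'b'), (3, 'c'), (5, 'd'), (6, 'e')]
```

The trade-off is between latency and tolerance: a larger max_delay copes with later stragglers but delays every result by that much.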

.NET development environment


Microsoft created the .NET development environment, in which programming languages like C#, tools like Visual Studio and services like SQL Server can be easily integrated and used for application development while still remaining loosely coupled. StreamInsight is part of this environment, in which one can easily develop fast and robust applications. Developers can write their CEP applications in C#, using LINQ (Language Integrated Query) to create queries.

Given that there is a large community of developers already familiar with these Microsoft technologies, the cost and time of developing a CEP application are significantly reduced.

Flexible deployment capability


The StreamInsight platform supports two deployment scenarios. The first is full integration into the developed application as a hosted (embedded) DLL. The second is deploying StreamInsight as a stand-alone server, with multiple applications and users sharing the server. This means that one can develop multiple independent applications which use the same StreamInsight instance. The CEP server runs in a wrapper such as an executable, or it can be packaged as a Windows Service.

Extensibility


StreamInsight allows developers to extend its functionality by giving them the possibility to define their own operators, functions and aggregates to be used in queries and define specific event types against which to run the defined queries.
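The idea of plugging a user-defined aggregate into a query can be sketched like this. It is a Python illustration only: aggregate_windows and value_range are hypothetical names, and StreamInsight's own extension mechanism is written in C# and looks different.

```python
from statistics import median


def aggregate_windows(windows, aggregate):
    """Apply a developer-supplied aggregate to the values of each window."""
    return {name: aggregate(values) for name, values in windows.items()}


def value_range(values):
    """Custom aggregate: spread between the largest and smallest value."""
    return max(values) - min(values)


windows = {"08:00": [20.0, 22.5, 21.0], "08:03": [30.0, 24.0]}
print(aggregate_windows(windows, value_range))  # {'08:00': 2.5, '08:03': 6.0}
print(aggregate_windows(windows, median))       # {'08:00': 21.0, '08:03': 27.0}
```

The query logic stays the same while the aggregate is swapped, which is the point of this kind of extensibility.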

One of the great things about StreamInsight is that it was designed to integrate seamlessly with any domain-specific business logic. This means that the platform does not come with built-in functionality for specific business sectors, but it allows developers to plug in any specific business logic.

CEP Query Visualization and Analysis


Microsoft StreamInsight provides a stand-alone Event Flow Debugger which is a powerful GUI tool that enables visual inspection of a continuous query. One can use this graphical tool to quickly inspect the query tree, replay data processing and perform analysis.


Latest version


Currently the latest version of StreamInsight is 2.3, which was released together with SQL Server 2014 on the first of April 2014. Release 2.3 contains only a licensing update, so any code written against the previous version, 2.1, will still work.


In future posts about StreamInsight I will present some of its most important components.

Robert Rusu