Saturday, January 30, 2016

StreamInsight components

Streams of data


Just as Microsoft SQL Server was designed to allow developers to manage static data, StreamInsight was designed to work with streams of data. But what does a stream of data me

an? Well a stream of data is a sequence of pieces of information, for each such piece of information a certain time is associated with it. Usually the associated time is the date-time of creation.

Such streams of data can be produced by countless devices which vary from smoke sensors, temperature sensors to smartphones, robots, web applications, hosting servers or trading applications. 

Event


An event can be defined as a basic unit of data processed by the StreamInsight server, each event encapsulates a piece of information thus we can say that a stream contains a sequence of events. Each event consists of two parts, the header and the payload.

The header defines the event kind and temporal properties of the event. All the temporal properties are application-based and supplied by the data source rather than a system time supplied by the StreamInsight server. All the timestamps use the .Net DateTimeOffset data type, also StreamInsight normalizes all times to UTC date-time automatically.

The payload is a .NET data structure which holds the data associated with the event, the fields of the data structure can be defined by the developer. Each field can have a .NET data type e.g. int, float, string etc.

Query


The same way as a fisherman uses a fishing net to catch fish from a river, we can use a StreamInsight query to retrieve relevant information from a stream of data. The results of the query are received incrementally for as long as we need.

One can define a myriad of queries starting from simple ones like selecting all events which fulfill a certain condition, to more complex like selecting certain events which appear in a window of 3 minutes.

The main difference between StreamInsight and a database is that StreamInsight never stores the data, a query is kept active all time while the server is running. Every time a new event appears it triggers a new computation and generates a new result. Of course if we are interested we can store the results of the queries.

Source



Devices which are data producers become sources of data for the StreamInsight application. One can define multiple sources of data which go into the StreamInsight server and against which queries are executed.

Sink


We have sources of data which become streams of data and queries which are executed against them, but how can we get our hands on the query results? Well we can define a sink in which StreamInsight will send the result of the defined queries. Also in this case we can define multiple custom sinks, some might represent a conventional database or an user interface in which users can acknowledge the information immediately.

StreamInsight components working together




As seen in the architecture diagram from above, one can define source of events like smartphones, fire sensors, smoke sensors, temperature sensors, server logs or event historical data. This platform allows developers to aggregate all these events by defining LINQ queries, the results of these queries are then passed to any developer defined sink like monitoring devices, monitoring applications or even data warehouses.