Streams of data
Just as Microsoft SQL Server was designed to let developers manage static data, StreamInsight was designed to work with streams of data. But what does a stream of data mean? Simply put, a stream of data is a sequence of pieces of information, each of which has a point in time associated with it. Usually that time is the date and time at which the piece of information was created.
Such streams of data can be produced by countless devices, ranging from smoke and temperature sensors to smartphones, robots, web applications, hosting servers and trading applications.
Event
An event can be defined as the basic unit of data processed by the StreamInsight server. Each event encapsulates a piece of information, so a stream can be seen as a sequence of events. Each event consists of two parts: the header and the payload.
The header defines the event kind and the temporal properties of the event. All temporal properties are application-based: they are supplied by the data source rather than taken from the system time of the StreamInsight server. All timestamps use the .NET DateTimeOffset data type, and StreamInsight automatically normalizes all times to UTC.
The payload is a .NET data structure that holds the data associated with the event; its fields are defined by the developer. Each field has a .NET data type, e.g. int, float or string.
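To make the header/payload split concrete, here is a minimal Python sketch. It is not the StreamInsight API (which is .NET); the names `Header`, `Event`, `TemperatureReading` and `make_event`, and the choice of a start and an end time as the temporal properties, are assumptions for illustration. A timezone-aware Python `datetime` stands in for `DateTimeOffset`, and the factory converts every source-supplied timestamp to UTC, mirroring the normalization described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Generic, TypeVar

P = TypeVar("P")


@dataclass(frozen=True)
class Header:
    kind: str             # event kind; "insert" is an illustrative value
    start_time: datetime  # application time supplied by the source, stored in UTC
    end_time: datetime    # assumed second temporal property (start + duration)


@dataclass(frozen=True)
class TemperatureReading:
    # Hypothetical developer-defined payload: each field has a plain type
    sensor_id: str
    celsius: float


@dataclass(frozen=True)
class Event(Generic[P]):
    header: Header
    payload: P


def make_event(payload: P, start: datetime,
               duration: timedelta = timedelta(0),
               kind: str = "insert") -> Event[P]:
    """Build an event from a source-supplied timestamp, normalized to UTC."""
    if start.tzinfo is None:
        # Like DateTimeOffset, require an explicit offset instead of guessing one
        raise ValueError("timestamps must carry a UTC offset")
    start_utc = start.astimezone(timezone.utc)
    return Event(Header(kind, start_utc, start_utc + duration), payload)
```

For example, a reading stamped 10:00 at UTC+02:00 is stored with a start time of 08:00 UTC, while the payload keeps the developer's own fields untouched.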
Query
In the same way that a fisherman uses a net to catch fish from a river, we can use a StreamInsight query to retrieve relevant information from a stream of data. The results of the query are received incrementally, for as long as we need them.
One can define a myriad of queries, from simple ones, such as selecting all events that fulfill a certain condition, to more complex ones, such as selecting certain events that appear within a 3-minute window.
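Both kinds of query can be sketched over an in-memory list of readings in Python. The helpers `where` and `tumbling_windows` and the `Reading` type are hypothetical, not StreamInsight operators, and the window shown is a tumbling (consecutive, non-overlapping) window, which is only one way a 3-minute window could be defined.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Reading:
    time: datetime  # application timestamp (UTC)
    sensor_id: str
    celsius: float


def where(events, predicate):
    """Simple query: keep only the events that fulfill a condition."""
    return [e for e in events if predicate(e)]


def tumbling_windows(events, size, origin):
    """Group events into consecutive, non-overlapping windows of length `size`
    aligned to `origin`; returns {window_start: [events]} in time order."""
    windows = defaultdict(list)
    for e in events:
        index = (e.time - origin) // size  # timedelta // timedelta -> int
        windows[origin + index * size].append(e)
    return dict(sorted(windows.items()))


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
readings = [Reading(t0 + timedelta(minutes=m), "s1", c)
            for m, c in [(0, 20.0), (1, 31.0), (4, 35.0), (7, 22.0)]]

hot = where(readings, lambda r: r.celsius > 30)
for start, group in tumbling_windows(hot, timedelta(minutes=3), t0).items():
    print(start.time(), len(group))  # hot readings per 3-minute window
```

Here the condition and the window are combined: only readings above 30 °C are kept, and they are then counted per 3-minute window.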
The main difference between StreamInsight and a database is that StreamInsight never stores the data; instead, a query stays active the whole time the server is running. Every time a new event arrives, it triggers a new computation and generates a new result. Of course, if we are interested, we can store the results of the queries.
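This standing-query behaviour can be approximated with a Python generator. In the hypothetical `running_average` sketch below, each incoming value triggers a new computation and yields a new result, while only a count and a running sum are kept, never the events themselves. Because the generator is lazy, the consumer can keep pulling results for as long as it needs and stop at any time, even on an unbounded input.

```python
from typing import Iterable, Iterator, Tuple


def running_average(values: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """A standing query: consume events one at a time and emit an updated
    (count, average) after each one, keeping O(1) state instead of the events."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
        yield count, total / count


for n, avg in running_average([20.0, 22.0, 27.0]):
    print(n, avg)  # a new result after every event
```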
Source
Devices that produce data become sources of data for the StreamInsight application. One can define multiple data sources that feed into the StreamInsight server and against which queries are executed.
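Feeding several sources into one time-ordered stream can be sketched with `heapq.merge` from Python's standard library. This assumes every source already delivers its events in timestamp order; how a real engine such as StreamInsight handles late or out-of-order events is not modelled here. `SourceEvent` and `merge_sources` are hypothetical names.

```python
import heapq
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple


class SourceEvent(NamedTuple):
    time: datetime  # application timestamp supplied by the device
    source: str
    value: float


def merge_sources(*sources: Iterable[SourceEvent]) -> Iterator[SourceEvent]:
    """Combine several time-ordered sources into a single stream ordered by
    application time, lazily, without buffering whole sources in memory."""
    return heapq.merge(*sources, key=lambda e: e.time)


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
smoke = [SourceEvent(t0, "smoke", 0.1),
         SourceEvent(t0 + timedelta(seconds=20), "smoke", 0.4)]
temp = [SourceEvent(t0 + timedelta(seconds=10), "temp", 21.0)]

for e in merge_sources(smoke, temp):
    print(e.time.time(), e.source, e.value)
```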
Sink
We have sources of data that become streams of data, and queries that are executed against them, but how do we get our hands on the query results? We define a sink, to which StreamInsight sends the results of the defined queries. Here too we can define multiple custom sinks: some might represent a conventional database, others a user interface in which users can see the information immediately.
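A sink can be thought of as anything that accepts query results. The Python sketch below is illustrative only: the `Sink` protocol, `ListSink` (standing in for a database) and `ConsoleSink` (standing in for a user interface) are hypothetical and do not correspond to StreamInsight types. `publish` hands every result to every registered sink.

```python
from typing import Iterable, List, Protocol


class Sink(Protocol):
    def consume(self, result: object) -> None: ...


class ListSink:
    """Stand-in for a database table: keeps every result it receives."""

    def __init__(self) -> None:
        self.rows: List[object] = []

    def consume(self, result: object) -> None:
        self.rows.append(result)


class ConsoleSink:
    """Stand-in for a user interface: shows each result as soon as it arrives."""

    def consume(self, result: object) -> None:
        print(f"ALERT: {result}")


def publish(results: Iterable[object], sinks: Iterable[Sink]) -> None:
    """Send each query result to every sink, in arrival order."""
    targets = list(sinks)
    for r in results:
        for s in targets:
            s.consume(r)


db = ListSink()
publish(["smoke detected", "temp 35.0"], [db, ConsoleSink()])
```

Fanning the same result out to several sinks is what lets one query feed both a persistent store and a live display.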
StreamInsight components working together
As seen in the architecture diagram above, one can define sources of events such as smartphones, fire sensors, smoke sensors, temperature sensors, server logs or historical event data. The platform allows developers to aggregate all these events by defining LINQ queries; the results of these queries are then passed to any developer-defined sink, such as monitoring devices, monitoring applications or even data warehouses.
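The whole flow, with sources feeding a query and query results feeding sinks, can be sketched end to end in Python. StreamInsight expresses queries in LINQ; in this sketch a Python generator expression plays that role instead. `Reading`, `run_pipeline` and `high_readings` are hypothetical names, and the threshold of 50 is arbitrary.

```python
import heapq
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Reading(NamedTuple):
    time: datetime
    device: str
    value: float


def run_pipeline(sources: List[Iterable[Reading]],
                 query: Callable[[Iterator[Reading]], Iterator[object]],
                 sinks: List[Callable[[object], None]]) -> None:
    """Wire sources -> query -> sinks: merge time-ordered sources, run the
    query over the combined stream and hand every result to every sink."""
    stream = heapq.merge(*sources, key=lambda r: r.time)
    for result in query(stream):
        for sink in sinks:
            sink(result)


def high_readings(stream: Iterator[Reading]) -> Iterator[str]:
    # Plays the role of a LINQ "where ... select ..." query in this sketch
    return (f"{r.device}: {r.value}" for r in stream if r.value > 50)


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
smoke = [Reading(t0, "smoke", 10.0), Reading(t0 + timedelta(seconds=30), "smoke", 80.0)]
logs = [Reading(t0 + timedelta(seconds=15), "server", 95.0)]
warehouse: List[object] = []  # stand-in for a data warehouse
run_pipeline([smoke, logs], high_readings, [warehouse.append, print])
```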