Event Processing Thinking: enrichment


Sunday, February 19, 2012

On enrichment - and the difference between BRMS and EP

A recent article on IBM developerWorks discusses two ways to enrich the data used by rules with data from external databases. One is to do the enrichment at the request level, before calling the "decision server" (the current name for using a BRMS system through the request-response protocol); the other is to do the enrichment during the rule processing itself. The article describes how each option is done and discusses the pros and cons. The benefits of enrichment at the request level are less complexity and better performance of the rule component; the benefits of enrichment at the rule level are the handling of dynamic data and more specialization for the exact data that the rule uses.

Thinking about event processing, there is a similarity to the BRMS case: an event can be enriched both by the event producer and as part of the event processing itself, and the arguments are not far from those in the BRMS case. There is one fundamental difference in event processing, though: the work is not done using the "request-response" protocol. Moreover, the different parts of the system are decoupled, so the event producer does not necessarily know for what purposes the event is going to be used, and different types of enrichment may be needed for different uses. The dynamic aspect is applicable here as well. In highly dynamic systems there may be race conditions between updates to the database that serves as the source of enrichment and its use in enrichment, unless the enrichment system locks the data in the database until it has been used in the event processing system. That requires part of the event processing system to exhibit transactional behavior, but I'll not get into this issue now.
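To make the decoupling point concrete, here is a minimal sketch in Python, under stated assumptions: the producer emits one raw event, and each use applies its own enrichment to it. All names (`CREDIT_DB`, `REGION_DB`, the event fields, and the two enricher functions) are hypothetical and only stand in for whatever external stores a real system would query.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]
Enricher = Callable[[Event], Event]

# Hypothetical reference data standing in for external databases.
CREDIT_DB = {"c-17": {"credit_limit": 5000}}
REGION_DB = {"c-17": {"region": "EMEA"}}


def enrich_for_billing(event: Event) -> Event:
    """Return a copy of the event with the data that billing needs."""
    return {**event, **CREDIT_DB.get(event["customer_id"], {})}


def enrich_for_marketing(event: Event) -> Event:
    """Return a copy of the event with the data that marketing needs."""
    return {**event, **REGION_DB.get(event["customer_id"], {})}


# The producer emits the raw event once and knows nothing about its uses.
raw_order: Event = {"type": "order", "customer_id": "c-17", "amount": 120}

# Each use registers the enrichment it needs.
consumers: Dict[str, Enricher] = {
    "billing": enrich_for_billing,
    "marketing": enrich_for_marketing,
}

enriched = {name: enricher(raw_order) for name, enricher in consumers.items()}
```

In this sketch the raw event is never modified; each use gets its own enriched copy, which is one possible design choice rather than the only one.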

Bottom line: the considerations in event-driven architecture are somewhat different from those in request-response systems, which are the most common kind in computing.

Saturday, March 13, 2010

On events versus data

The word "data" always reminds me of the android from Star Trek: The Next Generation, whose name was Data. The word data (in computing) is typically very general and refers to anything that is represented on digital media; the picture of Data above is also a piece of data, like many other things. The word "event" is also a broad term, which means something that happened.

Recently Paul Vincent wondered in his blog about the difference between event and data, as some people think that events are footnotes to data. Since, by the definitions above, event and data are obviously not the same, I'll try to talk about the touch points between them, since those are the reason for the misconceptions.

There are various touch points between events and data:

Event representation contains data. An event is represented in the computing domain by an "event object" or "event message", which is usually also called "event" for short. This event representation includes information about the event type, where it happened, when it happened, what happened, who the players were, and so on. Example: the event is "entering the building", and the event's payload contains information that answers questions such as: what building? who entered? when? and maybe more. The payload of the event is data; it may be stored (see event store) or just pass through the system.
A data store can store historical events. Event representations can be accumulated and stored in a data store for further use; there are large data stores that collect weather events. Note that in order to navigate through historical events, these events may be stored in a temporal database, an area that I've dealt with in the past; if the events are spatial, they may have to be stored in a spatiotemporal database.
A database can be an event producer. In active databases the events are database operations: insert, modify, delete, and retrieve. In this case the fact that some data element has been updated or accessed is the "something that happens" (which may or may not reflect something that happens in reality), and the database acts as an event producer and emits events for processing by an event processing network. Note that actually every event producer contains some data that is turned into events, for example transaction instrumentation, like what IBM has done with CICS as an event producer.
Derived events as database updates. An event processing application takes events from somewhere as input, does something, creates derived events, and sends them somewhere; this is all of event processing in one sentence. A derived event created in this process may go to an event consumer, and the event consumer may be a DBMS or another type of consumer whose action is to update some data store.
Event enrichment by data during event processing. During event processing operations, enrichment of events is sometimes required. Let's return to the event of a person entering a building: the event processing application deals with security access control and needs to know the person's security clearance. This information is not provided with the event, which provides only the identification of the person, so there needs to be an enrichment process in which an enrichment event processing agent accesses some global store, in this case reference data, extracts the clearance value, and puts it inside the event for further processing.
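The building-entry enrichment from the last touch point can be sketched roughly as follows. This is only an illustration under assumptions: the `BuildingEntry` fields, the `CLEARANCE_STORE` dictionary (standing in for the reference data), and the function name are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical reference data: person id -> security clearance.
CLEARANCE_STORE = {"p-001": "secret", "p-002": "public"}


@dataclass(frozen=True)
class BuildingEntry:
    person_id: str                   # the event only identifies the person
    building: str
    entered_at: str                  # ISO-8601 timestamp, kept as text for simplicity
    clearance: Optional[str] = None  # filled in by the enrichment step


def enrich_with_clearance(event: BuildingEntry) -> BuildingEntry:
    """Look up the person's clearance and return an enriched copy of the event."""
    clearance = CLEARANCE_STORE.get(event.person_id)  # None if the person is unknown
    return replace(event, clearance=clearance)


entry = BuildingEntry("p-001", "HQ", "2010-03-13T09:00:00")
enriched_entry = enrich_with_clearance(entry)
```

Here `enriched_entry.clearance` holds the looked-up value, while the original `entry` is left untouched; an access control agent further down the network could then work on the enriched copy.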

Thus the main issue is not the "versus" issue but the various relationships between the two terms.

Tuesday, April 15, 2008

On Event Processing Agents

There are different types of agents: double agents (as seen above, which is a series of sweets), insurance agents, travel agents, and some computerized agents. In my past I have dealt with mobile agents, and there are the intelligent agents in AI, and our own event processing agents (EPA).

David Luckham has written an amusing piece that follows Paul Vincent's "CEP and Agents" in TIBCO's blog. The amusing part is that Paul has written about AI agents, which use somewhat different terminology than the event processing terminology, and putting it in a "CEP Blog" is somewhat confusing. I am a "product" of the database community and have done some work on the AI border in the past; alas, the AI folks use different terminology to talk about the same things, and I thought at the time that they were doing it on purpose to confuse me. So, while there are many types of agents, I'll concentrate on the concept of "event processing agent", which was coined by David Luckham. I like this term and have adopted it in the following way: an EPA (Event Processing Agent) is a software artifact that receives an event cloud, a stream, a collection of events, or a single event (depending on the agent type and capabilities), does some computation on these events, and produces one or more events as output. That's it. An EPA is also a node in the EPN (Event Processing Network). There are different types of EPAs:

"simple event processing" EPAs - filtering and routing
"mediated event processing" EPAs - enrichment, transformation, validation
"complex event processing" EPAs - pattern detection
"intelligent event processing" EPAs - prediction, decisions...

The common denominator: each of them receives events as input, emits events as output and does a single type of function.
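As a rough illustration of this common denominator, the following Python sketch models an EPA as any function from a stream of events to a stream of events, and a linear EPN as a chain of such functions. The factory names, the event fields, and the toy counting pattern are assumptions made for the example, not part of any product or standard.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]
# In this sketch an EPA is any function from a stream of events to a stream of events.
EPA = Callable[[Iterable[Event]], Iterator[Event]]


def filter_epa(predicate: Callable[[Event], bool]) -> EPA:
    """Simple event processing: pass through only the events matching the predicate."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        for e in events:
            if predicate(e):
                yield e
    return run


def enrich_epa(lookup: Dict[str, Event], key: str) -> EPA:
    """Mediated event processing: emit a copy of each event with reference data added."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        for e in events:
            yield {**e, **lookup.get(e[key], {})}
    return run


def count_pattern_epa(threshold: int) -> EPA:
    """Complex event processing (toy pattern): emit one derived event
    when the number of input events reaches the threshold."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        seen = 0
        for _ in events:
            seen += 1
            if seen == threshold:
                yield {"type": "threshold_reached", "count": seen}
    return run


def pipeline(*agents: EPA) -> EPA:
    """A linear EPN: the output of each EPA feeds the next one."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        stream: Iterable[Event] = events
        for agent in agents:
            stream = agent(stream)
        return iter(stream)
    return run


events = [
    {"type": "login_failed", "user": "u1"},
    {"type": "login_ok", "user": "u2"},
    {"type": "login_failed", "user": "u1"},
]
network = pipeline(
    filter_epa(lambda e: e["type"] == "login_failed"),
    enrich_epa({"u1": {"role": "admin"}}, key="user"),
    count_pattern_epa(threshold=2),
)
derived = list(network(events))  # [{"type": "threshold_reached", "count": 2}]
```

Each agent does a single type of function, and because they all share the same input and output shape they can be composed freely; a real EPN would be a graph with routing rather than a straight chain.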

I find this type of abstraction both an easy way to explain to people how EP systems work and a basis for architecture. The EPN routing can be done by standard middleware or in stand-alone mode. Other terminology issues raised by David Luckham are the relationships to the "actor model" and to "engines".

The actor model is a model that helps reasoning about concurrency, while agents in AI are autonomous goal-driven artifacts. These are orthogonal terms, of course. In the context of EPA - when looking at EPAs as an executable network, we can look at each EPA as an actor and apply actor models.

Last but not least, the relationship of EPAs to engines: an EPA is a software artifact; it can be an instance of an engine, it can be some software that contains an engine, or it can be a hard-coded program, as long as it complies with the EPA definition. In a future world, with interoperability (and perhaps also language) standards, we'll be able to run (and maybe to self-select) multiple engines for the same EPN, residing in different EPAs.

More about EPA types -- later.

Saturday, February 2, 2008

On Immutability of events

Israel is the land of milk and honey, but not the land of snow; snow is quite rare in Israel. This week the rare event of snow occurred in high mountains across the country; the picture shows snow near one of the gates of the Old City of Jerusalem. In the Carmel mountain ridge, where I live, there was a little bit of snow higher up the ridge, and some people went there and brought bags full of snow to the office... As I lived for several years in the Philadelphia area in the USA, I am personally less excited by snow, but it is a notable event here.

Today's topic relates to a question that Marco, a fellow EP blogger and the person behind rulecore, has asked about the previous posting. His question related to enrichment, but I'll extend it to the rest of the transformations: "an event is something that has happened, and as such it is immutable and cannot be changed; do transformations and enrichments break this model?"

The answer to this question is no. Transformations do not break the model, and events are still immutable. According to the pure model, events cannot be altered or deleted, and when they are kept in an event store, it has to be an "append only" type of store.
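One way to picture an "append only" store is the following in-memory sketch. It is a simplification under assumptions: the class name and its methods are hypothetical, and the read-only protection is shallow (nested values inside an event could still be mutated).

```python
from types import MappingProxyType
from typing import Any, Iterator, List, Mapping


class AppendOnlyEventStore:
    """Hypothetical in-memory store that offers append and read, but no update or delete."""

    def __init__(self) -> None:
        self._events: List[Mapping[str, Any]] = []

    def append(self, event: Mapping[str, Any]) -> int:
        """Store a read-only snapshot of the event and return its position."""
        self._events.append(MappingProxyType(dict(event)))
        return len(self._events) - 1

    def __iter__(self) -> Iterator[Mapping[str, Any]]:
        # Iterate over a snapshot so callers never hold the internal list.
        return iter(tuple(self._events))

    def __len__(self) -> int:
        return len(self._events)


store = AppendOnlyEventStore()
position = store.append({"type": "snow", "place": "Jerusalem"})
stored = next(iter(store))
# stored["place"] = "Haifa" would raise TypeError: the snapshot is read-only.
```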

Enrichment and transformation are, in fact, the creation of new derived events as a function of raw events. Thus, according to the pure model, a transformed or enriched event is not the same event as the raw one:

It has a different event type, with a different structure.
It has a different event-id.

An enrichment example: the event is an order, and it has an attribute that refers to a customer. The enrichment function looks up the customer in some database and fetches the values of customer type (platinum, gold, silver, nobody) and customer_credit_limit.

The fact that the raw event and the transformed event have different event-ids also provides traceability: the transformation itself can be traced (maybe the problem is a wrong identification of the customer?).
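The order example can be sketched as follows, assuming Python dataclasses. The `CUSTOMERS` dictionary stands in for the customer database, and the `source_event_id` field is my own assumption about how the derived event might point back to the raw one; the text above only says that the two event-ids differ.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical customer database.
CUSTOMERS = {"c-17": {"customer_type": "gold", "customer_credit_limit": 10000}}


def new_event_id() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class Order:
    """Raw event: it only refers to the customer."""
    customer_id: str
    event_id: str = field(default_factory=new_event_id)


@dataclass(frozen=True)
class EnrichedOrder:
    """Derived event: a different type, a different structure, and its own event-id."""
    customer_id: str
    customer_type: str
    customer_credit_limit: int
    source_event_id: str  # assumed link back to the raw event, for traceability
    event_id: str = field(default_factory=new_event_id)


def enrich_order(order: Order) -> EnrichedOrder:
    """Create a new derived event; the raw event is left untouched."""
    customer = CUSTOMERS[order.customer_id]  # raises KeyError for an unknown customer
    return EnrichedOrder(
        customer_id=order.customer_id,
        customer_type=customer["customer_type"],
        customer_credit_limit=customer["customer_credit_limit"],
        source_event_id=order.event_id,
    )


raw = Order(customer_id="c-17")
derived = enrich_order(raw)
```

Because both classes are frozen, neither event can be changed after creation, and following `derived.source_event_id` back to `raw.event_id` is what would allow someone to check whether the customer was identified correctly.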

There are some considerations that may, in practice, push towards not obeying the pure model, and I'll talk about them some other time.
