Posts Tagged ‘Lifecycle’

Bitemporal Data is everywhere – Part 1

8. December 2009
In this writing:
- Record Time
- Actual Time
- Bitemporality

While still looking after my little girls, who are sick at the moment, I will today continue my reflections on the lifecycle of data and walk through a sample of temporal data to further illustrate what I am talking about. Let’s start with bitemporality. This is how Martin Fowler (Patterns for things that change with time) describes the two time dimensions of Bitemporal Objects:

I think of the first dimension as actual time: the time something happened. The second dimension is record time, the time we knew about it. Whenever something happens, there are always these two times that come with it.


Photo credit: bobydimitrov, licensed under Creative Commons BY-SA 2.0

“Whenever something happens, there are always these two times that come with it.” The information that something happened needs time to reach its recipients, and maybe even more time until it is recorded in some database. This of course does not just concern a lightning strike and the thunder heard a few miles away shortly afterwards, nor just the recording of lightning strikes in some weather forecaster’s database or the reader of a local newspaper covering the thunderstorm. It concerns every single mundane fact of life.

Tracking Record Time

Let’s therefore focus on such a mundane fact of life and investigate the relationship between a hotel room and its price. Let’s imagine a simplified Price Watcher Application tracking the prices of the rooms of several hotels in a given city:

Hotel            Room             Price
"Four Seasons"   Single Standard  € 149.00
"Bedtime" Lodge  Single Standard  € 98.00
"In good hands"  Single Standard  € 110.00

A better normalised real-life database schema for such an application would of course look a bit different, but this should not bother us at the moment: with this simple table we can already answer the question “Where do I get the cheapest city hotel room at the moment?” by sorting the table by price in ascending order… but actually… can we answer this? To be slightly more precise – and we have to be precise for our topic – we can only answer the question “Where do we get, to the best of our current knowledge, the cheapest city hotel room at the moment?”. After all, we don’t reflect reality here, we just reflect what we know about it…

If our Price Watcher App eventually becomes a relevant factor in the market, conflicts with hotel owners could occur more frequently (“Why are you so slow to update my price cuts in particular?”). We therefore might want to add the information when exactly each price was recorded by us:

Hotel            Room             Price     Recorded at
"Four Seasons"   Single Standard  € 149.00  12/01/2009 18:03
"Bedtime" Lodge  Single Standard  € 98.00   11/28/2009 11:54
"In good hands"  Single Standard  € 110.00  12/07/2009 08:11

But thinking about this twice: wouldn’t it be wise to remember not just when we last updated the price, but many such updates, in order to be able to discuss single cases and to prove that we honestly treat all hotels equally? Well, then we must no longer update the table, but instead insert a new row whenever a price change occurs. For this purpose, let’s focus on “Bedtime Lodge” from here on:

Hotel            Room             Price     Recorded at
"Bedtime" Lodge  Single Standard  € 101.00  10/31/2009 10:17
"Bedtime" Lodge  Single Standard  € 98.00   11/28/2009 11:54

Now, let’s take a small step back at this point and “denormalise” the information we already have: we add another column which reflects until which point in time we considered the associated information to be valid. This is the moment when the price for the hotel changed and we therefore added a new row:

Hotel            Room             Price     Recorded at       Effective until
"Bedtime" Lodge  Single Standard  € 101.00  10/31/2009 10:17  11/28/2009 11:54
"Bedtime" Lodge  Single Standard  € 98.00   11/28/2009 11:54  unlimited

We realize: for a new row, the “Effective until” value is “unlimited”. The moment we “update” a price by inserting a new row, the “Effective until” of the old row is set to the “Recorded at” of the new row. Now, whenever we ask for current prices, we will want to filter out all rows whose effectivity has ended. But actually we should generalize this filtering a bit: whenever we ask this table for a consistent set of data as of the current or a historic date, we filter out all rows for which this date does not lie between “Recorded at” and “Effective until”. To make this even easier, we could decide to replace “unlimited” with a date far enough in the future not to cause any Y2K-style problems during our lifetime:

Hotel            Room             Price     Recorded at       Effective until
"Bedtime" Lodge  Single Standard  € 101.00  10/31/2009 10:17  11/28/2009 11:54
"Bedtime" Lodge  Single Standard  € 98.00   11/28/2009 11:54  12/31/9999 23:59

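The mechanism described so far can be sketched in a few lines of Python. This is only a minimal illustration, not a real schema: the function names record_price and as_of, the dictionary keys and the constant END_OF_TIME are made up for this sketch, and treating “between” as a half-open interval (start included, end excluded) is an assumption the text above does not settle.

```python
from datetime import datetime

# Assumption: a far-future sentinel stands in for "unlimited".
END_OF_TIME = datetime(9999, 12, 31, 23, 59)

def record_price(rows, hotel, room, price, now):
    """Close the open row for this hotel/room (if any) and append a new one."""
    for row in rows:
        same_key = (row["hotel"], row["room"]) == (hotel, room)
        if same_key and row["effective_until"] == END_OF_TIME:
            row["effective_until"] = now
    rows.append({"hotel": hotel, "room": room, "price": price,
                 "recorded_at": now, "effective_until": END_OF_TIME})

def as_of(rows, when):
    """Keep only rows with recorded_at <= when < effective_until."""
    return [r for r in rows if r["recorded_at"] <= when < r["effective_until"]]

rows = []
record_price(rows, '"Bedtime" Lodge', "Single Standard", 101.00,
             datetime(2009, 10, 31, 10, 17))
record_price(rows, '"Bedtime" Lodge', "Single Standard", 98.00,
             datetime(2009, 11, 28, 11, 54))

# What did we believe the price was in mid-November, and what on 8 December?
print([r["price"] for r in as_of(rows, datetime(2009, 11, 15))])
print([r["price"] for r in as_of(rows, datetime(2009, 12, 8))])
```

Nothing is ever overwritten except the “Effective until” of the row being closed, so a question about any past date can still be answered from the same table.
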
And by the way, looking at this data: wouldn’t it be nice to show our users not just the current price, but also things like how the minimum room price of a given hotel developed in the past, or which hotels frequently offer rooms at minimum prices? All the information we need for this is already in place! We would just have to think about the queries a little bit. ;-)
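As a hedged example of such a query, the following Python sketch computes how the minimum room price of one hotel developed over time. The “Single Standard” rows are the ones from the table above; the “Double Standard” rows and the function name minimum_price_history are invented purely for illustration, and the interval is again assumed to be half-open.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31, 23, 59)

# (room, price, recorded_at, effective_until) for a single hotel.
# The "Double Standard" rows are hypothetical sample data.
rows = [
    ("Single Standard", 101.00, datetime(2009, 10, 31, 10, 17), datetime(2009, 11, 28, 11, 54)),
    ("Single Standard", 98.00, datetime(2009, 11, 28, 11, 54), END_OF_TIME),
    ("Double Standard", 95.00, datetime(2009, 11, 10, 9, 0), datetime(2009, 11, 20, 9, 0)),
    ("Double Standard", 130.00, datetime(2009, 11, 20, 9, 0), END_OF_TIME),
]

def minimum_price_history(rows):
    """For every moment a row was recorded, find the lowest price effective then."""
    history = []
    for moment in sorted({recorded_at for _, _, recorded_at, _ in rows}):
        effective = [price for _, price, start, end in rows if start <= moment < end]
        history.append((moment, min(effective)))
    return history

for moment, price in minimum_price_history(rows):
    print(moment.strftime("%m/%d/%Y %H:%M"), price)
```

The sketch only re-evaluates the minimum at the moments a row was recorded, because the set of effective rows cannot change in between.
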

Tracking Actual Time

Now the question: what’s bitemporal about this data? Answer: so far, nothing. So far we only cared about Record Time, meaning the moment we learned about a price change; we did not care much about when it actually happened. If we call this way of storing data “Record Temporal”, we can imagine using a very similar mechanism to store “Actual Temporal” data by changing the semantics of our two time columns and calling them Actual from and Actual until:

Hotel            Room             Price     Actual from       Actual until
"Bedtime" Lodge  Single Standard  € 101.00  10/31/2009 10:17  11/28/2009 11:54
"Bedtime" Lodge  Single Standard  € 98.00   11/28/2009 11:54  12/31/9999 23:59

What changes when we give the columns the semantics of Actual Time instead of Record Time? With Record Time, we always deal with an easily retrievable system time: the moment some data is inserted or updated, we set the “Effective until” of the old row and the “Recorded at” of the new row to the current system time and are done. With Actual Time, things are different. Typically this point in time can only be provided by the human (or external system) using the database application. Only he, she or it can tell us: wait, we are updating the information in your database now, but the change in the real world already happened earlier. So we have to ask ourselves whether this kind of semantics would add value to our Price Watcher App. Probably not. But the moment we change perspective on the given sample, we find that somebody else will be interested in both times: the “Bedtime Lodge”.
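To make the difference concrete, here is a small Python sketch of the “Actual Temporal” variant. The point in time is a parameter supplied by the caller and is never read from the system clock. The function name set_price, the dictionary keys and the rule that changes must arrive in chronological order of their actual time are assumptions made to keep the sketch short.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 31, 23, 59)

def set_price(rows, room, price, actual_from):
    """Register a price that applies in the real world from `actual_from` on.

    Assumption: changes are registered in chronological order of their
    actual time; anything else is rejected to keep the sketch simple.
    """
    current = next((r for r in rows
                    if r["room"] == room and r["actual_until"] == END_OF_TIME), None)
    if current is not None:
        if actual_from <= current["actual_from"]:
            raise ValueError("actual_from must lie after the start of the current price")
        current["actual_until"] = actual_from
    rows.append({"room": room, "price": price,
                 "actual_from": actual_from, "actual_until": END_OF_TIME})

rows = []
set_price(rows, "Single Standard", 101.00, datetime(2009, 10, 31, 10, 17))
# The user may type this in days later; the date is whatever the user reports.
set_price(rows, "Single Standard", 98.00, datetime(2009, 11, 28, 11, 54))

for row in rows:
    print(row["price"], row["actual_from"], row["actual_until"])
```

Note that nothing in these rows tells us when the information was entered; that is exactly the Record Time dimension this variant leaves out.
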

Bitemporality

The “Bedtime Lodge” also has to maintain a room and price database with similar contents, but just for its own rooms. The goal now is not to “watch” prices, but to “define” and “maintain” them. Of course we don’t need the “Hotel” column anymore, because we know for sure that it’s us:

Room             Price     Recorded at       Effective until
Single Standard  € 101.00  10/31/2009 10:17  11/28/2009 11:54
Single Standard  € 98.00   11/28/2009 11:54  12/31/9999 23:59

The requirement we face here is to be able to define and maintain prices which come into effect at some point in the future. My next post will therefore combine the concepts of Record Time and Actual Time and complete the sample by showing full bitemporality.

About the Lifecycle of Data.

3. November 2009

Some time ago I wrote a requirements-oriented post about controlling the lifecycle of persisted entities.

Today I want to step back a bit and start writing about a number of “temporal” issues and problems in business application development. Along the way I want to discuss, develop and improve one or more possible solutions for them. This has been bothering me for quite a while: it’s the kind of stuff that regularly nobody has time (sic!) to care about when a business application project starts, and which (again: regularly) begins to hurt and to cost time and money, maybe lots of it, later on when the true business requirements are discovered. And regularly they are only discovered piece by piece.


Photo credit: monkeyc.net, licensed under Creative Commons BY-NC-SA 2.0

So, what do I mean by Lifecycle of Data? At some point in time our baby data is born. Then, while growing up and aging, it keeps some properties while others change with time. Until at some later point it dies and disappears from this world. When we compare this picture of life with the basic data manipulation mechanisms a typical database management system provides, we find a similar pattern: at some point in time a new record is INSERTed into a database table. Then, as time goes by, some of its properties (columns) are UPDATEd. Until at some later point somebody decides to DELETE it from the database and thereby remove all its traces from this world.

Let’s investigate what is actually happening here a little more deeply by reconnecting the lifecycle of information about the world with the lifecycle of the data living in the database. Databases regularly don’t store arbitrary data, but try to reflect some aspects of the real world. As a sample, I suggest thinking of a civil registry’s database storing the spouses, children and all places of living of a certain country’s citizens. Let’s focus on one of those persons – let’s call her Mrs. Sometimes – and her place of living: some (probably short) time after her birth, this fact will be reflected in the database by INSERTing information about her and INSERTing her first place of living. Later on, Mrs. Sometimes’s place of living will potentially change several times. Maintaining this database’s address table therefore means trying our best to reflect the real status of this single aspect of the world: where do my fellow citizens actually live? If somebody moves, we UPDATE; if somebody dies, we DELETE.
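A tiny sketch using Python’s built-in sqlite3 module shows this lifecycle in action. The table layout and the place names “Birthtown” and “Newtown” are invented for illustration; a real civil registry schema would of course look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (citizen TEXT PRIMARY KEY, place TEXT NOT NULL)")

# Birth: the first place of living is INSERTed.
conn.execute("INSERT INTO address VALUES (?, ?)", ("Mrs. Sometimes", "Birthtown"))

# A move: the UPDATE overwrites the old place of living.
conn.execute("UPDATE address SET place = ? WHERE citizen = ?",
             ("Newtown", "Mrs. Sometimes"))
current = conn.execute("SELECT place FROM address WHERE citizen = ?",
                       ("Mrs. Sometimes",)).fetchone()
print(current)

# Death: the DELETE removes the last trace.
conn.execute("DELETE FROM address WHERE citizen = ?", ("Mrs. Sometimes",))
remaining = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(remaining)
```

After the UPDATE, nothing in the table reveals that “Birthtown” was ever stored, and after the DELETE nothing reveals that Mrs. Sometimes was ever registered.
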

To sum it up until here:

The basic data manipulation operations provided by a typical relational database management system (INSERT, UPDATE, DELETE) reflect the natural lifecycle of the recorded aspects of the (ever changing) outside world. The database we maintain by applying those operations tries its best to reflect the current status of the outside world. But regularly, it will take some time until the information finds its way into the database. And sometimes this will never happen – and the data inside the database will for some reason be plain wrong, meaning: not correctly reflecting the true facts of the outside world.

I used some important terms in the last paragraph which will probably occupy me for quite some time: I said the database reflects the current status (because outdated information is UPDATEd or DELETEd). I also said there typically is a time gap between a status change in the outside world (e.g. somebody moves) and the status change in the database (here e.g. correctly reflecting this move). And I said some database data may be wrong data.

To finish this post I want to deduce from this some of the typical user questions our simple sample database cannot answer:

  • Where did Mrs. Sometimes live five years ago?
  • When did Mrs. Sometimes move the last time?
  • From when on did the Database reflect this fact?
  • What did we know about Mrs. Sometimes’s first place of living some weeks after her birth?
  • And what do we know about it today? When was it discovered that the original information about Mrs. Sometimes’s first place of living had been manipulated by her father and therefore had to be corrected later on?
  • And where can we write down the fact that Mrs. Sometimes just called and told us that she will definitely move in two months and already wants to leave this information with us?

Unfortunately, the need to ask such questions is not uncommon at all. In fact, such “use cases” are very typical and appear in many flavours. But they are typically not considered from the beginning of an average application development project. To be continued.

Spring Bean Instantiation for JPA/Hibernate

21. August 2009

In my last project I ran into an “actually” quite obvious problem that I want to resolve properly once and for all, because I expect it to be a regular requirement when working with a persistence layer like JPA/Hibernate together with a bean instantiation mechanism like that of the Spring Framework.

When I persist a JPA-mapped Entity into my database, I’d very much like to be able to recover it later in exactly the state in which it was persisted. This is what persistence is all about anyway, isn’t it? It is, but trouble arises when the persisted Entity was originally instantiated from a Spring Prototype Bean Configuration. Typically, some of the Properties of such a prepopulated Bean serve configuration purposes only. Persisting those Properties to the Database is either superfluous, causing a lot of redundant data that burdens performance (e.g. for simple configuration values or beans), or difficult or even impossible to achieve when the injected data (e.g. some configuration-carrying Singleton Bean) is simply not made to be persistent.

However, when such an Entity is later recovered from the Database, it will have lost all its (non-persistent) configuration data. Of course: JPA/Hibernate constructs the Java Objects via the Default Constructor of the Class and populates them with the Property Values persisted in the Database. That’s it, basically. The solution for my issue should be easy, and some instructions or even ready-made code should be obtainable via the Internet: I just have to “teach” Hibernate to construct the Java Objects for Database Entities not via the default Class Constructor but via the Spring Bean Instantiation. I “just” have to do some research about this… ;-)

Once this is in place, I expect the approach to make very good sense for almost all persistent Entities, including the simple cases, because it allows

  • To externalize the configuration of transient-only Property values as well as of default (prepopulated) values that become persistent later on, and to have all this transparent in one place within the Spring Prototype Bean Configuration.
  • Domain-specific Object Models to carry some (e.g. DAO-aware) lookup or logic methods where applicable and useful.

I’m eager to see it working soon!

(This post is part of my thoughts about A portfolio for Enterprise Web Applications)

Control over lifecycle of persisted Entities

21. August 2009

Connected to my thoughts about A portfolio for Enterprise Web Applications, I am thinking about developing (= providing a well-tested, bundled and documented software feature with minimal coding effort) a mechanism which is capable of the following:

  • Log all persisted states of an Entity inside the Database, always including information about Action Type, Time and User/Process.
  • Also allow forthcoming states to be saved as “inactive” Entity states (meaning “to be activated later”).
  • Allow the Database Operations (C)reate, (R)ead, (U)pdate, (D)elete *and* (A)ctivate (the right to activate a Database-changing Operation) to be defined as Action Types, i.e. as Rights a User/Process may have on a Class of Entities.
  • Allow each such User Right on an Entity Class to be restricted by a (custom) Right Condition.
  • Provide out of the box a Right Condition allowing a User to (A)ctivate only if the Operation to be activated was not performed by the activating User himself (“Second Set of Eyes” feature).
  • Allow not just forthcoming, but also every old “inactive” Entity State to be activated (“Entity Rollback” feature).
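To illustrate how the “inactive” states and the “Second Set of Eyes” condition could fit together, here is a language-neutral sketch in Python. It is explicitly not the JPA/Hibernate implementation this post is aiming for: the class EntityState, its fields and the functions second_set_of_eyes and activate are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class EntityState:
    """One logged state of an entity; inactive until somebody activates it."""
    payload: dict
    action: str                       # "C", "U" or "D"
    created_by: str
    created_at: datetime
    activated_by: Optional[str] = None

def second_set_of_eyes(state, user):
    """Right Condition: the activating user must differ from the author."""
    return state.created_by != user

def activate(state, user, condition=second_set_of_eyes):
    """Activate a logged state if the given Right Condition allows it."""
    if not condition(state, user):
        raise PermissionError(f"{user} may not activate this state")
    state.activated_by = user
    return state

pending = EntityState({"price": 98.00}, "U", "alice", datetime(2009, 8, 21, 12, 0))

try:
    activate(pending, "alice")        # the author tries to approve her own change
except PermissionError as error:
    print("rejected:", error)

activate(pending, "bob")              # a second user may activate it
print(pending.activated_by)
```

Because the condition is passed in as a plain function, a custom Right Condition would simply be another function with the same two parameters.
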

I want to do this based on Java and JPA/Hibernate, simply because these ideas are about leveraging preexisting project experience and knowledge. Still to be investigated is the use of

  • Hibernate Interceptors for Logging Entity State change
  • Hibernate XML API also for Logging
  • (Hibernate) Bean Validator Framework for CRUD and Activation Rights and mentioned Custom Conditions