
Bitemporal Data is everywhere – Part 1

8. December 2009
In this post:
- Record Time
- Actual Time
- Bitemporality

While I am still busy taking care of my little girls, who are currently sick, today I will continue my reflections on the lifecycle of data and walk through an example of temporal data to further illustrate what I am talking about. Bitemporality: this, for example, is how Martin Fowler (in “Patterns for things that change with time”) describes the two time dimensions of bitemporal objects:

I think of the first dimension as actual time: the time something happened. The second dimension is record time, the time we knew about it. Whenever something happens, there are always these two times that come with it.


Photo credit: bobydimitrov, licensed under Creative Commons BY-SA 2.0

“Whenever something happens, there are always these two times that come with it.” The information that something happened needs time to reach its recipients, and possibly even more time until it is recorded in some database. Of course, this does not just concern a flash of lightning and the thunder heard a few miles away, the recording of lightning strikes in some weather forecaster’s database, or the reader of a local newspaper covering the thunderstorm. It concerns every single mundane fact of life.

Tracking Record Time

So let’s focus on one such mundane fact of life and investigate the relationship between a hotel room and its price. Imagine a simplified Price Watcher application that tracks the room prices of several hotels in a given city:

Hotel           | Room            | Price
----------------|-----------------|---------
"Four Seasons"  | Single Standard | € 149.00
"Bedtime" Lodge | Single Standard | € 98.00
"In good hands" | Single Standard | € 110.00

A better normalised, real-life database schema for such an application would of course look a bit different, but this should not bother us at the moment. With this simple table we can already answer the question “Where do I get the cheapest city hotel room at the moment?” by sorting the table by price in ascending order… but actually… can we answer this? To be slightly more precise (and we have to be precise for our topic), we can only answer the question “Where, to the best of our current knowledge, do we get the cheapest city hotel room at the moment?”. After all, we don’t reflect reality here, we just reflect what we know about it…
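To make the query concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name hotel_room_price and the column price_cents (prices stored as integer euro cents to avoid floating-point rounding) are hypothetical choices for illustration. Sorting in ascending order puts the cheapest room first; the answer still only reflects what the table currently knows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the simple HotelRoomPrice table above.
conn.execute("CREATE TABLE hotel_room_price (hotel TEXT, room TEXT, price_cents INTEGER)")
conn.executemany(
    "INSERT INTO hotel_room_price VALUES (?, ?, ?)",
    [
        ("Four Seasons", "Single Standard", 14900),
        ("Bedtime Lodge", "Single Standard", 9800),
        ("In good hands", "Single Standard", 11000),
    ],
)

# Ascending order puts the cheapest room first; LIMIT 1 keeps only that row.
row = conn.execute(
    "SELECT hotel, room, price_cents FROM hotel_room_price "
    "ORDER BY price_cents ASC LIMIT 1"
).fetchone()
print(row)  # ('Bedtime Lodge', 'Single Standard', 9800)
```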

If our Price Watcher App eventually becomes a relevant factor in the market, conflicts with hotel owners could become more frequent (“Why is it only my price cuts that you update so slowly?”). We might therefore want to store when exactly each price was recorded in our database:

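Hotel           | Room            | Price    | Recorded at
----------------|-----------------|----------|-----------------
"Four Seasons"  | Single Standard | € 149.00 | 12/01/2009 18:03
"Bedtime" Lodge | Single Standard | € 98.00  | 11/28/2009 11:54
"In good hands" | Single Standard | € 110.00 | 12/07/2009 08:11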

But on second thought, wouldn’t it be wise to remember not just when we last updated a price, but every such update, so that we can discuss individual cases and prove that we honestly treat all hotels equally? Then we may no longer update rows in the table; instead, we insert a new row whenever a price changes. For this purpose, let’s focus on “Bedtime Lodge” from here on:

Hotel           | Room            | Price    | Recorded at
----------------|-----------------|----------|-----------------
"Bedtime" Lodge | Single Standard | € 101.00 | 10/31/2009 10:17
"Bedtime" Lodge | Single Standard | € 98.00  | 11/28/2009 11:54
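The switch from updating to inserting can be sketched as follows, again assuming Python’s sqlite3 module and a hypothetical hotel_room_price table. In a real application recorded_at would come from the system clock at the moment of the change; here the timestamps are passed in explicitly so that the rows match the table above. Timestamps are stored as ISO 8601 text, which sorts chronologically.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel_room_price ("
    " hotel TEXT, room TEXT, price_cents INTEGER, recorded_at TEXT)"
)

def record_price(conn, hotel, room, price_cents, recorded_at):
    """Append a new price row instead of UPDATEing the old one, so history is kept."""
    conn.execute(
        "INSERT INTO hotel_room_price VALUES (?, ?, ?, ?)",
        (hotel, room, price_cents, recorded_at.isoformat(sep=" ")),
    )

record_price(conn, "Bedtime Lodge", "Single Standard", 10100, datetime(2009, 10, 31, 10, 17))
record_price(conn, "Bedtime Lodge", "Single Standard", 9800, datetime(2009, 11, 28, 11, 54))

# The current price is the one from the most recently recorded row...
current = conn.execute(
    "SELECT price_cents FROM hotel_room_price"
    " WHERE hotel = ? AND room = ? ORDER BY recorded_at DESC LIMIT 1",
    ("Bedtime Lodge", "Single Standard"),
).fetchone()[0]

# ...while every earlier price is still available for discussing single cases.
history = conn.execute(
    "SELECT price_cents, recorded_at FROM hotel_room_price ORDER BY recorded_at"
).fetchall()
print(current)  # 9800
print(history)  # [(10100, '2009-10-31 10:17:00'), (9800, '2009-11-28 11:54:00')]
```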

At this point, let’s backpedal a little by “denormalising” the information we already have: we add another column recording until which point in time we considered the associated information to be valid. That point is the moment when the price changed and we therefore added a new row:

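Hotel           | Room            | Price    | Recorded at      | Effective until
----------------|-----------------|----------|------------------|-----------------
"Bedtime" Lodge | Single Standard | € 101.00 | 10/31/2009 10:17 | 11/28/2009 11:54
"Bedtime" Lodge | Single Standard | € 98.00  | 11/28/2009 11:54 | unlimited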

We notice: the new “Effective until” value is “unlimited” for a newly inserted row, and it becomes equal to the “Recorded at” value of the next row the moment we “update” the price by inserting that next row. Whenever we ask for current prices, we will want to filter out all rows whose effectivity has expired. But we should actually generalize this filtering a bit: whenever we ask this table for a consistent set of data as of the current or a historic date, we filter out all rows for which this date does not lie between “Recorded at” (inclusive) and “Effective until” (exclusive, so that exactly one row matches at the very moment of a change). To make this even easier, we could replace “unlimited” with a date far enough in the future not to cause any Y2K-style problems during our lifetime:

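Hotel           | Room            | Price    | Recorded at      | Effective until
----------------|-----------------|----------|------------------|-----------------
"Bedtime" Lodge | Single Standard | € 101.00 | 10/31/2009 10:17 | 11/28/2009 11:54
"Bedtime" Lodge | Single Standard | € 98.00  | 11/28/2009 11:54 | 12/31/9999 23:59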
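Both mechanisms described above, closing the old row when a new one is inserted and filtering by a point in time, fit into a few lines. This is a minimal in-memory sketch, not a database implementation: the PriceRow class, the END_OF_TIME sentinel and the function names are hypothetical. The filter treats the interval as half-open (recorded_at <= when < effective_until), so that at the exact moment of a change only the new row matches.

```python
from dataclasses import dataclass
from datetime import datetime

# Sentinel standing in for "unlimited", as in the table above.
END_OF_TIME = datetime(9999, 12, 31, 23, 59)

@dataclass
class PriceRow:
    hotel: str
    room: str
    price_cents: int
    recorded_at: datetime
    effective_until: datetime = END_OF_TIME

def record_price(rows, hotel, room, price_cents, now):
    """Close the currently effective row (if any) at `now`, then append the new row."""
    for row in rows:
        if row.hotel == hotel and row.room == room and row.effective_until == END_OF_TIME:
            row.effective_until = now
    rows.append(PriceRow(hotel, room, price_cents, now))

def as_of(rows, when):
    """Rows we considered valid at `when`: recorded_at <= when < effective_until."""
    return [r for r in rows if r.recorded_at <= when < r.effective_until]

rows = []
record_price(rows, "Bedtime Lodge", "Single Standard", 10100, datetime(2009, 10, 31, 10, 17))
record_price(rows, "Bedtime Lodge", "Single Standard", 9800, datetime(2009, 11, 28, 11, 54))

print([r.price_cents for r in as_of(rows, datetime(2009, 11, 15))])          # [10100]
print([r.price_cents for r in as_of(rows, datetime(2009, 11, 28, 11, 54))])  # [9800]
print([r.price_cents for r in as_of(rows, datetime(2009, 12, 8))])           # [9800]
```

Using a far-future sentinel instead of a NULL-like “unlimited” keeps the filter a single comparison chain, which is exactly the simplification the text suggests.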

And by the way, looking at this: wouldn’t it be nice to show our users not just the current price, but also, for example, how the minimum room price of a given hotel developed over time, or which hotels frequently offer the lowest room prices? All the information we need for this is already in place! We would just have to think about the queries a little. ;-)
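As a taste of such a query, here is a hedged sketch of “which hotels frequently offer the lowest price”: it samples the record-temporal history once per day and counts how often each hotel was the cheapest one according to what we knew at that moment. The rows are taken from the tables above; the tuple layout, the daily sampling and the function names are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta

END_OF_TIME = datetime(9999, 12, 31, 23, 59)

# Record-temporal rows from the tables above:
# (hotel, price in cents, recorded_at, effective_until)
ROWS = [
    ("Four Seasons", 14900, datetime(2009, 12, 1, 18, 3), END_OF_TIME),
    ("Bedtime Lodge", 10100, datetime(2009, 10, 31, 10, 17), datetime(2009, 11, 28, 11, 54)),
    ("Bedtime Lodge", 9800, datetime(2009, 11, 28, 11, 54), END_OF_TIME),
    ("In good hands", 11000, datetime(2009, 12, 7, 8, 11), END_OF_TIME),
]

def cheapest_hotel_at(rows, when):
    """Cheapest hotel according to what we knew at `when`, or None if we knew nothing."""
    valid = [r for r in rows if r[2] <= when < r[3]]
    return min(valid, key=lambda r: r[1])[0] if valid else None

def cheapest_counts(rows, start, days):
    """Count on how many daily samples each hotel was the cheapest one."""
    counts = Counter()
    for offset in range(days):
        hotel = cheapest_hotel_at(rows, start + timedelta(days=offset))
        if hotel is not None:
            counts[hotel] += 1
    return counts

# Samples at noon from November 1 to December 8, 2009 (38 days).
print(cheapest_counts(ROWS, datetime(2009, 11, 1, 12, 0), 38))  # Counter({'Bedtime Lodge': 38})
```

A real application would more likely evaluate the history interval by interval instead of sampling, but sampling keeps the sketch short.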

Tracking Actual Time

Now, a question: what’s bitemporal about this data? Answer: so far, nothing. So far we have only cared about Record Time, meaning the moment we learned about a price change; we did not care much about when it actually happened. If we call this way of storing data “Record Temporal”, we can imagine using a very similar mechanism to store “Actual Temporal” data, by changing the semantics of our two time columns and renaming them “Actual from” and “Actual until”:

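Hotel           | Room            | Price    | Actual from      | Actual until
----------------|-----------------|----------|------------------|-----------------
"Bedtime" Lodge | Single Standard | € 101.00 | 10/31/2009 10:17 | 11/28/2009 11:54
"Bedtime" Lodge | Single Standard | € 98.00  | 11/28/2009 11:54 | 12/31/9999 23:59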

What changes when we give the columns the semantics of Actual Time instead of Record Time? Well, with Record Time we always deal with an easily retrievable system time: the moment some data is inserted or updated, we set the “Effective until” of the old row and the “Recorded at” of the new row to the current system time, and we are done. With Actual Time, things are different. Typically, only the human (or external system) using the database application can provide this point in time. He/she/it is the only one who can say: wait, we are updating the information in your database now, right, but the change in the real world actually happened earlier. So we have to ask ourselves whether this kind of semantics would add value to our Price Watcher App. Probably not. But the moment we change perspective on our example, we find that somebody else will be interested in both times: the “Bedtime Lodge”.
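The difference can be made visible in the signatures of two hypothetical helper functions, sketched below under simplifying assumptions. The Record Time variant needs nothing but the system clock, while the Actual Time variant cannot work without an actual_from argument supplied by the caller. To keep the sketch short, the Actual Time variant only accepts changes in chronological order; inserting a change retroactively before the latest known one is deliberately rejected rather than handled.

```python
from dataclasses import dataclass
from datetime import datetime

# Sentinel for "unlimited", as in the tables above.
END_OF_TIME = datetime(9999, 12, 31, 23, 59)

@dataclass
class TemporalRow:
    price_cents: int
    start: datetime              # "Recorded at" or "Actual from"
    end: datetime = END_OF_TIME  # "Effective until" or "Actual until"

def record_temporal_change(rows, price_cents):
    """Record Time: the system clock tells us everything we need."""
    now = datetime.now()
    if rows:
        rows[-1].end = now  # close the currently effective row
    rows.append(TemporalRow(price_cents, now))

def actual_temporal_change(rows, price_cents, actual_from):
    """Actual Time: only the caller knows when the change really happened.

    Simplification: each change must be later than the latest known one.
    """
    if rows and actual_from <= rows[-1].start:
        raise ValueError("actual_from must be later than the latest known change in this sketch")
    if rows:
        rows[-1].end = actual_from
    rows.append(TemporalRow(price_cents, actual_from))

actual_rows = []
actual_temporal_change(actual_rows, 10100, datetime(2009, 10, 31, 10, 17))
actual_temporal_change(actual_rows, 9800, datetime(2009, 11, 28, 11, 54))
```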

Bitemporality

The “Bedtime Lodge” will also have to maintain a room and price database with similar contents, but only for its own rooms. The goal now is not to “watch” prices, but to actually “define” and “maintain” them. Of course, we don’t need the “Hotel” column anymore, because we know for sure it’s us:

Room            | Price    | Recorded at      | Effective until
----------------|----------|------------------|-----------------
Single Standard | € 101.00 | 10/31/2009 10:17 | 11/28/2009 11:54
Single Standard | € 98.00  | 11/28/2009 11:54 | 12/31/9999 23:59

The requirement we face here is to be able to define and maintain prices that come into effect at some point in the future. My next post will therefore combine the concepts of Record Time and Actual Time and complete the example, showing full bitemporality.

About the Lifecycle of Data

3. November 2009

Some time ago I wrote a requirements-oriented post about controlling the lifecycle of persisted entities.

Today I want to step back a bit and start writing about a number of “temporal” issues and problems in business application development. As I think this through, I want to discuss, develop and improve (one or more?) possible solutions for them. This has been bothering me for quite a while: it’s the kind of thing that usually nobody has time (sic!) to care about when a business application project starts, and which (again: usually) starts to hurt and cost (possibly a lot of) time and money later on, when the true business requirements are discovered. And usually they are discovered only piece by piece.


Photo credit: monkeyc.net, licensed under Creative Commons BY-NC-SA 2.0

So, what do I mean by the lifecycle of data? At some point in time our baby data is born. Then, while growing up and aging, it keeps some properties while others change over time. Until at some later point it dies and disappears from this world. When we compare this picture of life with the basic data manipulation mechanisms a typical database management system provides, we find a similar pattern: at some point in time a new record is INSERTed into a database table. Then, as time goes by, some of its properties (columns) are UPDATEd. Until at some later point somebody decides to DELETE it from the database, thereby removing all its traces from this world.

Let’s investigate a little more deeply what is actually happening here by reconnecting the lifecycle of information about the world with the lifecycle of the data living in the database. Databases usually don’t store arbitrary data; they try to reflect some aspects of the real world. As an example, think of a civil registry’s database storing the spouses, children and all places of living of a certain country’s citizens. Let’s focus on one of those persons (let’s call her Mrs. Sometimes) and her place of living: some (probably short) time after her birth, this fact will be reflected in the database by INSERTing information about her and INSERTing her first place of living. Later on, Mrs. Sometimes’s place of living will potentially change several times. Maintaining this database’s address table therefore means doing our best to reflect the real status of this single aspect of the world: where do my fellow citizens actually live? If somebody moves, we UPDATE; if somebody dies, we DELETE.
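The destructive nature of this approach is easy to demonstrate. The following minimal sketch uses Python’s built-in sqlite3 module with a hypothetical address table; the place names “Old Town” and “New Town” are made up for illustration. After the UPDATE, nothing in the table says where Mrs. Sometimes lived before, and after the DELETE, nothing says she ever existed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (person TEXT PRIMARY KEY, place_of_living TEXT)")

# Birth: INSERT her first place of living (hypothetical place name).
conn.execute("INSERT INTO address VALUES (?, ?)", ("Mrs. Sometimes", "Old Town"))

# Move: UPDATE overwrites the previous place of living; "Old Town" is lost.
conn.execute(
    "UPDATE address SET place_of_living = ? WHERE person = ?",
    ("New Town", "Mrs. Sometimes"),
)
after_update = conn.execute("SELECT * FROM address").fetchall()
print(after_update)  # [('Mrs. Sometimes', 'New Town')]

# Death: DELETE removes every trace of her.
conn.execute("DELETE FROM address WHERE person = ?", ("Mrs. Sometimes",))
after_delete = conn.execute("SELECT * FROM address").fetchall()
print(after_delete)  # []
```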

To sum up so far:

The basic data manipulation operations provided by a typical relational database management system (INSERT, UPDATE, DELETE) reflect the natural lifecycle of the recorded aspects of the (ever-changing) outside world. The database we maintain by applying those operations tries its best to reflect the current status of the outside world. But usually it takes some time until the information finds its way into the database. And sometimes this never happens, and the data inside the database will for some reason be plain wrong, meaning it does not correctly reflect the true facts of the outside world.

I used some important terms in the last paragraph which will probably occupy me for quite a while: I said the database reflects the current status (because outdated information is UPDATEd or DELETEd). I also said there typically is a time gap between a status change in the outside world (e.g. somebody moves) and the corresponding status change in the database (here, correctly reflecting this move). And I said some data in the database may be wrong.

To finish this post, I want to derive from this some typical user questions that our simple sample database cannot answer:

  • Where did Mrs. Sometimes live five years ago?
  • When did Mrs. Sometimes move the last time?
  • From when on did the database reflect this fact?
  • What did we know about Mrs. Sometimes’s first place of living some weeks after her birth?
  • And what do we know about it today? When was it discovered that the original information about Mrs. Sometimes’s first place of living had been manipulated by her father and was therefore corrected later on?
  • And where can we record the fact that Mrs. Sometimes just called us and told us that she will definitely move in two months and wants to give us this information right away?

Unfortunately, the need to ask such questions, such “use cases”, is not uncommon at all. In fact, they are very typical and appear in many flavours. But in an average application development project they are typically not considered from the beginning. To be continued.