About the Lifecycle of Data.
Some time ago I wrote a requirements-oriented post about controlling the lifecycle of persisted entities.
Today I want to step back a bit and start writing about a number of “temporal” issues and problems in business application development. Along the way I want to discuss, develop and improve one or more possible solutions for them. This has been rankling me for a long time: it’s the kind of stuff that regularly nobody has time (sic!) to care about when a business application project starts, and which (again: regularly) begins to hurt and cost (maybe lots of) time and money later on, when the true business requirements are discovered. And regularly they are only discovered piece by piece.

Photo credit: monkeyc.net, licensed under Creative Commons BY-NC-SA 2.0
So, what do I mean by Lifecycle of Data? At some point in time our baby-data is born. Then, while growing up and aging, it keeps some properties while others change with time. Until at some later point it dies and disappears from this world. When we compare this picture of life with the basic data manipulation mechanisms a typical database management system provides, we find a similar pattern: at some point in time a new record is INSERTed into a database table. Then, as time goes by, some of its properties (columns) will be UPDATEd. Until at some later point somebody decides to DELETE it from the database and thereby remove all its traces from this world.
Let’s investigate what is actually happening here a little more deeply by reconnecting the lifecycle of information about the world with the lifecycle of the data living in the database. Databases regularly don’t store arbitrary data, but try to reflect some aspects of the real world. As a sample, I suggest thinking of a civil registry’s database storing the spouses, children and all the places of living of a certain country’s citizens. Let’s focus on one of those persons (let’s call her Mrs. Sometimes) and her place of living: some (probably short) time after her birth, this fact of birth will be reflected in the database by INSERTing information about her and INSERTing her first place of living. Later on, Mrs. Sometimes’s place of living will potentially change several times. Maintaining this database’s address table therefore means trying our best to reflect the real status of this single aspect of the world: where do my fellow citizens actually live? If somebody moves, we UPDATE; if somebody dies, we DELETE.
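The lifecycle described above can be replayed in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module; the table name `address`, its two columns and the place names are assumptions made up for illustration and are not taken from any real registry schema.

```python
import sqlite3

# Hypothetical schema: one row per citizen, holding only the current place of living.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (citizen TEXT PRIMARY KEY, place TEXT NOT NULL)")

# Birth: some time after the event, the fact is INSERTed.
conn.execute("INSERT INTO address VALUES (?, ?)", ("Mrs. Sometimes", "Birthtown"))

# Move: the UPDATE overwrites the previous place of living.
conn.execute(
    "UPDATE address SET place = ? WHERE citizen = ?", ("Newtown", "Mrs. Sometimes")
)
after_move = conn.execute(
    "SELECT place FROM address WHERE citizen = ?", ("Mrs. Sometimes",)
).fetchall()

# Death: the DELETE removes the row altogether.
conn.execute("DELETE FROM address WHERE citizen = ?", ("Mrs. Sometimes",))
after_death = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]

print(after_move)   # only the current place is left, "Birthtown" is gone
print(after_death)  # no row remains
```

After the UPDATE there is no way to ask the table where she lived before, and after the DELETE there is no way to ask whether she ever existed.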
To sum up so far:
The basic data manipulation operations provided by a typical relational database management system (INSERT, UPDATE, DELETE) reflect the natural lifecycle of the recorded aspects of the (ever changing) outside world. The database we maintain by applying those operations tries its best to reflect the current status of the outside world. But regularly, it will take some time until the information finds its way into the database. And sometimes this will never happen, and the data inside the database will for some reason be plain wrong, meaning: not correctly reflecting the true facts of the outside world.
I used some important terms in the last paragraph which will probably occupy me for quite some time: I said the database reflects the current status (because outdated information is UPDATEd or DELETEd). I also said there typically is a time gap between a status change in the outside world (e.g. somebody moves) and the status change in the database (here e.g. correctly reflecting this move). And I said some of the data in the database may be wrong.
To finish this post I want to derive from this some of the typical user questions our simple sample database cannot answer:
- Where did Mrs. Sometimes live five years ago?
- When did Mrs. Sometimes move the last time?
- Since when has the database reflected this fact?
- What did we know about Mrs. Sometimes’s first place of living some weeks after her birth?
- And what do we know about it today? When was it discovered that the original information about Mrs. Sometimes’s first place of living had been manipulated by her father, so that it had to be corrected later on?
- And where can we write down the fact that Mrs. Sometimes just called us, told us that she will definitely move in two months, and wants to register this information right away?
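What these questions have in common is that each of them needs at least one point in time the simple table never stored: when something happened in the world, and when the database learned about it. The following Python sketch only illustrates that distinction and is not meant as the solution this series is heading for; the class `AddressFact`, the function `place_on`, all place names and all dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressFact:
    place: str
    valid_from: date   # when the move happened (or will happen) in the real world
    recorded_on: date  # when the registry wrote this fact down


def place_on(facts: List[AddressFact], day: date, known_on: date) -> Optional[str]:
    """Place of living on `day`, judged only by what was recorded up to `known_on`."""
    candidates = [
        f for f in facts if f.valid_from <= day and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # The latest move wins; for the same move, the latest record (a correction) wins.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_on)).place


# Invented history of Mrs. Sometimes, kept as facts instead of overwritten rows.
facts = [
    AddressFact("Fakeville", date(1980, 1, 1), date(1980, 1, 10)),    # manipulated entry
    AddressFact("Birthtown", date(1980, 1, 1), date(1995, 6, 1)),     # later correction
    AddressFact("Newtown", date(2005, 3, 1), date(2005, 3, 20)),      # move, recorded late
    AddressFact("Futureville", date(2010, 2, 1), date(2009, 12, 1)),  # announced move
]

today = date(2009, 12, 15)
print(place_on(facts, date(2004, 12, 15), today))            # five years ago
print(place_on(facts, date(1980, 2, 1), date(1980, 2, 1)))   # what we knew back then
print(place_on(facts, date(1980, 2, 1), today))              # what we know today
print(place_on(facts, date(2010, 3, 1), today))              # the announced move
```

Nothing is ever overwritten here: a correction or an announced move is just another fact with its own pair of dates, which is exactly the information the INSERT/UPDATE/DELETE table throws away.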
Unfortunately, the need to ask such questions is not uncommon at all. In fact, such “use cases” are very typical and appear in many flavours. But they are usually not thought of from the beginning of an average application development project. To be continued.



Interesting. DB schema migrations, data warehouses and the like actually merely attack the symptoms (http://soup.robert42.com/post/14036438), while RDBMS may cause these issues in the first place (http://debasishg.blogspot.com/2009/11/nosql-movement-excited-with-coexistence.html).