temporal tables and data versioning - now and before db2...
TRANSCRIPT
The attendee will get a deep dive into all the DDL changes needed in order to
exploit DB2 V10 Temporal tables as well as the limitations. A case study
scenario using a live DB2 V10 system will be used to illustrate how the data is
versioned and how the application needs to understand what to request in order
to get the desired data. Finally we will look at which currently implemented
solutions can be modified to use Temporal tables instead.
Everything I say and write is solely my own opinion and does not reflect
the opinion of CA Technologies.
The IBM documentation was consulted only sporadically, so results and syntax
may differ from the documentation provided by IBM.
Since everyone working with DB2 (and other database management systems,
for that matter) has needed to implement what temporal tables provide
in DB2 10, let's have a quick look at some of the ways customers have
coped until now.
The majority of the presentation will be around DDL and DML changes by
illustrating a scenario.
Unlike several recent features (such as clone tables and NOT LOGGED), the
utility impact of temporal tables is minimal.
This is a definition of temporal data from Wikipedia – an interesting article, and I
can recommend reading it to get deeper insight (I have only included a small
portion on this slide).
Some historical facts about temporal tables – found in the same result list as
the previous slide. It is quite interesting that TEMPORAL DATA in
database terminology is relatively young.
I believe most of us have implemented solutions to handle temporal data for
many years, but we had to come up with special implementations and design
the data model to support this need.
The most straightforward solution was to have a base table with the live data
and then implement application logic to maintain a history table – and
hopefully the application code remembered to look in the right places for
the “correct” information.
DB2 V7 automated some of these tasks by allowing TRIGGERS. At least
some of the application logic could be eliminated but there was still the
opportunity for failure when the “correct” data was going to be retrieved.
Stored procedures were another alternative to assist with maintaining/retrieving
“temporal data”.
Bottom line – no matter which solution, maintenance was an issue and just
made the concept more complicated.
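The trigger approach described above can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for DB2, and the table names, column names and values (person, person_hist, address 1) are hypothetical rather than taken from the slides.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for DB2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE person_hist (ssn TEXT, name TEXT, address TEXT, archived_at TEXT);

-- Copy the old row image to the history table before each update.
CREATE TRIGGER person_upd BEFORE UPDATE ON person
BEGIN
    INSERT INTO person_hist
    VALUES (OLD.ssn, OLD.name, OLD.address, datetime('now'));
END;

-- Do the same before each delete.
CREATE TRIGGER person_del BEFORE DELETE ON person
BEGIN
    INSERT INTO person_hist
    VALUES (OLD.ssn, OLD.name, OLD.address, datetime('now'));
END;
""")

conn.execute("INSERT INTO person VALUES ('1', '(unnamed)', 'address 1')")
conn.execute("UPDATE person SET name = 'Steen' WHERE ssn = '1'")
conn.execute("DELETE FROM person WHERE ssn = '1'")

base_rows = conn.execute("SELECT ssn, name FROM person").fetchall()
hist_rows = conn.execute(
    "SELECT ssn, name FROM person_hist ORDER BY rowid"
).fetchall()
print("base:", base_rows)
print("history:", hist_rows)
```

Note that the triggers only maintain the history. Every query still has to know whether to read the base table, the history table or both, which is where the risk of retrieving the wrong data remains.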
To avoid having the application code maintain history, some sites used a
completely different method (and still do).
Depending on how accurate and up-to-date the history information must be,
this solution might not be valid since it requires access to the DB2 log.
Once a day (or as frequently as needed) the log records for the tables where data
versioning was/is needed are extracted and loaded into a special history table
(basically the base table appended with a timestamp column).
This method is much less error prone but the timing issue (how current the
history table must be) can be an issue and will have to be dealt with.
The nice part of this implementation is that the transaction doesn't incur any
performance penalty from inserting rows into the history, calling triggers etc.
DB2 for z/OS offers three types of temporal tables – all different in nature and
how they operate and how they are created.
SYSTEM TIME temporal tables: controlled solely by DB2 by keeping “old
images” in a separate history table every time a row is updated or deleted.
BUSINESS TIME temporal tables are controlled to a greater extent by the
application. Unlike SYSTEM TIME, this type has no history table associated –
if a row has different content based on time, this is controlled inside the same
table.
It is possible to mix both types – and you have BI-TEMPORAL tables.
The easiest way to see the differences is by using a scenario with a number of
events reflected in the TEMPORAL tables. This presentation only deals with
SYSTEM TIME and BUSINESS TIME.
The story we will use in this presentation is “Steen's life and where he lived” –
illustrating how temporal tables can be used to record and store changes
“out-of-the-box”.
This matrix illustrates the different “transactions” we will incorporate into the
System Time temporal table scenario to illustrate the lifetime and DDL as well
as DML needed to implement this kind of solution.
What you have to pay attention to is that some incidents/transactions are recorded
AFTER the day the real “transaction” / incident / issue happened – we will
illustrate the difference between SYSTEM TIME and BUSINESS TIME later
in the presentation.
We start by creating the BASE table STEEN_LIFE (top frame).
In order to implement a SYSTEM TIME temporal table for this
table, it is necessary to add three timestamp columns (when the row is inserted,
end of row and transaction timestamp).
The last task is to tell DB2 that the STEEN_LIFE table is ready to be
associated with a HISTORY table by adding the PERIOD SYSTEM_TIME clause, which
names the columns that describe the “life of the row” (here ROW_START and
ROW_END).
In order to complete the preparation, we need to create the HISTORY table to
hold any UPDATED/DELETED rows from the BASE table.
The easiest method is to create the history table using the LIKE clause.
The last task is to associate the BASE with the HISTORY – done using the
ALTER TABLE ADD VERSIONING.
Now it's time to illustrate the outlined scenario . . . What happened in Steen's
life – using Temporal tables
What I figured out was that you can't use DATE and regular TIMESTAMP
columns for the BASE table to enable SYSTEM TIME TEMPORAL – you
will have to use the DB2 10 timestamp extension.
I really did try to use DATE for the row START/END columns, without reading
the doc first, but DB2 rejected it – which probably makes sense since you could
update the same row multiple times per day/minute etc.
Unlike CLONE tables and other newer features in DB2 where tables/objects
are mirrored, the base and history tablespaces can be completely different.
This example illustrates the BASE table residing in a PBR UTS tablespace
while the HISTORY table can be PBG – which absolutely makes sense since
the history table can grow a lot depending on the activity on the base object.
First event to illustrate – Steen is born 4/14-1961 but the event isn't reported
by the birth clinic until 4/14-1961.
The FIRST insert only happens in the BASE table – let's move on . . .
The second event is that Steen is christened (and gets his official
name) – so the row is updated June 22nd 1961. The previous BASE row is
inserted into the HISTORY table as-is – except that the ROW_END
column takes on the transaction timestamp.
We now have the BASE row with current status and one row in the history
table reflecting the previous image.
The next event illustrates an address move – the base row is updated with the
new information and the history table gets the previous image (now holding
two historical images of this specific row).
Steen decides to move to the US after getting his Green Card. Even though he
moved in late 2002, he didn't report the move until February 22nd 2003.
This isn't good since he will have to pay taxes in both Denmark and the US for
the period. In order to avoid taxes in Denmark, what is needed is the ability to
have the START date in the base table accept a date from the past (we will
get back to this issue later, especially when dealing with BUSINESS TIME
temporal tables).
As you can see, the history table keeps on growing, so this is definitely something
you need to consider when designing the database.
The last event covered in this scenario is the end of Steen's life.
Note that when the row is deleted from
the base table, an INSERT takes place in the history table. You can say that
Steen will have eternal life due to the history table when using SYSTEM
TIME temporal tables.
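The INSERT/UPDATE/DELETE behavior walked through in this scenario can be modeled with a small toy class. This is a Python sketch of the semantics only, not of how DB2 implements versioning. The class, the key "steen", the row contents and the delete date are hypothetical; the 1961 and 2003 dates come from the scenario.

```python
from datetime import datetime

END_OF_TIME = datetime(9999, 12, 30)  # stand-in for an "open" ROW_END value


class SystemTimeTable:
    """Toy model: one current row per key plus an append-only history."""

    def __init__(self):
        self.base = {}     # key -> (data, row_start, row_end)
        self.history = []  # list of (key, data, row_start, row_end)

    def _archive(self, key, now):
        data, row_start, _ = self.base[key]
        # The old image is kept as-is, except ROW_END becomes the
        # timestamp of the transaction that replaced or removed it.
        self.history.append((key, data, row_start, now))

    def insert(self, key, data, now):
        self.base[key] = (data, now, END_OF_TIME)

    def update(self, key, data, now):
        self._archive(key, now)
        self.base[key] = (data, now, END_OF_TIME)

    def delete(self, key, now):
        self._archive(key, now)
        del self.base[key]


table = SystemTimeTable()
table.insert("steen", "born", datetime(1961, 4, 14))
table.update("steen", "christened", datetime(1961, 6, 22))
table.update("steen", "moved to the US", datetime(2003, 2, 22))
table.delete("steen", datetime(2040, 1, 1))  # hypothetical date

print("base rows:", len(table.base))
for row in table.history:
    print(row)
```

After the delete the base is empty while the history holds three images, which is the "eternal life" effect described above.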
Now that we have covered how INSERT/UPDATE/DELETE works in action
for a SYSTEM TIME temporal table, let's have a close look at what to be
aware of – especially when planning which temporal table type to use for the
application.
In order to retrieve data – you can use normal SQL to select from either table.
However - the power of TEMPORAL tables is associated with some SQL
extensions.
Using AS OF “timestamp” in conjunction with FOR SYSTEM_TIME, DB2
will find the appropriate row(s) for the requested time. This might mean you
get the BASE row or zero, one or more history rows. You can also use the
BETWEEN predicate following the FOR SYSTEM_TIME clause.
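To make the AS OF and BETWEEN lookups concrete, here is a minimal Python sketch of the row selection. It assumes the period is inclusive at the start and exclusive at the end, and that BETWEEN returns rows that started at or before the upper value and ended after the lower value, which is my understanding of the documented semantics. The row contents are hypothetical and the lists simply stand in for the base and history tables.

```python
from datetime import datetime

OPEN = datetime(9999, 12, 30)  # stand-in for an "open" ROW_END value

# (data, row_start, row_end): hypothetical rows in the spirit of the scenario
base = [("Steen, US address", datetime(2003, 2, 22), OPEN)]
history = [
    ("unnamed baby", datetime(1961, 4, 14), datetime(1961, 6, 22)),
    ("Steen, Danish address", datetime(1961, 6, 22), datetime(2003, 2, 22)),
]


def as_of(ts):
    """Rows whose period contains ts (start inclusive, end exclusive)."""
    return [d for d, start, end in base + history if start <= ts < end]


def between(lo, hi):
    """Rows whose period overlaps the range lo..hi."""
    return [d for d, start, end in base + history if start <= hi and end > lo]


print(as_of(datetime(2000, 1, 1)))
print(as_of(datetime(1950, 1, 1)))
print(between(datetime(1961, 1, 1), datetime(2010, 1, 1)))
```

Both functions scan the base and the history together, which mirrors the point that a single request can return the base row, history rows, or nothing at all.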
The documentation clearly states that the timestamp columns for SYSTEM
TIME temporal tables cannot be updated !!
The first example illustrates that the SQL can be executed – and is, interestingly,
accepted by DB2 – but nothing is updated !!??!!??
The behavior is predictable for the BASE table – a similar UPDATE request is
not accepted by DB2 (remember that the additional columns are defined as
GENERATED ALWAYS). Trying to update the TEMPORAL timestamp columns correctly
results in SQLCODE -151.
Using the SQL extensions FOR SYSTEM_TIME AS OF or BETWEEN, DB2
will have to check both the BASE and HISTORY table to make sure the
correct answer set is returned.
This is illustrated in this example by doing the EXPLAIN: both the base and
history tables are accessed. What puzzles me is what happened to QBLOCKNO 3
and 4, while the UNION itself is expected.
Notice the NON-CORRELATED SUBQUERY which is the re-written SQL
using AS OF (more about this later – in fact on the next slide).
To wrap up this topic and continue the EXPLAIN story, it isn't possible to use
correlation when accessing a temporal table.
An interesting article can be found in the June/July edition of System Z
Journal – written by Dan Luksetich.
Just as we have schema challenges using CLONE tables, temporal tables are not
much different. The BASE and HISTORY must be identical.
It is interesting that DB2 allows columns to be added to the base table and not to the
history table – the explanation is that DB2 is pretty intelligent: the new column
is automatically added to the HISTORY table so the SYSTEM TIME
design can continue uninterrupted.
There are certain changes which cannot be implemented / altered, so it is
necessary to get rid of the history – do the schema changes and put the history
back again (with the same changes).
The last box illustrates it is possible to INSERT into the history table – which
will be helpful to reload the existing history data after schema changes are
implemented.
If unsupported schema changes are needed, the method is:
1) DROP VERSIONING to cut the link between BASE and HISTORY
2) Unload the HISTORY and drop the HISTORY – alternatively execute
schema changes for “history”
3) Apply the same changes and tasks to BASE
4) Get the data back if the object has been dropped
5) ADD VERSIONING.
Attempting schema changes that are not allowed while versioning is in
effect will result in SQLCODE -478 or SQLCODE -669.
Another limitation you need to be aware of, especially when considering
growth and the partitioning scheme: it is not possible to ADD / ROTATE
partitions.
If this is needed – please review the previous slides on UN-ALTERABLE
changes.
Let us have a close look at BUSINESS TIME temporal tables and what the
differences are. We will basically cover the same scenario of “Steen's life” with
a couple of modifications.
We will be using the exact same base table definition as for the SYSTEM
TIME temporal table scenario.
In this example the base definition is already created, so I have to use NOT NULL
WITH DEFAULT – unless dropping and recreating. This is done in order to
illustrate it's possible to enable TEMPORAL for existing tables.
Unlike SYSTEM TIME temporal tables, it's possible to specify attributes for
the START / END columns other than TIMESTAMP(12) or ZONE – we can
use DATE instead – and the columns don't have to be GENERATED ALWAYS.
Another difference is the enabling of temporal : using ADD PERIOD
BUSINESS_TIME as opposed to SYSTEM_TIME.
Unlike for SYSTEM temporal tables, using BUSINESS temporal we don't
need a history table – only if BI-TEMPORAL is needed. In this scenario we're
using a clean BUSINESS temporal design – not BI-TEMPORAL !!
One important issue to think about: since we don't have a history table,
the BASE can have multiple rows for the same key. This means you
will need to modify how the UNIQUE indexes are defined. For this purpose,
DB2 allows duplicates within the key if you create the index with
BUSINESS_TIME WITHOUT OVERLAPS.
As mentioned on the previous slide – one UNIQUE key / value can have
multiple entries due to different “valid timeframes”. Using WITHOUT
OVERLAPS allows the UNIQUE key to have multiple entries as long as the
START/END dates/timestamps don't overlap – pretty neat.
The index will have to be specified with the “normal unique column(s)” and
then the clause BUSINESS_TIME WITHOUT OVERLAPS (the first box).
As always restrictions apply – for this kind of index you cannot specify it to be
PARTITIONED.
The second box illustrates that the SSN column used to be unique but can now exist as
a duplicate as long as WITHOUT OVERLAPS is specified – meaning one entity
can exist only once at any specific point-in-time.
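The rule that WITHOUT OVERLAPS enforces can be expressed as a short check. This is a Python sketch of the rule itself, not of DB2's index implementation. It assumes periods are inclusive at the start and exclusive at the end, and the key values and most dates are hypothetical.

```python
from datetime import date


def violates_without_overlaps(rows):
    """rows: list of (key, start, end); periods are [start, end).

    Returns True if any key has two periods that overlap.
    """
    by_key = {}
    for key, start, end in rows:
        by_key.setdefault(key, []).append((start, end))
    for periods in by_key.values():
        periods.sort()
        # After sorting by start, any overlap shows up between neighbors.
        for (_, prev_end), (next_start, _) in zip(periods, periods[1:]):
            if next_start < prev_end:
                return True
    return False


ok_rows = [
    ("SSN-1", date(1961, 4, 14), date(1961, 6, 22)),
    ("SSN-1", date(1961, 6, 22), date(9999, 12, 31)),  # starts where the other ends
    ("SSN-2", date(1961, 4, 14), date(9999, 12, 31)),  # different key, same dates
]
bad_rows = ok_rows + [("SSN-1", date(1961, 5, 1), date(1961, 7, 1))]

print(violates_without_overlaps(ok_rows))
print(violates_without_overlaps(bad_rows))
```

Two rows for the same key may touch (one ends on the day the next begins) without violating the rule, because the end of a period is exclusive.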
The first event for the BUSINESS TIME temporal scenario is the birth of
Steen. Unlike with SYSTEM TIME, we can dictate the START and END
dates/timestamps – so here we basically BACK-DATE the event.
In this case the START / END columns are defined as NOT NULL WITH DEFAULT
(NNWD) – something you will have to consider during the design, since the
application might not want DEFAULT / CURRENT values for DATE / TIMESTAMP.
The second event is where Steen is given a name.
The table is updated as expected – the difference between BUSINESS TIME
and SYSTEM TIME is that no history table is involved. Instead the BASE
table will get additional row(s) depending on the START-END dates specified
in the UPDATE statement.
The cool thing about BUSINESS temporal tables is that you can even specify
events in the future – and even in the past if needed (we will see later).
In order to illustrate the power of BUSINESS TIME temporal tables, we will
tweak the Steen story a little bit: Steen doesn't report the address change
during his divorce, where he would have had to pay more taxes to the municipality.
Also – when he moves to the US – he wants the actual date to be reflected on
the tax report – not the date he actually reported the move from Denmark to
the US.
Let's see . . . .
The third event is pretty straightforward – just a new address resulting in
another row with a new START date.
But there's a twist – since the address was reported after the fact, we need to
make sure this is reflected (meaning an update in the past). For this purpose you
need to specify WHEN the row content is valid – hence another new piece of
syntax: FOR PORTION OF BUSINESS_TIME dictates when the content is
valid. This specific update results in TWO rows instead of one (or two instead
of three – or even three instead of one in case the end-date also changes).
The next event is where Steen gets divorced – he reports an address change
from when he gets his new apartment – which in fact is AFTER he moved out
from his ex-wife's place.
The “trick” here is that Steen reports the address change AFTER THE FACT –
so we need to specify this as a past event – hence using FOR PORTION OF to
dictate the date range when the new information is valid.
So – Steen didn't report WHEN his divorce started (he wanted to save tax) – but
something went south and the ex-wife reported the address change. Bottom
line – the tax department decides to change the address for a very specific
timeframe – hence the update covering PORTION OF from October 30 2002
until April 30 2003.
This results in TWO new rows in the table.
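The row splitting caused by an update FOR PORTION OF BUSINESS_TIME can be sketched as follows. This is a Python model of the semantics, not DB2 code. It assumes the period is inclusive at the start and exclusive at the end. The function name, the key "S", the address texts and the 1990 start date are hypothetical, while the October 30 2002 and April 30 2003 dates come from the scenario.

```python
from datetime import date


def update_for_portion(rows, key, frm, to, new_data):
    """rows: list of (key, start, end, data); periods are [start, end).

    Applies new_data to the part of each matching row that falls in
    [frm, to) and keeps the old data for whatever lies outside it.
    """
    result = []
    for k, start, end, data in rows:
        if k != key or end <= frm or start >= to:
            result.append((k, start, end, data))  # not affected
            continue
        if start < frm:
            result.append((k, start, frm, data))  # part before the portion
        result.append((k, max(start, frm), min(end, to), new_data))
        if end > to:
            result.append((k, to, end, data))     # part after the portion
    return result


rows = [("S", date(1990, 1, 1), date(9999, 12, 31), "old address")]
rows = update_for_portion(
    rows, "S", date(2002, 10, 30), date(2003, 4, 30), "corrected address"
)
for row in rows:
    print(row)
```

One original row becomes three: the updated portion in the middle plus two rows that keep the old content, which corresponds to the two new rows mentioned above.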
Just like covered in SYSTEM TIME for the HISTORY table, using
BUSINESS TIME might cause a lot of new entries – PLAN ACCORDINGLY !!
Let's have a quick look into how SQL retrieval works for BUSINESS TIME
temporal tables.
In the previous scenario of SYSTEM TIME we used AS OF and BETWEEN.
The exact same parameters can be used for BUSINESS TIME.
In case you want to use GT / LT etc., these operators cannot be used in
conjunction with FOR BUSINESS_TIME but will have to be used as “normal”
predicates – but it can be done !!
Just like AS OF can be used – using FOR BUSINESS_TIME BETWEEN two
dates / timestamps is absolutely valid. Just keep in mind that if AS OF isn't
used, you might get zero, one, two or many rows.
As mentioned earlier – normal BETWEEN date/timestamp is valid and can be
used as well.
Now on to some “interesting aspects discovered” – personally I don't know if
these are bugs or WAD (working as designed) – issues have not been opened with
IBM and the RSU is early 2012.
This SQL SELECT statement seems invalid – no COMMA between
STA_DATE and SUBSTR – still getting a “valid” response set !!!!
Might not be an issue – but interesting.
You don't have to use FOR BUSINESS_TIME when retrieving rows – normal
predicates are still valid, but think about what you are asking for – maybe only
one row is expected but several are returned !!
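The difference between a plain key predicate and an AS OF request on a BUSINESS TIME table can be shown with a small Python sketch. The rows and function names are hypothetical, and the period is assumed to be inclusive at the start and exclusive at the end.

```python
from datetime import date

# (key, start, end, data): several rows for one key, with periods that do not overlap
rows = [
    ("S", date(1961, 4, 14), date(1961, 6, 22), "unnamed"),
    ("S", date(1961, 6, 22), date(2002, 10, 30), "address A"),
    ("S", date(2002, 10, 30), date(9999, 12, 31), "address B"),
]


def plain_lookup(key):
    """Like a normal WHERE on the key only: every version comes back."""
    return [r for r in rows if r[0] == key]


def business_as_of(key, day):
    """Like an AS OF request: only the version valid on that day."""
    return [r for r in rows if r[0] == key and r[1] <= day < r[2]]


print(len(plain_lookup("S")))
print(business_as_of("S", date(2000, 1, 1)))
```

An application that used to expect exactly one row per key has to add a period condition of some kind once the table carries several versions of the same key.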
The next event is Steen's passing !! So we delete him from the table – let's
look at the major differences between SYSTEM TIME and BUSINESS TIME.
For SYSTEM TIME the BASE table was empty – but the HISTORY table had
all past changes.
For BUSINESS TIME – there's NOTHING left – all well cleaned up. Another
issue to think about when implementing TEMPORAL tables – maybe this is a
reason to use BI-TEMPORAL.
It's time to check what DB2 really does under the covers to keep track of all
these moving parts.
When you ALTER the table to add VERSIONING or BUSINESS_TIME, DB2
creates constraints in the DB2 catalog – keeping track of what can be done /
can NOT be done and how to operate when updates/deletes happen.
Time to start to wrap up – there are TWO different (in fact three) types of
TEMPORAL tables – and which one to choose really depends on the application, the
design and how the application should behave. Pick the one that FITS – and make
sure the application does what's needed.
Also – be careful when current applications are converted – there might be
TRIGGERS, stored procedures etc. which might make things more difficult.
For utilities – there isn't a lot to be aware of compared to CLONE tables,
NOT LOGGED objects etc. Just make sure your history tables are ready for
recovery and plan schema changes accordingly.