
October 2008 C. Zaniolo 1

Time Versus Standards

Carlo Zaniolo, UCLA CSD

Many contributors ... mentioned at the end

Most slides due to C. Curino and H.J. Moon


Importance of the Past: Michelangelo, Rosetta Stone, Palimpsests, Ötzi

• The Web achieves ubiquity, but not perpetuity, of information. In fact, digital artifacts tend to be less durable than the books of the past.

• Preserving and managing past information is critical:
  • Rome Reborn: a ten-year project initiated at UCLA to visualize the architectural history of ancient Rome digitally

• For evolving digital artifacts, we want to retrieve their histories along with their snapshots.

• Example: multiversion XML documents in the ICAP project [Wang2005]


Multiversion XML Documents: the ICAP Project

• Documents published in XML and queried in XQuery:
  • UCLA course catalog: a new version every two years
  • CIA World Factbook: a new one published every year
  • W3C technical specs published in successive revisions

• The structured DIFF between successive versions is then represented as an XML document called a V-document, in which each element is timestamped with its period of validity.

• XQuery provides an effective language for querying V-documents, including temporal queries.
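As a minimal sketch of how a V-document could be built, the Python below folds successive versions into one timestamped document. It assumes each version is a flat mapping from element name to text, stamped with a sortable date string; `build_vdoc` and the sample catalog entries are hypothetical, and the structured DIFF used in ICAP works on full XML trees. A value that survives into the next version keeps its open period instead of being copied, which is what keeps the representation compact.

```python
import xml.etree.ElementTree as ET

def build_vdoc(versions):
    """Fold (timestamp, {element: text}) snapshots, oldest first, into a V-document.

    Every distinct value of an element becomes one child stamped with the period
    [ts, te) in which it was valid; values still valid at the end keep te="now".
    """
    root = ET.Element("vdoc")
    current = {}  # element name -> (text, node) for values valid right now
    for ts, snapshot in versions:
        # Close the period of every value that changed or disappeared.
        for name, (text, node) in list(current.items()):
            if snapshot.get(name) != text:
                node.set("te", ts)
                del current[name]
        # Open a period for every value that is new in this version.
        for name, text in snapshot.items():
            if name not in current:
                node = ET.SubElement(root, name, ts=ts, te="now")
                node.text = text
                current[name] = (text, node)
    return root

# Hypothetical catalog entry, republished every two years.
versions = [
    ("2004-09-01", {"title": "Databases", "units": "4"}),
    ("2006-09-01", {"title": "Databases", "units": "5"}),
    ("2008-09-01", {"title": "Database Systems", "units": "5"}),
]
print(ET.tostring(build_vdoc(versions), encoding="unicode"))
```

Here "Databases" is stored once with the period 2004-09-01 to 2008-09-01, even though it appears in two versions.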


Historical Information in DBMS

• Time and historical information are critical in most information systems.

• Temporal data and queries have proved difficult to support in DBMSs, e.g.:
  • TSQL2: a comprehensive proposal for temporal SQL extensions, put forward in the mid-1990s by leading researchers
  • Unfortunately, it was not well received by SQL standards committees and DBMS vendors.
  • Many new constructs and features, but the key design idea was to spare users the difficulty of specifying explicit temporal joins and coalescing operators (in state-based representations).


Temporal DBs: Many State-Based Models

[Figure: example state-based relation with columns empno, title, ts, te, deptno]

• In state-based models, tuples are timestamped with their validity period.

• Projection becomes difficult because it requires coalescing, e.g., when title is projected out.

• The problem is partially solved by decomposing:
  (Id, Salary, Start, End), (Id, Title, Start, End), ...
  Fewer projections and coalescing, but many temporal joins, which are no fun either.

Coalescing and temporal joins are the crux of all temporal extensions of the relational model and SQL:
• TSQL2: implicit constructs applied to periods, rather than standard SQL constructs
  • Simple temporal join/project queries are simple; more complex ones become difficult
• Point-based models and many others, but none went as far as TSQL2.
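To make the temporal-join problem concrete, here is a minimal nested-loop sketch over two decomposed histories. It assumes half-open periods [Start, End) stored as ISO date strings. `Fact` and `temporal_join` are hypothetical names. The sample values follow employee 1001's history used later in these slides, with each inclusive End moved to the following day. Each output tuple is stamped with the intersection of the two input periods, which is exactly the step that a plain equi-join on Id does not perform.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """One tuple of a decomposed history table, e.g. (Id, Salary, Start, End)."""
    eno: int
    value: object
    start: str  # ISO date, inclusive
    end: str    # ISO date, exclusive

def temporal_join(left, right):
    """Temporal join on eno: pair tuples whose periods overlap and stamp the
    result with the intersection [max(starts), min(ends))."""
    result = []
    for a in left:
        for b in right:
            start, end = max(a.start, b.start), min(a.end, b.end)
            if a.eno == b.eno and start < end:
                result.append((a.eno, a.value, b.value, start, end))
    return sorted(result, key=lambda r: r[3])

salary = [Fact(1001, 60000, "1995-01-01", "1995-06-01"),
          Fact(1001, 70000, "1995-06-01", "1997-01-01")]
title = [Fact(1001, "Engineer", "1995-01-01", "1995-10-01"),
         Fact(1001, "Sr Engineer", "1995-10-01", "1996-02-01"),
         Fact(1001, "Tech Leader", "1996-02-01", "1997-01-01")]
for row in temporal_join(salary, title):
    print(row)
```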


A personal view of TSQL2 Pros and Cons

• I learned from the best,
• but TSQL2 still proved difficult to teach: simple queries are still simple, but more complex queries become very hard.

A. Pros: comprehensive
  • Transaction time, valid time, bitemporal
  • Both event-based and state-based models supported
  • Schema-revision constructs

B. Cons:
  • Many implementation issues unresolved
  • Biased toward state-based: no query construct for events and time series

C. Post mortem: TSQL2 ended up being blamed for both A and B ?*!


Outline

1. Introduction: the importance of the past!
2. Temporal XML.


XML as a Temporal Data Model

DBMS vendors have been gung-ho about publishing relational data in XML, although the technical benefits are questionable.

But publishing the HISTORY of relational DBs in XML brings significant technical benefits:

1. The temporally grouped data model is represented quite naturally.
2. XQuery for temporal queries:
   1. Positive experience with ICAP
   2. Complex temporal queries are easily written
   3. Extensible language: simple temporal functions are easily added
3. Current XML standards: no change required.
4. Usability experience with graduate students quite positive.


XML for Temporal Data: history of emp(empno, title, deptno)

<db ts="T1" te="now">
  <empacct ts="T1" te="now">                 <!-- table -->
    <row ts="T1" te="now">
      <empno ts="T1" te="now">1001</empno>  <!-- columns -->
      <title ts="T1" te="T3">Engineer</title>
      <title ts="T3" te="T4">Sr Engineer</title>
      <title ts="T4" te="now">Tech Leader</title>
      <deptno ts="T1" te="T3">d01</deptno>
      <deptno ts="T3" te="now">d02</deptno>
    </row>
  </empacct>
</db>

• XML supports the temporally grouped model well [VLDBJ08]


XQuery for Temporal Queries

• XQuery is a good temporal query language:
  • Complex temporal queries are easily written
  • Turing-complete language
• Current XML standards: no extension required
• This is the logical view, which is then shredded into flat tables (and SQL/XML) for better performance [VLDBJ08]

Query 1: Temporal projection. Retrieve the title history of employee "Bob":

for $t in doc("emp.xml")/db/empacct/row[name="Bob"]/title
return $t

Query 2: Temporal snapshot. Retrieve all the titles valid on 1990-07-01:

for $t in doc("emp.xml")/db/empacct/row/title[@ts <= "1990-07-01" and @te > "1990-07-01"]
return $t
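The two queries can be mimicked in Python with the standard `xml.etree.ElementTree` module. That module lacks XQuery's value comparisons in path predicates, so the temporal condition is checked in code. This is a sketch under several assumptions:

- Concrete dates stand in for T1, T3, and T4.
- Query 1 is keyed on `empno`, because the sample row has no `name` element.
- `valid_at`, `title_history`, and `snapshot_titles` are hypothetical helpers.

The sketch also makes the `now` case explicit. Query 2 as written accepts te="now" only because, when compared as strings, "now" sorts after any date string.

```python
import xml.etree.ElementTree as ET

# Hypothetical dates stand in for T1, T3, T4 of the emp history example.
DOC = """
<db ts="1995-01-01" te="now">
  <empacct ts="1995-01-01" te="now">
    <row ts="1995-01-01" te="now">
      <empno ts="1995-01-01" te="now">1001</empno>
      <title ts="1995-01-01" te="1995-10-01">Engineer</title>
      <title ts="1995-10-01" te="1996-02-01">Sr Engineer</title>
      <title ts="1996-02-01" te="now">Tech Leader</title>
    </row>
  </empacct>
</db>
"""

def valid_at(elem, t):
    """True if elem's period [ts, te) contains t; te="now" means still valid."""
    te = elem.get("te")
    return elem.get("ts") <= t and (te == "now" or t < te)

def title_history(root, empno):
    """Query 1, keyed on empno because this sample row has no name element."""
    return [(t.text, t.get("ts"), t.get("te"))
            for row in root.iterfind("./empacct/row")
            if row.findtext("empno") == empno
            for t in row.iterfind("title")]

def snapshot_titles(root, t):
    """Query 2: all titles valid at time t."""
    return [x.text for x in root.iterfind("./empacct/row/title") if valid_at(x, t)]

root = ET.fromstring(DOC.strip())
print(title_history(root, "1001"))
print(snapshot_titles(root, "1995-07-01"))
```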


At the External/Logical Level

XML and XQuery:
• effective at publishing the history of relational DBs
• can be easily extended to multiversion documents

• No extension to current standards, but

• libraries of temporal functions were added to:
  • shield users from the low-level details used in representing time (e.g., now)
  • provide users with reusable complex functions, e.g., temporal aggregates

• Predefined functions: snapshot, interval functions, duration and date/time
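The sketch below illustrates a reusable temporal aggregate that also hides the representation of now. It computes a temporal COUNT: how many periods are valid over each interval of time. `temporal_count`, the internal stand-in date 9999-12-31, and the half-open [ts, te) convention are assumptions of this sketch. They are not the functions of the library described above.

```python
NOW = "now"
FOREVER = "9999-12-31"  # internal stand-in for "now"; users never see it

def temporal_count(periods):
    """Temporal COUNT over periods [ts, te), with te == "now" meaning still open.

    Returns (start, end, count) for each maximal interval over which the number
    of valid periods is constant and nonzero.
    """
    closed = [(ts, FOREVER if te == NOW else te) for ts, te in periods]
    points = sorted({p for period in closed for p in period})
    result = []
    for start, end in zip(points, points[1:]):
        # Each elementary interval lies entirely inside or outside every period.
        n = sum(1 for ts, te in closed if ts <= start and end <= te)
        if n == 0:
            continue
        if result and result[-1][1] == start and result[-1][2] == n:
            result[-1] = (result[-1][0], end, n)  # same count: extend previous interval
        else:
            result.append((start, end, n))
    # Hide the internal representation of "now" again.
    return [(s, NOW if e == FOREVER else e, n) for s, e, n in result]

# Hypothetical employment periods of three employees.
staff = [("1995-01-01", "now"), ("1995-06-01", "1996-01-01"), ("1996-01-01", "now")]
print(temporal_count(staff))
```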


Performance & Scalability: Alternative Implementations

At the physical level we experimented with:

1. Native XML DBMS: limited performance and scalability
2. Flat relations:
   * XML views shredded into flat H-table mappings
   * XQuery mapped into SQL/XML statements
3. Nested relations (now in SQL and supported in Oracle):
   * closer to XML, but no obvious performance advantage


Internal Level: H-tables

• Attribute history tables:
  • employee_title(empno, title, tstart, tend)
  • employee_deptno(empno, deptno, tstart, tend)
• Ancillary tables, e.g., the key-history table:
  • employee_id(empno, tstart, tend)
• XQuery statements on XML views are implemented as SQL/XML statements on these tables.
• Temporal joins by sort-merging the tables
• Temporal indexing and clustering (via usefulness-based segmentation)

This is an efficient internal representation for a wide range of logical views: XML, point-based, Uniview (more on this later).
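Here is a minimal sketch of the H-tables in SQLite (via Python's `sqlite3`), with a snapshot query written as SQL over them. It assumes half-open periods [tstart, tend) and uses the date 9999-12-31 to represent now. The sample rows follow the emp history example, with concrete dates for T1, T3, and T4. The `snapshot` helper is hypothetical. The snapshot is a regular join on empno plus one validity predicate per table. That is the kind of SQL an XQuery snapshot over the XML view could translate into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_id     (empno INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_title  (empno INTEGER, title TEXT, tstart TEXT, tend TEXT);
CREATE TABLE employee_deptno (empno INTEGER, deptno TEXT, tstart TEXT, tend TEXT);
-- '9999-12-31' plays the role of "now" (an assumption of this sketch).
INSERT INTO employee_id VALUES (1001, '1995-01-01', '9999-12-31');
INSERT INTO employee_title VALUES
  (1001, 'Engineer',    '1995-01-01', '1995-10-01'),
  (1001, 'Sr Engineer', '1995-10-01', '1996-02-01'),
  (1001, 'Tech Leader', '1996-02-01', '9999-12-31');
INSERT INTO employee_deptno VALUES
  (1001, 'd01', '1995-01-01', '1995-10-01'),
  (1001, 'd02', '1995-10-01', '9999-12-31');
""")

# Snapshot at :at: join the attribute histories on empno, keeping only the
# tuples whose period [tstart, tend) contains the requested instant.
SNAPSHOT = """
SELECT i.empno, t.title, d.deptno
FROM employee_id AS i
JOIN employee_title  AS t ON t.empno = i.empno AND t.tstart <= :at AND :at < t.tend
JOIN employee_deptno AS d ON d.empno = i.empno AND d.tstart <= :at AND :at < d.tend
WHERE i.tstart <= :at AND :at < i.tend
"""

def snapshot(at):
    return conn.execute(SNAPSHOT, {"at": at}).fetchall()

print(snapshot("1995-07-01"))
```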


The Archival Information System (ArchIS)

[Figure: ArchIS architecture. The current database (relational data) serves SQL queries; active rules/update logs carry the temporal information into H-tables; XML views over the H-tables answer temporal queries in XQuery, extended with user-defined temporal functions.]
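One way to realize the "active rules" box is with triggers on the current database that maintain an H-table. The sketch below does this in SQLite for the title attribute only. The trigger names, the one-row `clock` table standing in for transaction time, and the 9999-12-31 encoding of now are all assumptions of this illustration. They are not ArchIS internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clock (t TEXT);                           -- stand-in for transaction time
INSERT INTO clock VALUES ('1995-01-01');
CREATE TABLE employee (empno INTEGER PRIMARY KEY, title TEXT);   -- current DB
CREATE TABLE employee_title (empno INTEGER, title TEXT,          -- H-table
                             tstart TEXT, tend TEXT);

-- A new employee opens a history tuple that stays valid until further notice.
CREATE TRIGGER emp_ins AFTER INSERT ON employee BEGIN
  INSERT INTO employee_title
  VALUES (NEW.empno, NEW.title, (SELECT t FROM clock), '9999-12-31');
END;

-- A title change closes the open history tuple and opens a new one.
CREATE TRIGGER emp_upd AFTER UPDATE OF title ON employee
WHEN OLD.title IS NOT NEW.title BEGIN
  UPDATE employee_title SET tend = (SELECT t FROM clock)
   WHERE empno = OLD.empno AND tend = '9999-12-31';
  INSERT INTO employee_title
  VALUES (NEW.empno, NEW.title, (SELECT t FROM clock), '9999-12-31');
END;

-- A deletion only closes the history; nothing is thrown away.
CREATE TRIGGER emp_del AFTER DELETE ON employee BEGIN
  UPDATE employee_title SET tend = (SELECT t FROM clock)
   WHERE empno = OLD.empno AND tend = '9999-12-31';
END;
""")

def at(t, sql, *params):
    """Run one update on the current database as of (hypothetical) time t."""
    conn.execute("UPDATE clock SET t = ?", (t,))
    conn.execute(sql, params)

at("1995-01-01", "INSERT INTO employee VALUES (?, ?)", 1001, "Engineer")
at("1995-10-01", "UPDATE employee SET title = ? WHERE empno = ?", "Sr Engineer", 1001)
at("1996-02-01", "UPDATE employee SET title = ? WHERE empno = ?", "Tech Leader", 1001)
print(conn.execute("SELECT * FROM employee_title ORDER BY tstart").fetchall())
```

Users keep issuing ordinary SQL updates against the current database, and the history accumulates as a side effect.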


Outline

1. Introduction: the use of the past
2. XML/XQuery for managing multiversion documents and the history of relational DBs
   • An effective way to keep history, as long as the schema does not change over history
3. The Panta Rhei (Πάντα ῥεῖ) projects!


The Panta Rhei Projects

1. The schema evolution benchmark [ICEIS 2008] http://yellowstone.cs.ucla.edu/schemaevolution/index.php
2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al. VLDB2008]
3. Historical Metadata Manager (HMM) to preserve and query the history of the information schema [ECDM 2008]
4. PRIMA: managing transaction-time DBs with evolving schemas [Moon et al. VLDB2008]


Schema Evolution

• Previous studies [Sjoberg, Marche, …] have focused on traditional information systems.

• But on the web everything evolves faster. Web information systems often involve large, cooperative, unstructured projects such as Wikipedia!

• MediaWiki (the software platform behind Wikipedia):
  • Popular (used by >30,000 websites, including Wikipedia)
  • Open-source and well-documented software. We collect and dissect the MediaWiki schema history (170+ schema versions in 4.5 years)

• A tool suite to analyze Web Information System DB backends

• Typical Schema Modification Operators (SMOs) identified!

• Used to automate the schema evolution process: PRISM


Basic Statistics

• Schema evolution:
  • 170+ versions in 4.5 years
  • almost 250% increase

• Up to 70% of queries lost after a schema revision


Schema Modification Operators (SMOs)

• A language for schema change:
  • Procedural fashion
  • Do this, do that, …: easier for regular DBAs
  • Similar to the primitives in [Bernstein06]

SMOs describing the Wikipedia DB schema evolution [ICEIS08]
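A toy model of SMOs as schema-to-schema functions may help show why a procedural, step-by-step language suits DBAs: each operator is small, and a sequence of them yields every intermediate schema version. The operator names are modeled loosely on PRISM's SMOs. The tuple encoding, `apply_smo`, and `evolve` are hypothetical, and only the schema (not the data) is modeled.

```python
def apply_smo(schema, smo):
    """Return the schema (table -> column list) obtained by applying one SMO.

    The input schema is never mutated, so every version stays available.
    """
    op, *args = smo
    s = {table: list(cols) for table, cols in schema.items()}
    if op == "CREATE TABLE":
        table, cols = args
        s[table] = list(cols)
    elif op == "DROP TABLE":
        del s[args[0]]
    elif op == "RENAME TABLE":
        old, new = args
        s[new] = s.pop(old)
    elif op == "ADD COLUMN":
        table, col = args
        s[table].append(col)
    elif op == "DROP COLUMN":
        table, col = args
        s[table].remove(col)
    elif op == "RENAME COLUMN":
        table, old, new = args
        s[table] = [new if c == old else c for c in s[table]]
    elif op == "MERGE TABLE":
        left, right, target = args
        if s[left] != s[right]:
            raise ValueError("MERGE TABLE needs tables with the same columns")
        cols = s.pop(left)
        del s[right]
        s[target] = cols
    else:
        raise ValueError(f"unsupported SMO: {op}")
    return s

def evolve(schema, smos):
    """Apply an SMO sequence, returning every intermediate schema version."""
    versions = [schema]
    for smo in smos:
        versions.append(apply_smo(versions[-1], smo))
    return versions

v1 = {"S": ["id", "name"], "T": ["id", "name"], "job": ["title", "salary"]}
history = evolve(v1, [("MERGE TABLE", "S", "T", "R"),
                      ("ADD COLUMN", "R", "salary"),
                      ("DROP TABLE", "job")])
for version in history:
    print(version)
```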


Schema Evolution Now

… the DBA nightmares:

• Frequent evolution steps: Wikipedia case study, 170 in 4.5 years!

• Data migration: data loss, redundancy, efficiency of the migration, efficiency of the new design

• Application conversion/rewriting: up to 70% query loss in Wikipedia

• Few automation tools; documentation automation still lacking.

PRISM Carlo A. Curino VLDB ‘08


PRISM's Objectives

Desideratum → how PRISM addresses it

• Support evolution design → Schema Modification Operators (SMOs)
• Increase the predictability of evolution → SMO static analysis, to forecast the impact on schema, data, and queries
• Automate application conversion (queries only, for now) → query translation based on the SMOs between schema versions, by automatic query rewriting or by view generation
• Automate data migration → data migration scripts automatically generated from SMO sequences
• Automate documentation → Historical Information Schema: HMM
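The sketch below illustrates "data migration scripts automatically generated from SMO sequences". It translates three SMOs into SQL and runs the generated script on SQLite. The translation rules are assumptions of this sketch. In particular, MERGE TABLE is rendered as UNION ALL, which keeps duplicates. `smo_to_sql` and `migrate` are hypothetical names, not PRISM's API.

```python
import sqlite3

def smo_to_sql(smo, columns):
    """Translate one SMO into SQL statements.

    `columns` maps each table to its column list (needed for MERGE TABLE).
    Table names are pasted into the SQL, so they must come from a trusted
    schema description, never from user input.
    """
    op, *args = smo
    if op == "COPY TABLE":
        src, dst = args
        return [f"CREATE TABLE {dst} AS SELECT * FROM {src}"]
    if op == "MERGE TABLE":
        left, right, target = args
        cols = ", ".join(columns[left])
        return [f"CREATE TABLE {target} AS "
                f"SELECT {cols} FROM {left} UNION ALL SELECT {cols} FROM {right}",
                f"DROP TABLE {left}",
                f"DROP TABLE {right}"]
    if op == "DROP TABLE":
        return [f"DROP TABLE {args[0]}"]
    raise ValueError(f"no translation for SMO: {op}")

def migrate(conn, smos, columns):
    """Generate the migration script from the SMO sequence and run it."""
    script = [stmt for smo in smos for stmt in smo_to_sql(smo, columns)]
    for stmt in script:
        conn.execute(stmt)
    return script

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S (id INTEGER, name TEXT);
CREATE TABLE T (id INTEGER, name TEXT);
INSERT INTO S VALUES (1, 'a');
INSERT INTO T VALUES (2, 'b');
""")
script = migrate(conn, [("COPY TABLE", "S", "S_backup"),
                        ("MERGE TABLE", "S", "T", "R")],
                 {"S": ["id", "name"], "T": ["id", "name"]})
print("\n".join(script))
print(conn.execute("SELECT id, name FROM R ORDER BY id").fetchall())
```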


Mapping & (Quasi-)Inverse Mapping

• The quasi-inverse SMO⁻¹ can be used to translate the original query q1 into an equivalent query q'1 (using MARS' disjunctive embedded dependencies)

• Or it can be used 'as is' on a view defined by SMO⁻¹
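Both options can be sketched with a single SMO. Assume, hypothetically, that version 1 had emp(empno, title, deptno) and that the SMO DECOMPOSE TABLE split it into emp_title and emp_dept.

- The inverse, JOIN TABLE, becomes a view that lets the legacy query run as is.
- Alternatively, the query is rewritten against the new tables. Here this is done by hand, whereas PRISM derives the rewriting automatically.

The inner join is an exact inverse only when every empno appears in both tables. This is a small example of why, in general, only a quasi-inverse exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Version 2, after the hypothetical SMO
--   DECOMPOSE TABLE emp INTO emp_title(empno, title), emp_dept(empno, deptno)
CREATE TABLE emp_title (empno INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE emp_dept  (empno INTEGER PRIMARY KEY, deptno TEXT);
INSERT INTO emp_title VALUES (1001, 'Tech Leader'), (1002, 'Engineer');
INSERT INTO emp_dept  VALUES (1001, 'd02'), (1002, 'd01');

-- The inverse SMO (JOIN TABLE emp_title, emp_dept INTO emp) as a view that
-- re-creates the version-1 table emp(empno, title, deptno).
CREATE VIEW emp AS
  SELECT t.empno, t.title, d.deptno
  FROM emp_title AS t JOIN emp_dept AS d ON d.empno = t.empno;
""")

# Option 1: the legacy version-1 query q1 runs unchanged on the view.
q1 = "SELECT title FROM emp WHERE deptno = 'd02'"
# Option 2: q1 rewritten (here by hand) into q1' over the version-2 tables.
q1_rewritten = """
SELECT t.title FROM emp_title AS t JOIN emp_dept AS d ON d.empno = t.empno
WHERE d.deptno = 'd02'
"""
print(conn.execute(q1).fetchall(), conn.execute(q1_rewritten).fetchall())
```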


The Panta Rhei Projects: PRIMA

1. The schema evolution benchmark [ICEIS 2008] http://yellowstone.cs.ucla.edu/schemaevolution/index.php
2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al. VLDB2008]
3. Historical Metadata Manager (HMM) to preserve and query the history of the information schema [ECDM 2008]
4. PRIMA: managing transaction-time DBs with evolving schemas [Moon et al. VLDB2008]
   • Many problems: 1. How to archive? 2. How to query? 3. Optimization
   • PRIMA builds on the ArchIS approach


With Schema Changes, Life Is Much Harder. Problem 1: How to Archive!

[Figure: data time vs. schema time, with schema versions S1 and S2 and archived transaction-time data TDB1, TDB1', TDB2]

Current-Schema Archival
• History migrated into the current schema (TDB1 becomes TDB1', stored with TDB2 under S2)
• Problem: some history is lost, e.g., by DROP COLUMN

Original-Schema Archival
• History stored under the original schema (TDB1 under S1, TDB2 under S2)
• Lossless!


MV-Document: Example

V1: T1~T5
V2: T5~now

SMOs:
MOVE COLUMN salary FROM job INTO empacct WHERE empacct.title = job.title;
DROP TABLE job;

<db ts="T1" te="now">
  <empacct ts="T1" te="now">
    <row ts="T1" te="now">
      <empno ts="T1" te="now">1001</empno>
      <title ts="T1" te="T3">Engineer</title>
      <title ts="T3" te="T4">Sr Engineer</title>
      <title ts="T4" te="now">Tech Leader</title>
      <deptno ts="T1" te="T3">d01</deptno>
      <deptno ts="T3" te="now">d02</deptno>
      <salary ts="T5" te="now">70000</salary>
    </row>
  </empacct>
  <job ts="T1" te="T5">
    <row ts="T1" te="T5">
      <title ts="T1" te="T5">Tech Leader</title>
      <salary ts="T1" te="T5">70000</salary>
    </row>
  </job>
</db>

• Temporally grouped
• No history duplicated: storage-efficient, and schema changes are applied quickly


Problem 2: How to Query with Schema Changes?

• Manual querying:
  • Write one query per version
  • Doesn't scale: hundreds of versions

• Schema versioning by data translation:
  • Translate the data into the queried version. Survey [Roddick95]
  • Inefficient!

• Implementation by query rewriting:
  • Use the above as the semantics for query rewriting
  • Rewrite the input query into the source versions
  • Efficient

[Figure: Q: find the salary history over the last 20 years. The query Q on the current version V5 is rewritten into Q', Q'', Q''', Q'''' on the earlier versions V4, V3, V2, V1 along the time axis.]


Query Rewriting

• MARS as the query rewriting engine [VLDB03]:
  • Input: XQuery, plus integrity constraints for XML (XICs)
  • Output: XQuery
  • Chases the input query to find an equivalent query modulo the XICs
• PRIMA translates SMOs into XICs

• XML integrity constraints:
  • ~ first-order logic with XPath
  • Simple case: MERGE TABLE S, T INTO R (from v1 to v2)

[Figure: PRIMA pipeline. SMOs and the schema history feed SMO2XIC, which produces XICs; MARS rewrites the input XQuery under these XICs into the rewritten XQuery.]

[/v1db/S](x1), [./@ts](x1,s), [./@te](x1,e), [./row](x1, x2)

→∃y1 [/v2db/R](y1), [./@ts](y1,s), [./@te](y1,e), [./row](y1,x2)

[/v1db/T](x1), [./@ts](x1,s), [./@te](x1,e), [./row](x1, x2)

→∃y1 [/v2db/R](y1), [./@ts](y1,s), [./@te](y1,e), [./row](y1,x2)


Optimization

• Rewriting optimizations:
  • Various techniques used to minimize the cost of rewriting over hundreds of schema versions

• Query optimizations:
  • MSF (Minimal Source-version Find): analyze the input query's temporal predicates to prune schema versions that can never contribute to the query answer
  • TJF (Temporal Join Find): find temporally joined relations, and transform the join-of-unions plan into a union-of-joins plan, e.g., (R1 ∪ R2) ⋈ (S1 ∪ S2) becomes (R1 ⋈ S1) ∪ (R2 ⋈ S2)
  • Special techniques to eliminate inter-version coalescing
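Here is a small sketch of the two query optimizations, under these assumptions:

- Each schema version is in force over a half-open period.
- Tuples are (key, start, end), with periods clipped to their version's lifespan.
- `msf` and `tjoin` are hypothetical stand-ins for PRIMA's MSF and TJF, which actually operate on XQuery.

Under the clipping assumption, tuples of different versions never overlap in time. The cross terms R1 ⋈ S2 and R2 ⋈ S1 of the join-of-unions are therefore empty, and the union-of-joins plan gives the same answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    name: str
    start: str  # the version is in force over [start, end)
    end: str

def msf(versions, q_start, q_end):
    """MSF-style pruning: keep the versions whose lifespan overlaps the
    query's time window [q_start, q_end); the others cannot contribute."""
    return [v.name for v in versions if v.start < q_end and q_start < v.end]

def tjoin(r, s):
    """Temporal join on key; tuples are (key, start, end) with [start, end)."""
    return {(a, b) for a in r for b in s
            if a[0] == b[0] and a[1] < b[2] and b[1] < a[2]}

versions = [Version("V1", "1990", "2000"), Version("V2", "2000", "2005"),
            Version("V3", "2005", "9999")]
print(msf(versions, "2001", "2003"))

# Per-version fragments; every tuple's period lies inside its version's lifespan.
R1, R2 = {("e1", "1990", "2000")}, {("e1", "2000", "2005")}
S1, S2 = {("e1", "1995", "2000")}, {("e1", "2000", "2003")}
join_of_unions = tjoin(R1 | R2, S1 | S2)
union_of_joins = tjoin(R1, S1) | tjoin(R2, S2)
print(join_of_unions == union_of_joins)
```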


Native XML DB: Experiment Results (2/2)

[Figure: execution time (sec, 0 to 20) of queries Q1 to Q12 under Baseline, MSD, and MSD+TJD; annotation: the only two cases with temporal joins]

• Is temporal query optimization effective?
• Synthetic employee data: 642KB of MV-document in MonetDB/XQuery
• Five schema versions, 12 representative temporal queries
• Without MSD (MinSourceDetect), 8 queries fail to finish (all bars touching the highest grid line indicate an out-of-memory error)


Relations at Internal Level: ArchIS H-tables

• Attribute history tables:
  • employee_title(empno, title, tstart, tend)
  • employee_deptno(empno, deptno, tstart, tend)


Performance of Physical Temporal Data Models

• Query: for all employees, find empno, hiredate, title, deptno, and salary, using the current DB state

[Figure: performance comparison of the physical data models; data size ~1GB]


So Far

1. Current XML standards provide a good basis for temporal state-based representations at the logical level
   • At the physical level, tables are much better: shredding and SQL/XML provide the needed mapping

2. Schema evolution makes everything harder, but the problem is manageable:
   • at the process level, PRISM
   • for transaction-time DBs, PRIMA

3. Complex event processing: SQL standards under development provide an even better way to support temporal queries and reasoning
   • Using SQL rather than XML
   • Finessing the distinction between state-based and event-based representations


References

Ordered sequences, data streams, and SQL standards for event histories:

• Fred Zemke (Oracle), Andrew Witkowski (Oracle), Mitch Cherniack (StreamBase), Latha Colby (IBM): "Pattern matching in sequences of rows," SQL change proposal, March 2007.

• R. Sadri, C. Zaniolo, A. Zarkesh, J. Adibi: "Optimization of Sequence Queries in Database Systems," PODS 2001.

• C. Zaniolo: "Temporal XML? SQL Strike back!," TIME 2005. Plus many unpublished ideas.


Patterns in Sequences, Time Series, Data Streams

Pages in a session: Sessions(SessNo, ClickTime, PageNo, PageType)

The merchants' dream: a content page 'c', immediately followed by a description page 'd', immediately followed by a purchase page 'p'. In SQL-TS [Sadri et al. 2000]:

SELECT Z.PageNo, Z.ClickTime
FROM Sessions AS (X, Y, Z)
PARTITION BY SessNo
ORDER BY ClickTime
WHERE X.PageType = 'c' AND Y.PageType = 'd'
  AND Z.PageType = 'p'

In standard SQL, the two joins on SessNo pose no significant problem. But the conditions 'Y immediately follows X' and 'Z immediately follows Y' are difficult to express and optimize. Specialized optimization techniques (e.g., Knuth-Morris-Pratt) should be used instead.
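For comparison, the merchants' dream can be computed procedurally: partition by SessNo, order by ClickTime, and slide a window of three consecutive rows. This is a naive sketch with hypothetical names and data (page type "x" stands for any other page). It checks every window, whereas the slide's point is that a real optimizer should use KMP-style pattern search.

```python
from itertools import groupby
from operator import itemgetter

def merchants_dream(clicks):
    """Return (PageNo, ClickTime) of Z for each run of three consecutive clicks
    X, Y, Z in one session whose PageTypes are 'c', 'd', 'p'.

    clicks: iterable of (SessNo, ClickTime, PageNo, PageType) rows.
    """
    matches = []
    ordered = sorted(clicks, key=itemgetter(0, 1))  # PARTITION BY SessNo ORDER BY ClickTime
    for _, session in groupby(ordered, key=itemgetter(0)):
        rows = list(session)
        # Consecutive triples: Y immediately follows X, Z immediately follows Y.
        for x, y, z in zip(rows, rows[1:], rows[2:]):
            if (x[3], y[3], z[3]) == ("c", "d", "p"):
                matches.append((z[2], z[1]))
    return matches

clicks = [(1, 1, 10, "c"), (1, 2, 11, "d"), (1, 3, 12, "p"),
          (2, 1, 20, "c"), (2, 2, 21, "x"), (2, 3, 22, "d"), (2, 4, 23, "p"),
          (3, 5, 32, "p"), (3, 3, 30, "c"), (3, 4, 31, "d")]
print(merchants_dream(clicks))
```

Session 2 does not match because an unrelated page sits between 'c' and 'd'. Session 3 matches once its clicks are put in ClickTime order.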


Kleene Closure: Repetitions Allowed (*, +)

(S, D+, U+) WHERE D < previous(D) AND U > previous(U)

• Describes a V pattern starting at S, going down on D and then up on U. More conditions can specify slope, min, and max.

Application examples:
• W patterns in the stock market
• Intrusion detection patterns
• RFID-based tracking of objects
• Ships (1) leaving one port, (2) keeping their general direction, until (3) they enter another port
• Fishing boats that (1) leave port A, (2) travel toward the fishing area, (3) zigzag for a while after they arrive, and (4) move on to a port or another fishing area

[Figure: V pattern with segments S, D+, U+]
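Here is a procedural sketch of matching (S, D+, U+) over a single numeric series, for example closing prices. It assumes greedy matching: the longest D+ run, then the longest U+ run. It also lets consecutive matches share the peak row, so a W shows up as two adjacent V's. `find_v_patterns` is a hypothetical name, and the extra conditions on slope, min, and max are left out.

```python
def find_v_patterns(values):
    """Return (start, bottom, end) index triples for greedy (S, D+, U+) matches:
    a start row S, one or more rows each below its predecessor (D+), then one
    or more rows each above its predecessor (U+)."""
    matches, i, n = [], 0, len(values)
    while i < n - 2:
        j = i
        while j + 1 < n and values[j + 1] < values[j]:  # D < previous(D)
            j += 1
        k = j
        while k + 1 < n and values[k + 1] > values[k]:  # U > previous(U)
            k += 1
        if j > i and k > j:          # at least one D and at least one U
            matches.append((i, j, k))
            i = k                    # the next V may start at this peak
        else:
            i += 1
    return matches

print(find_v_patterns([5, 4, 3, 4, 6, 5, 7]))
```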


Patterns in SQL

• Extensions to the SQL standards are being proposed for patterns (with Kleene closure)

• Application areas include data streams, complex event processing, ordered sequences in DBs, and time series analysis

• Many optimization issues remain, but growing support is expected, as an extension of the SQL:2003 OLAP functions (market pull)

• Question: what will they do for temporal DBs?
  • The new pattern constructs support event-based queries well
  • Will they help with state-based representations?
  • Yes, provided that we supply the right view.


Patterns in SQL

[Figure: two alternative views, A and B, of the employee history; A's table (keyed by Eno) is not recoverable. B uses one relation per attribute:]
  e_salary(Eno, Start, End, Salary)
  e_title(Eno, Start, End, Title)
  e_dept(Eno, Start, End, Deptno)

A. - Frequent coalescing is a problem
   - The view does not support pattern queries

B. + Much less coalescing
   + The view supports pattern queries
   + An excellent basis for the actual implementation
   - But temporal joins are a problem


Toward a Unified State-Event View

• e_salary(Eno, Start, End, Salary, T#), T# = 1
• e_title(Eno, Start, End, Title, T#), T# = 2
• e_dept(Eno, Start, End, Deptno, T#), T# = 3

External, logical level: outer joins on (Eno, T#); each tuple in the three relations generates a distinct tuple of

Ehist(Eno, Start, End, Salary, Title, Deptno, T#)

1. Temporal joins are no longer a required operator: outer joins are used instead!
2. What about projections on Ehist: is coalescing required?
3. More complex temporal queries?
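Here is a minimal SQLite sketch of the unified view. No two of the three relations share a T# value, so the outer join on (Eno, T#) never finds a match. It therefore reduces to an outer union, written here with UNION ALL and NULL padding. Start and End are renamed TStart and TEnd because END is a reserved word in SQL, and T# is quoted. The sample rows are employee 1001's history from the Ehist table in these slides. The title history then needs only a selection on T#, with no coalescing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE e_salary (Eno INTEGER, TStart TEXT, TEnd TEXT, Salary INTEGER);
CREATE TABLE e_title  (Eno INTEGER, TStart TEXT, TEnd TEXT, Title TEXT);
CREATE TABLE e_dept   (Eno INTEGER, TStart TEXT, TEnd TEXT, Deptno TEXT);
INSERT INTO e_salary VALUES (1001, '1995-01-01', '1995-05-31', 60000),
                            (1001, '1995-06-01', '1996-12-31', 70000);
INSERT INTO e_title VALUES (1001, '1995-01-01', '1995-09-30', 'Engineer'),
                           (1001, '1995-10-01', '1996-01-31', 'Sr Engineer'),
                           (1001, '1996-02-01', '1996-12-31', 'Tech Leader');
INSERT INTO e_dept VALUES (1001, '1995-01-01', '1995-09-30', 'd01'),
                          (1001, '1995-10-01', '1996-12-31', 'd02');

-- Outer join on (Eno, T#): tuples of different relations never share a T#,
-- so nothing matches and the join degenerates to this outer union.
CREATE VIEW Ehist AS
  SELECT Eno, TStart, TEnd, Salary, NULL AS Title, NULL AS Deptno, 1 AS "T#"
  FROM e_salary
  UNION ALL
  SELECT Eno, TStart, TEnd, NULL, Title, NULL, 2 FROM e_title
  UNION ALL
  SELECT Eno, TStart, TEnd, NULL, NULL, Deptno, 3 FROM e_dept;
""")

# Temporal projection on the title history: a plain selection, no coalescing.
titles = conn.execute(
    'SELECT Eno, TStart, TEnd, Title FROM Ehist WHERE "T#" = 2 ORDER BY TStart'
).fetchall()
print(titles)
```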


Unified View of Ehist: group-by, partition-by important

Eno   Start       End         Salary  Title        Deptno  T#
1001  1995-01-01  1995-05-31  60000   ?            ?       1
1001  1995-01-01  1995-09-30  ?       Engineer     ?       2
1001  1995-01-01  1995-09-30  ?       ?            d01     3
1001  1995-06-01  1996-12-31  70000   ?            ?       1
1001  1995-10-01  1996-01-31  ?       Sr Engineer  ?       2
1001  1995-10-01  1996-12-31  ?       ?            d02     3
1001  1996-02-01  1996-12-31  ?       Tech Leader  ?       2
. . .

Temporal projection (find the title history): projection plus selection instead of coalescing:

SELECT Eno, Start, End, Title
FROM Ehist
WHERE T# = 2


Temporal Queries in This Unified View (Uniview)

• Temporal joins and projections are easier
• Snapshot queries are a bit harder: more regular joins might be required
• Thus simple temporal queries can be supported in SQL without extending the standards (but new temporal aggregates are still needed)

• What about complex temporal queries?


Complex Temporal Queries in Uniview

The new pattern constructs can be the solution.

Example 1: a star employee in a department, whose history shows (1) one or more raises, followed by (2) a change of title (promotion), followed by (3) one or more raises, followed by (4) another change of title, … with no change of department:

FROM Ehist AS (R+, P1, R+, P2) WHERE … conditions on time and raises, etc.
% but the no-department-change condition is implicit
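One way to see why the no-department-change condition is implicit is to spell each employee's Ehist tuples as letters. The tuples are ordered by Start and T#, each becomes one letter, and (R+, P1, R+, P2) is matched as a regular expression. Because the pattern must match contiguous tuples, any D in between breaks it. This is a sketch with several assumptions:

- The letter encoding is hypothetical.
- Python's `re` module stands in for the proposed SQL pattern constructs.
- The helper names are hypothetical.
- The conditions on time are omitted.

```python
import re

def event_string(history):
    """Spell one employee's Ehist tuples (Start, T#, value), ordered by
    (Start, T#), as letters: H = first tuple of its kind (hiring),
    R = raise, P = title change (promotion), D = department change,
    '-' = any other salary change."""
    last = {}
    letters = []
    for start, tno, value in sorted(history, key=lambda e: (e[0], e[1])):
        if tno not in last:
            letters.append("H")
        elif tno == 1:
            letters.append("R" if value > last[tno] else "-")
        elif tno == 2:
            letters.append("P")
        else:
            letters.append("D")
        last[tno] = value
    return "".join(letters)

# (R+, P1, R+, P2): contiguity alone rules out a department change in between.
STAR = re.compile(r"R+PR+P")

def is_star(history):
    return STAR.search(event_string(history)) is not None

star = [("1990-01-01", 1, 50000), ("1990-01-01", 2, "Engineer"),
        ("1990-01-01", 3, "d01"), ("1991-01-01", 1, 55000),
        ("1992-01-01", 2, "Sr Engineer"), ("1993-01-01", 1, 60000),
        ("1994-01-01", 2, "Tech Leader")]
mover = star + [("1993-06-01", 3, "d02")]  # same history plus a department move
print(event_string(star), is_star(star))
print(event_string(mover), is_star(mover))
```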


Coalesce as a Pattern

1. Intervals ordered by their timestamps

2. pattern: next interval starts before the current max end

3. take the min start and max end of the periods in the pattern
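The three steps translate directly into a single ordered scan. This sketch assumes half-open periods [start, end) for one group of identical values. It merges periods that merely meet (start equal to the current max end), a common convention that the slide's wording leaves open. `coalesce` is a hypothetical name.

```python
def coalesce(periods):
    """Coalesce periods [start, end) of one value (e.g. one empno/deptno pair).

    1. order the periods by their timestamps;
    2. keep extending the current group while the next period starts no later
       than the group's max end (so periods that merely meet are merged too);
    3. emit each group as (min start, max end).
    """
    groups = []
    for start, end in sorted(periods):
        if groups and start <= groups[-1][1]:
            groups[-1][1] = max(groups[-1][1], end)  # still in the pattern
        else:
            groups.append([start, end])              # pattern broken: new group
    return [tuple(g) for g in groups]

# deptno d02 of employee 1001 after projecting out title: two tuples, one period.
print(coalesce([("1996-02-01", "1997-01-01"), ("1995-10-01", "1996-02-01")]))
```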


Conclusions

• Preserving and querying the histories of documents and databases is very important, but both the need for specialized extensions and their chances of success are questionable, since, e.g.:
  • State-based logical views of database history can be effectively published in XML and queried in XQuery
  • Relations and SQL/XML are still used at the physical level

• Hard research problems solved along the way include:
  • Managing schema evolution (PRISM), including documentation
  • Supporting historical queries over multiple past schema versions (PRIMA)

• A time for changes in temporal databases? Newly proposed SQL standards pave the way to:
  • Powerful query primitives for event-based temporal queries and reasoning
  • A temporal paradigm shift whereby the current schism between state-based and event-based views is removed

• The PRISM/PRIMA solutions can be easily adapted, but many challenges remain, e.g., bitemporal databases.


Acknowledgments

• XML versions: Shu-Yao Chien, Vassilis Tsotras (UCR)
• SQL-TS: Reza Sadri
• ICAP: Fusheng Wang, Richard Marciano (SDSC), Bertram Ludaescher (UCD)
• ArchIS: Fusheng Wang, Xin Zhou
• Wikipedia & HMM: Carlo Curino, Hyun J. Moon, Letizia Tanca (PoliMi), Song Meng
• PRISM: Carlo Curino, Hyun Moon, Myungwon Ham
• PRIMA: Hyun Moon, Carlo Curino, Alin Deutsch (UCSD), Chien-Yi Hou (UCSD), Naren Gayam

This research has been sponsored by:
• the National Science Foundation,
• the National Historical Publications and Records Commission, and
• Teradata Corporation



Thank you!

Questions or comments?

ECDM'08, Carlo Zaniolo


References

• Carlo Curino, Hyun Moon, Carlo Zaniolo: Graceful Database Schema Evolution: the PRISM Workbench. VLDB 2008, Auckland, New Zealand.

• Hyun Moon, Carlo Curino, Alin Deutsch, Chien-Yi Hou, Carlo Zaniolo: Managing and Querying Transaction-time Databases under Schema Evolution. VLDB 2008, Auckland, New Zealand.

• Carlo A. Curino, Hyun J. Moon, Letizia Tanca, Carlo Zaniolo: Schema Evolution in Wikipedia: Toward a Web Information System Benchmark. ICEIS 2008, 10th International Conference on Enterprise Information Systems, June 12-16, 2008, Barcelona, Spain.

• Fusheng Wang, Carlo Zaniolo, Xin Zhou: ArchIS: An XML-Based Approach to Transaction-Time Temporal Database Systems. The VLDB Journal, 2009.

• Fusheng Wang, Carlo Zaniolo: Temporal queries and version management in XML-based document archives. Data & Knowledge Engineering, 65(2): 304-324, May 2008.

• X. Zhou, F. Wang, C. Zaniolo: Efficient Temporal Coalescing Query Support in Relational Database Systems. 17th International Conference on Database and Expert Systems Applications (DEXA'06), Krakow, Poland, September 2006.

• F. Wang, X. Zhou, C. Zaniolo: Bridging Relational Database History and the Web: the XML Approach. ACM International Workshop on Web Information and Data Management (WIDM'06), Arlington, Virginia, USA, November 10, 2006.

• F. Wang, X. Zhou, C. Zaniolo: Using XML to Build Efficient Transaction-Time Temporal Database Systems on Relational Databases. Poster paper, Proc. 22nd International Conference on Data Engineering (ICDE'06), Atlanta, Georgia, USA, April 3-7, 2006.

• S.-Y. Chien, V. Tsotras, C. Zaniolo, D. Zhang: Supporting Complex Queries on Multiversion XML Documents. ACM TOIT, February 2006.


References, cont.

• Fusheng Wang, Carlo Zaniolo: An XML-Based Approach to Publishing and Querying the History of Databases, World Wide Web Journal, 8(3), 233–259, 2005

• Fusheng Wang, Carlo Zaniolo, Xin Zhou: Temporal XML? SQL Strikes Back! TIME 2005: 12th International Symposium on Temporal Representation and Reasoning, 47-55.

• Fusheng Wang, Carlo Zaniolo, Xin Zhou, Hyun J. Moon: Version Management and Historical Queries in Digital Libraries. TIME 2005: 12th International Symposium on Temporal Representation and Reasoning, 207-209.

• Fusheng Wang, Carlo Zaniolo, Xin Zhou, Hyun J. Moon: Managing Multiversion Documents & Historical Databases: a Unified Solution Based on XML, WebDB 2005: 151-153.

• Fusheng Wang, Carlo Zaniolo: Preserving and Querying Histories of XML-Published Relational Databases. Proc. Second International Workshop on Evolution and Change in Data Management (ECDM 2002), Tampere, Finland, October 2002.

• S.-Y. Chien, V. J. Tsotras, C. Zaniolo: Efficient Schemes for Managing Multiversion XML Documents. The VLDB Journal, December 2002.

• Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001.


References, cont.

• [Clifford95] J. Clifford, A. Croker, F. Grandi, A. Tuzhilin: On Temporal Grouping. In Recent Advances in Temporal Databases, pages 194-213. Springer Verlag, 1995.

• [DeCastro97] C. De Castro, F. Grandi, M. R. Scalas: Schema Versioning for Multitemporal Relational Databases. Information Systems, 22(5): 249-290, 1997.

• [ECDM08] C. A. Curino, H. J. Moon, C. Zaniolo: Managing the History of Metadata in Support for DB Archiving and Schema Evolution. In ECDM, 2008.

• [ICEIS08] C. A. Curino, H. J. Moon, L. Tanca, C. Zaniolo: Schema Evolution in Wikipedia: Toward a Web Information System Benchmark. International Conference on Enterprise Information Systems (ICEIS), 2008.

• [ICDE09] C. A. Curino, H. J. Moon, M. Ham, C. Zaniolo: The PRISM Workbench: Database Schema Evolution Without Tears. To appear in ICDE 2009 (demo).

• [Marche93] S. Marche: Measuring the Stability of Data Models. European Journal of Information Systems, 2(1): 37-47, 1993.

• [Roddick95] J. Roddick: A Survey of Schema Versioning Issues for Database Systems. Information and Software Technology, 37(7): 383-393, 1995.

• [Sjoberg93] D. I. Sjoberg: Quantifying Schema Evolution. Information and Software Technology, 35(1): 35-44, 1993.


The Panta Rhei Projects

1. The schema evolution benchmark [ICEIS 2008]: http://yellowstone.cs.ucla.edu/schemaevolution/index.php

2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al., VLDB 2008]

3. Historical Metadata Manager (HMM): preserves and queries the history of the information schema [ECDM 2008]

4. PRIMA: supports historical queries in databases with evolving schemas [Moon et al., VLDB 2008]
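PRISM describes each schema change as a Schema Modification Operator (SMO), such as RENAME COLUMN or ADD COLUMN, and HMM preserves the history of the schema versions those changes produce. The Python sketch below only illustrates that combination under simplifying assumptions: a schema is a hypothetical dict from table names to column lists, only four SMOs are modeled, and, unlike the actual tools, no data is migrated and no queries are rewritten. The class name `SchemaHistory` and the tuple encoding of SMOs are illustrative, not PRISM's interface.

```python
from dataclasses import dataclass, field

# Hypothetical schema model: table name -> ordered list of column names.
Schema = dict[str, list[str]]


@dataclass
class SchemaHistory:
    """Keeps every schema version produced by a sequence of SMOs (illustrative only)."""

    versions: list[Schema] = field(default_factory=list)

    def current(self) -> Schema:
        return self.versions[-1]

    def apply(self, smo: tuple) -> Schema:
        """Apply one SMO to a copy of the current version and record the result."""
        # New dict and new column lists, so earlier versions stay untouched.
        new = {table: list(cols) for table, cols in self.current().items()}
        op, *args = smo
        if op == "RENAME COLUMN":
            table, old, renamed = args
            cols = new[table]
            cols[cols.index(old)] = renamed
        elif op == "ADD COLUMN":
            table, col = args
            new[table].append(col)
        elif op == "DROP COLUMN":
            table, col = args
            new[table].remove(col)
        elif op == "RENAME TABLE":
            old, renamed = args
            new[renamed] = new.pop(old)
        else:
            # Other SMOs (e.g. table merges or decompositions) are not modeled here.
            raise ValueError(f"SMO not modeled in this sketch: {op}")
        self.versions.append(new)
        return new
```

For example, starting from a hypothetical `page(page_id, title, text)` table, applying a RENAME COLUMN and then an ADD COLUMN leaves three entries in `versions`, and the first one is still the original schema. Copying on every step keeps past versions queryable at the cost of storing each version in full.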


Better Performance: Shredding into Tables

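[Figure: the XML view of the history is shredded into H-tables, and XQuery queries on the XML view are translated into SQL/XML queries on the H-tables.]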
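The shredding step can be sketched as follows, assuming (roughly in the style of ArchIS) one history table per attribute, whose rows hold the tuple key, the attribute value, and the half-open validity interval [tstart, tend). The input format, the function name `shred`, and the table layout are illustrative assumptions; the XQuery-to-SQL/XML translation is not shown.

```python
from collections import defaultdict

# One tuple version: (key, {attribute: value}, tstart, tend), interval [tstart, tend).
Version = tuple[int, dict[str, object], int, int]
# One row of a hypothetical per-attribute H-table: (key, value, tstart, tend).
HRow = tuple[int, object, int, int]


def shred(versions: list[Version]) -> dict[str, list[HRow]]:
    """Split tuple-level history into one history table per attribute."""
    tables: dict[str, list[HRow]] = defaultdict(list)
    # Visit versions grouped by key, in time order within each key.
    for key, attrs, ts, te in sorted(versions, key=lambda v: (v[0], v[2])):
        for attr, value in attrs.items():
            rows = tables[attr]
            last = rows[-1] if rows else None
            if last is not None and last[0] == key and last[1] == value and last[3] == ts:
                # Same key, same value, adjacent period: extend the existing row.
                rows[-1] = (key, value, last[2], te)
            else:
                rows.append((key, value, ts, te))
    return dict(tables)
```

Consecutive versions that keep the same attribute value collapse into a single row of that attribute's table, so an employee whose name never changes gets one `name` row even when the salary changes twice.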


• Schema-evolution benchmark and tool suite: http://yellowstone.cs.ucla.edu/schema-evolution/index.php


Temporal Coalescing for Schema Evolution

• Schema changes create a fragmented history:
  • We store the data history under the original schema
  • Old data is stored under the old schema, and the same data is stored under the new schema for the successive period
• Coalescing:
  • Expensive: even Single Scan Coalesce [Zhou06] requires O(n) time, where n is the data size
  • CNesT: a nested-timestamp storage scheme that avoids the coalescing otherwise required by multiple schema versions
  • One order of magnitude speed-up
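The fragmentation and its repair can be illustrated with a generic coalescing pass: rows with the same key and value whose intervals overlap or meet are merged into one, for instance a salary stored under the old schema for [1, 5) and under the new schema for [5, 9). This Python sketch sorts in memory and then scans once; it is an assumption-laden illustration, not the Single Scan Coalesce algorithm of [Zhou06] and not CNesT. The row layout `(key, value, tstart, tend)` and the name `coalesce` are hypothetical.

```python
# One history row: (key, value, tstart, tend), half-open interval [tstart, tend).
# Keys and values must be sortable (ints here).
Row = tuple[int, int, int, int]


def coalesce(rows: list[Row]) -> list[Row]:
    """Merge rows with equal key and value whose intervals overlap or meet."""
    out: list[Row] = []
    # Sorting by (key, value, tstart) puts mergeable rows next to each other.
    for key, value, ts, te in sorted(rows):
        if out and out[-1][0] == key and out[-1][1] == value and ts <= out[-1][3]:
            # Overlapping or adjacent period for the same fact: extend it.
            _, _, prev_ts, prev_te = out[-1]
            out[-1] = (key, value, prev_ts, max(prev_te, te))
        else:
            out.append((key, value, ts, te))
    return out
```

Because of the sort, this version costs O(n log n) overall; only the merge loop is a single O(n) scan. Rows separated by a gap, such as [1, 3) and [4, 6) for the same key and value, are deliberately left apart, since the fact did not hold in between.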