October 2008 C. Zaniolo 1
Time Versus Standards
Carlo Zaniolo, UCLA CSD
Many contributors ... mentioned at the end
Most slides due to C. Curino and H.J. Moon
Importance of the Past: Michelangelo, Rosetta Stone, Palimpsests, Ötzi
• The Web achieves ubiquity, but not perpetuity, of information. In fact, digital artifacts tend to be less durable than the books of the past.
• Preserving and managing past information is critical:
  • Rome Reborn: a ten-year project initiated at UCLA to visualize the architectural history of ancient Rome digitally.
• For evolving digital artifacts, we want to retrieve their histories along with their snapshots.
• Example: multiversion XML documents in the ICAP project [Wang2005]
Multiversion XML Documents: the ICAP Project
• Documents published in XML and queried in XQuery:
  • UCLA course catalog: a new version every two years,
  • CIA World Factbook: a new one published every year,
  • W3C technical specs published in successive revisions.
• The structured DIFF between successive versions is then represented as an XML document called a V-document. Each element is timestamped with its period of validity.
• XQuery provides an effective query language for querying V-documents, including temporal queries.
Historical Information in DBMS
• Time and historical information are critical in most information systems
• Temporal data and queries proved difficult to support in DBMS. E.g.,
  • TSQL2: a comprehensive proposal for temporal SQL extensions, put forward in the mid-90s by leading researchers
  • Unfortunately not well received by SQL standard committees and DBMS vendors
  • Many new constructs and features, but the key design idea was to spare users the difficulties of specifying explicit temporal joins and coalescing operators (in state-based representations)
Temporal DBs: Many State-Based Models
• In state-based models, tuples are timestamped by their validity period (table columns: empno, title, deptno, ts, te)
• Projection becomes difficult since it involves coalescing: e.g. project out title
• Problem partially solved by decomposing:
(Id, Salary, Start, End)
(Id, Title, Start, End)
....
Few projections/coalesces but many temporal joins... which are no fun either
Coalescing and temporal joins: the crux of all temporal extensions of the relational model & SQL
• TSQL2: implicit constructs used on periods, rather than standard SQL constructs
• Simple temporal join/project queries are simple; more complex ones become difficult
• Point-based and many other models were proposed, but none went as far as TSQL2.
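To make the cost of decomposition concrete, here is a minimal Python sketch of a temporal join between two decomposed relations. The half-open period convention [start, end), the integer timestamps and the sample rows are assumptions made for illustration; they are not taken from the slides.

```python
from typing import List, Tuple

# One row of a decomposed relation: (id, value, start, end).
# Periods are half-open [start, end); this convention is an assumption.
Row = Tuple[int, str, int, int]

def temporal_join(left: List[Row], right: List[Row]):
    """Pair rows with the same id whose validity periods intersect."""
    out = []
    for lid, lval, ls, le in left:
        for rid, rval, rs, re_ in right:
            if lid != rid:
                continue
            start, end = max(ls, rs), min(le, re_)
            if start < end:  # non-empty intersection
                out.append((lid, lval, rval, start, end))
    return sorted(out, key=lambda r: (r[0], r[3]))

# Hypothetical sample data for (Id, Salary, Start, End) and (Id, Title, Start, End).
salary = [(1001, "60000", 1, 6), (1001, "70000", 6, 24)]
title = [(1001, "Engineer", 1, 10), (1001, "Sr Engineer", 10, 14)]
joined = temporal_join(salary, title)
```

Each output row carries the intersection of the two input periods, which is why even a simple salary-and-title query needs interval arithmetic that plain SQL joins do not express directly.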
A Personal View of TSQL2: Pros and Cons
• I learned from the best,
• but still TSQL2 proved difficult to teach: simple queries are still simple, but more complex queries become very hard.
A. Pros: Comprehensive
• Transaction time, valid time, bitemporal
• Both event-based and state-based supported
• Schema-revision constructs
B. Cons:
• Many implementation issues unresolved
• Biased toward state-based: no query construct for events & time series
C. Post Mortem: TSQL2 ended up being blamed for both A and B ?*!
Outline
1. Introduction: the importance of the past!
2. Temporal XML.
XML as a Temporal Data Model
DBMS vendors have been gung-ho about publishing relational data in XML; the technical benefits are questionable.
But publishing the HISTORY of relational DBs in XML brings significant technical benefits:
1. Temporally grouped data model is represented quite naturally
2. XQuery for temporal queries
   1. Positive experience with ICAP
   2. Complex temporal queries are easily written
   3. Extensible language: simple temporal functions easily added
3. Current XML standards: no change required
4. Usability experience with graduate students quite positive.
XML for Temporal Data: history of emp(empno, title, deptno)
<db ts="T1" te="now">
  <empacct ts="T1" te="now">
    <row ts="T1" te="now">
      <empno ts="T1" te="now">1001</empno>
      <title ts="T1" te="T3">Engineer</title>
      <title ts="T3" te="T4">Sr Engineer</title>
      <title ts="T4" te="now">Tech Leader</title>
      <deptno ts="T1" te="T3">d01</deptno>
      <deptno ts="T3" te="now">d02</deptno>
    </row>
  </empacct>
</db>
Figure labels: Table (empacct), Columns (empno, title, deptno)
• XML supports the temporally grouped model well [VLDBJ08]
XQuery for Temporal Query
• XQuery is a good temporal query language
  • Complex temporal queries are easily written
  • Turing-complete language
• Current XML standards: no extension required
• This is the logical view, which is then shredded into flat tables (and SQL/XML) for better performance [VLDBJ08]
Query 1: Temporal projection. Retrieve the title history of employee "Bob"
for $t in doc("emp.xml")/db/empacct/row[name="Bob"]/title
return $t
Query 2: Temporal snapshot. Retrieve all the titles at 1990-07-01:
for $t in doc("emp.xml")/db/empacct/row/title[@ts<="1990-07-01" and @te>"1990-07-01"]
return $t
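The two queries above can be approximated outside an XQuery engine. The following Python sketch uses the standard xml.etree.ElementTree module on a small, hypothetical history document. The concrete dates, the lookup by empno rather than by name, and the encoding of "now" as 9999-12-31 are all assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical V-document; "now" is encoded as 9999-12-31 (an assumption).
DOC = """
<db ts="1995-01-01" te="9999-12-31">
  <empacct ts="1995-01-01" te="9999-12-31">
    <row ts="1995-01-01" te="9999-12-31">
      <empno ts="1995-01-01" te="9999-12-31">1001</empno>
      <title ts="1995-01-01" te="1995-10-01">Engineer</title>
      <title ts="1995-10-01" te="1996-02-01">Sr Engineer</title>
      <title ts="1996-02-01" te="9999-12-31">Tech Leader</title>
    </row>
  </empacct>
</db>
"""

def title_history(root, empno):
    """Temporal projection: every timestamped title of one employee."""
    out = []
    for row in root.findall("./empacct/row"):
        if row.findtext("empno") == empno:
            out.extend((t.text, t.get("ts"), t.get("te"))
                       for t in row.findall("title"))
    return out

def titles_at(root, day):
    """Temporal snapshot: titles whose period [ts, te) contains the given day."""
    # ISO dates compare correctly as plain strings.
    return [t.text for t in root.findall("./empacct/row/title")
            if t.get("ts") <= day < t.get("te")]

root = ET.fromstring(DOC)
print(title_history(root, "1001"))
print(titles_at(root, "1995-12-15"))
```

Because each element carries its own period, both queries reduce to a path expression plus a predicate on the ts and te attributes, with no coalescing step.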
At the External/Logical Level
XML and XQuery
• Effective at publishing the history of relational DBs
• Can be easily extended to multi-version documents
• No extension to current standards, but
• Libraries of temporal functions were added to
  • shield users from the low-level details used in representing time (e.g. now)
  • provide users with reusable complex functions: e.g., temporal aggregates.
• Predefined functions: snapshot, interval functions, duration and date/time
Performance & Scalability: Alternative Implementations
At the physical level we experimented with:
1. Native XML DBMS: limited performance and scalability
2. Flat relations:
   * XML views shredded into flat H-tables
   * XQuery mapped into SQL/XML statements
3. Nested relations (now in SQL & supported in Oracle)
   * closer to XML, but no obvious performance advantage
Internal Level: H-tables
• Attribute history tables:
  • employee_title(empno, title, tstart, tend)
  • employee_deptno(empno, deptno, tstart, tend)
• Ancillary tables: e.g., key-history table
  • employee_id(empno, tstart, tend)
• XQuery statements on XML views are implemented as SQL/XML statements on these tables.
• Temporal joins by sort-merging the tables
• Temporal indexing and clustering (via usefulness-based segmentation)
This is an efficient internal representation for a wide range of logical views: XML, point-based, Uniview (more on this later)
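As an illustration of how a snapshot can be answered directly on H-tables, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the slide; the sample rows, the half-open periods and the 9999-12-31 stand-in for "now" are assumptions, and plain SQL is used here in place of the SQL/XML statements that ArchIS generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_id(empno INTEGER, tstart TEXT, tend TEXT);
CREATE TABLE employee_title(empno INTEGER, title TEXT, tstart TEXT, tend TEXT);
CREATE TABLE employee_deptno(empno INTEGER, deptno TEXT, tstart TEXT, tend TEXT);
-- Hypothetical history of one employee; '9999-12-31' stands for "now".
INSERT INTO employee_id VALUES (1001, '1995-01-01', '9999-12-31');
INSERT INTO employee_title VALUES
  (1001, 'Engineer',    '1995-01-01', '1995-10-01'),
  (1001, 'Sr Engineer', '1995-10-01', '1996-02-01'),
  (1001, 'Tech Leader', '1996-02-01', '9999-12-31');
INSERT INTO employee_deptno VALUES
  (1001, 'd01', '1995-01-01', '1995-10-01'),
  (1001, 'd02', '1995-10-01', '9999-12-31');
""")

# Snapshot: join the key-history table with each attribute history table,
# keeping only the rows whose period [tstart, tend) contains the given day.
SNAPSHOT = """
SELECT k.empno, t.title, d.deptno
FROM employee_id k
JOIN employee_title t
  ON t.empno = k.empno AND t.tstart <= :day AND :day < t.tend
JOIN employee_deptno d
  ON d.empno = k.empno AND d.tstart <= :day AND :day < d.tend
WHERE k.tstart <= :day AND :day < k.tend
"""

def snapshot(day):
    return conn.execute(SNAPSHOT, {"day": day}).fetchall()

print(snapshot("1995-12-15"))
```

The key-history table drives the join, so an employee appears in a snapshot only while the key itself is valid.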
The Archival Information System (ArchIS)
[Figure: ArchIS architecture. Labels: Current Database (Relational Data, SQL Queries); Active Rules/update logs; H-tables (Temporal Info.); Temporal Queries; XML-views; XQuery; User-Defined Temporal Functions]
Outline
1. Introduction: the use of the past
2. XML/XQuery for managing multiversion documents and the history of relational DBs
   • Effective way to keep history, as long as the schema does not change over history.
3. The Panta Rhei (Πάντα ῥεῖ) projects!
The Panta Rhei Projects
1. The schema evolution benchmark [ICEIS 2008] http://yellowstone.cs.ucla.edu/schemaevolution/index.php
2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al. VLDB2008]
3. Historical Metadata Manager (HMM) to preserve and query the information schema history [ECDM 2008]
4. PRIMA: managing transaction-time DBs with evolving schemas [Moon et al. VLDB2008]
Schema Evolution
• Previous studies [Sjoberg, Marche, …] have focused on traditional information systems.
• But on the web everything evolves faster. Web Information Systems often involve large, cooperative, unstructured projects such as Wikipedia!
• MediaWiki (the software platform behind Wikipedia):
  • Popular (used by >30,000 websites including Wikipedia)
  • Open-source and well-documented software
• We collected and dissected the MediaWiki schema history (170+ schema versions in 4.5 years)
  • a tool suite to analyze Web Information System DB backends
  • typical Schema Modification Operators (SMOs) identified!
  • used to automate the schema evolution process, …PRISM
Basic Statistics
• Schema Evolution:
  • 170+ versions in 4.5 years
  • almost 250% increase
• Up to 70% of queries lost after a schema revision
Schema Modification Operators (SMOs)
• Language for schema change
• Procedural fashion
  • Do this, do that, … Easier for regular DBAs
• Similar to primitives in [Bernstein06]
SMOs describing Wikipedia DB Schema Evolution [ICEIS08]
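The procedural flavor of SMOs can be sketched as a sequence of small schema-to-schema functions. The Python below is a simplified illustration, not the PRISM implementation: the tuple encoding of each operator, the `apply_smo` and `evolve` helpers, and the schema-only treatment (no data migration, no join condition on MOVE COLUMN) are all assumptions. The operator names are the ones that appear in this deck.

```python
from copy import deepcopy

# A schema is modeled as {table name: [column names]} (illustrative only).

def apply_smo(schema, smo):
    """Return a new schema obtained by applying one operator; the input is untouched."""
    s = deepcopy(schema)
    op = smo[0]
    if op == "DROP TABLE":
        _, table = smo
        del s[table]
    elif op == "DROP COLUMN":
        _, column, table = smo
        s[table].remove(column)
    elif op == "MOVE COLUMN":
        # Schema-level effect only; the WHERE condition used to move the
        # data itself is ignored in this sketch.
        _, column, src, dst = smo
        s[src].remove(column)
        s[dst].append(column)
    elif op == "MERGE TABLE":
        _, left, right, merged = smo
        if s[left] != s[right]:
            raise ValueError("MERGE TABLE needs tables with identical columns")
        columns = s.pop(left)
        s.pop(right)
        s[merged] = columns
    else:
        raise ValueError(f"unknown operator: {op}")
    return s

def evolve(schema, smos):
    """Apply operators in order and keep every intermediate schema version."""
    versions = [schema]
    for smo in smos:
        versions.append(apply_smo(versions[-1], smo))
    return versions

v1 = {"empacct": ["empno", "title", "deptno"], "job": ["title", "salary"]}
history = evolve(v1, [
    ("MOVE COLUMN", "salary", "job", "empacct"),
    ("DROP TABLE", "job"),
])
print(history[-1])
```

Keeping every intermediate version is what makes an operator log usable later for documentation and query rewriting.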
Schema Evolution Now
… the DBA nightmares:
• Frequent evolution steps: Wikipedia case study, 170 in 4.5 years!
• Data migration: data loss, redundancy, efficiency of the migration, efficiency of the new design
• Application conversion/rewriting: up to 70% query loss in Wikipedia
• Few automation tools. Documentation automation still lacking.
PRISM Carlo A. Curino VLDB ‘08
PRISM's Objectives
Desiderata | PRISM
Support evolution design | Schema Modification Operators (SMOs)
Increase predictability of evolution | SMO static analysis, to forecast impact on schema, data and queries
Automate application conversion (query only for now) | Query translation based on SMOs between schema versions, by automatic query rewriting or by view generation
Automate data migration | Data migration scripts automatically generated from SMO sequences
Automate documentation | Historical information schema: HMM
Mapping & (Quasi-)Inverse Mapping
• The quasi-inverse SMO^-1 can be used to translate the original query q1 into an equivalent q’1 (using MARS’ Disjunctive Embedded Dependencies)
• Or it can be used `as is’ on a view defined by SMO^-1
The Panta Rhei Projects: PRIMA
1. The schema evolution benchmark [ICEIS 2008] http://yellowstone.cs.ucla.edu/schemaevolution/index.php
2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al. VLDB2008]
3. Historical Metadata Manager (HMM) to preserve and query the information schema history [ECDM 2008]
4. PRIMA: managing transaction-time DBs with evolving schemas [Moon et al. VLDB2008]
• Many problems:
  1. How to archive?
  2. How to query?
  3. Optimization
PRIMA builds on the ArchIS approach
With Schema Changes Life Is Much Harder. Problem 1: How to Archive!
Current-Schema Archival
• History migrated into the current schema
Problem: some history lost, e.g. DROP COLUMN
Original-Schema Archival
• History stored under the original schema
Lossless!
[Figure: schema time (S1, S2) versus data time (T) for each archival strategy; labels: TDB1, TDB1', TDB2, TDB1' + TDB2]
MV-Document: Example
V1: T1~T5
V2: T5~now
SMOs:
MOVE COLUMN salary FROM job INTO empacct WHERE empacct.title=job.title;
DROP TABLE job;
<db ts="T1" te="now">
  <empacct ts="T1" te="now">
    <row ts="T1" te="now">
      <empno ts="T1" te="now">1001</empno>
      <title ts="T1" te="T3">Engineer</title>
      <title ts="T3" te="T4">Sr Engineer</title>
      <title ts="T4" te="now">Tech Leader</title>
      <deptno ts="T1" te="T3">d01</deptno>
      <deptno ts="T3" te="now">d02</deptno>
      <salary ts="T5" te="now">70000</salary>
    </row>
  </empacct>
  <job ts="T1" te="T5">
    <row ts="T1" te="T5">
      <title ts="T1" te="T5">Tech Leader</title>
      <salary ts="T1" te="T5">70000</salary>
    </row>
  </job>
</db>
• Temporally grouped
• No history duplicated: storage-efficient, schema changes applied quickly
Problem 2: How to Query w/ Schema Changes?
• Manual querying
  • Write one query per version
  • Doesn’t scale: hundreds of versions
• Schema versioning by data translation
  • Translate data into the queried version. Survey [Roddick95]
  • Inefficient!
• Implementation by query rewriting
  • Use the above as semantics for query rewriting
  • Rewrite the input query into source versions
  • Efficient
Q: find salary history over the last 20 years
[Figure: timeline of schema versions V1 to V5, with the queries Q’’’’, Q’’’, Q’’, Q’, Q shown against them]
Query Rewriting
• MARS as the query rewriting engine [VLDB03]
  • Input: XQuery, ICs for XML (XICs)
  • Output: XQuery
  • Chase the input query to find an equivalent query modulo XICs
• PRIMA translates SMOs into XICs
  • XML Integrity Constraints
  • ~ First-order logic with XPath
  • Simple case: MERGE TABLE S, T INTO R (from v1 to v2)
[Figure: PRIMA architecture. Labels: Input XQuery; SMOs & Schema History; SMOs; SMO2XIC; XICs; MARS; Rewritten XQuery]
[/v1db/S](x1), [./@ts](x1,s), [./@te](x1,e), [./row](x1, x2)
→∃y1 [/v2db/R](y1), [./@ts](y1,s), [./@te](y1,e), [./row](y1,x2)
[/v1db/T](x1), [./@ts](x1,s), [./@te](x1,e), [./row](x1, x2)
→∃y1 [/v2db/R](y1), [./@ts](y1,s), [./@te](y1,e), [./row](y1,x2)
Optimization
• Rewriting optimizations
  • Various techniques used to minimize the cost of rewriting over hundreds of schema versions.
• Query optimizations
  • MSF: Minimal Source-version Find
    • Analyze the input query’s temporal predicates to prune schema versions that can never contribute to the query answer
  • TJF: Temporal Join Find
    • Find temporally joined relations, and transform a join-of-unions plan into a union-of-joins plan
    • E.g. (R1 U R2) ⋈ (S1 U S2) becomes (R1 ⋈ S1) U (R2 ⋈ S2)
  • Special techniques to eliminate inter-version coalescing.
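Both query optimizations can be illustrated on toy data. The Python sketch below is not PRIMA's implementation: the version layout, the helper names and the sample tuples are hypothetical. It assumes that every tuple's period lies within the period of the schema version that stores it, so tuples from different versions never overlap in time; under that assumption the cross-version joins are empty and the two plans return the same answer.

```python
from itertools import product

# Each schema version stores fragments of R and S valid during [ts, te).
# Tuples are (key, value, start, end). All data here is hypothetical.
VERSIONS = [
    {"name": "V1", "ts": 0, "te": 10,
     "R": [(1, "a", 0, 10)], "S": [(1, "x", 2, 10)]},
    {"name": "V2", "ts": 10, "te": 20,
     "R": [(1, "b", 10, 20)], "S": [(1, "y", 10, 15)]},
]

def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

def prune_versions(versions, q_start, q_end):
    """Version pruning: keep versions whose period overlaps the query period."""
    return [v for v in versions
            if overlaps(v["ts"], v["te"], q_start, q_end)]

def temporal_join(r, s):
    out = []
    for (rk, rv, rs, re_), (sk, sv, ss, se) in product(r, s):
        lo, hi = max(rs, ss), min(re_, se)
        if rk == sk and lo < hi:
            out.append((rk, rv, sv, lo, hi))
    return out

def join_of_unions(versions):
    """(R1 U R2) join (S1 U S2): one large join over all fragments."""
    r = [t for v in versions for t in v["R"]]
    s = [t for v in versions for t in v["S"]]
    return sorted(temporal_join(r, s))

def union_of_joins(versions):
    """(R1 join S1) U (R2 join S2): one small join per version."""
    return sorted(t for v in versions
                  for t in temporal_join(v["R"], v["S"]))

print(union_of_joins(VERSIONS))
```

With n versions the join-of-unions plan compares fragments across all n squared version pairs, while the union-of-joins plan only touches the n same-version pairs.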
Native XML DB: Experiment Results (2/2)
• Is temporal query optimization effective?
  • Synthetic employee data: 642KB of MV-document in MonetDB/XQuery
  • Five schema versions, 12 representative temporal queries
  • Without MSD (MinSourceDetect), 8 queries fail to finish (all bars touching the highest grid line indicate an out-of-memory error)
[Figure: bar chart of execution time in seconds (0 to 20) for queries Q1 to Q12; series: Baseline, MSD, MSD+TJD. Callout: the only two cases with temporal joins]
Relations at Internal Level: ArchIS H-tables
• Attribute history tables:
  • employee_title(empno, title, tstart, tend)
  • employee_deptno(empno, deptno, tstart, tend)
Performance of Physical Temporal Data Models
• Query: For all employees, find empno, hiredate, title, deptno, and salary, using the current DB state
~1GB
So Far
1. Current XML standards provide a good basis for temporal state-based representations at the logical level
   • At the physical level tables are much better: shredding and SQL/XML provide the needed mapping
2. Schema evolution makes everything harder, but the problem is manageable
   • At the process level: PRISM, and
   • For transaction-time DBs: PRIMA
3. Complex Event Processing: SQL standards under development provide an even better way to support temporal queries and reasoning
   • Using SQL rather than XML
   • Finessing the distinction between state-based and event-based representations
References
Ordered sequences, data streams and SQL standards for event histories:
“Pattern matching in sequences of rows,” [SQL Change Proposal March 2007] by Fred Zemke (Oracle), Andrew Witkowski (Oracle), Mitch Cherniack (Streambase), Latha Colby (IBM)
“Optimization of Sequence Queries in Database Systems,” R. Sadri, C. Zaniolo, A. Zarkesh, J. Adibi: PODS 2001.
C. Zaniolo, “Temporal XML? SQL Strikes Back!,” TIME ’05 + many unpublished ideas
Pattern in Sequences, Time Series, Data Streams
Pages in a session: Sessions(SessNo, ClickTime, PageNo, PageType)
The merchants’ dream: a content page `c’, immediately followed by a description page `d’, followed by a purchase page `p’, in SQL-TS [Sadri et al. 2000]
SELECT Z.PageNo, Z.ClickTime
FROM Sessions AS (X, Y, Z)
PARTITION BY SessNo
ORDER BY ClickTime
WHERE X.PageType='c' AND Y.PageType='d'
AND Z.PageType='p'
In SQL: two joins on SessNo, no significant problem. But the conditions `B immediately follows A’ and `C immediately follows B’ are difficult to express and optimize. Specialized optimization techniques (e.g. Knuth, Morris and Pratt) should be used instead.
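The semantics of the SQL-TS query above (partition, order, then match three consecutive rows) can be sketched in Python. This is a naive sliding-window scan for illustration only; it does not implement the Knuth-Morris-Pratt style optimization mentioned on the slide, and the click data is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Rows follow Sessions(SessNo, ClickTime, PageNo, PageType); sample data only.
CLICKS = [
    (1, 10, 100, "c"), (1, 11, 101, "d"), (1, 12, 102, "p"),
    (2, 10, 200, "c"), (2, 11, 201, "x"), (2, 12, 202, "d"), (2, 13, 203, "p"),
    (1, 13, 103, "c"),
]

def match_cdp(clicks):
    """Return (PageNo, ClickTime) of Z for every c, d, p run of adjacent clicks."""
    out = []
    # PARTITION BY SessNo, ORDER BY ClickTime
    ordered = sorted(clicks, key=itemgetter(0, 1))
    for _, rows in groupby(ordered, key=itemgetter(0)):
        rows = list(rows)
        # X, Y, Z are three consecutive rows of the same session.
        for x, y, z in zip(rows, rows[1:], rows[2:]):
            if (x[3], y[3], z[3]) == ("c", "d", "p"):
                out.append((z[2], z[1]))
    return out

print(match_cdp(CLICKS))
```

Session 2 contains c, d and p in order but not adjacently, so only session 1 matches; this adjacency requirement is exactly what is awkward to state with ordinary joins.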
Kleene Closure: Repetitions Allowed (*, +)
(S, D+, U+) where D < previous(D) AND U > previous(U)
• Describes a V pattern starting at S, going down on D and then up on U. More conditions to specify slope, min, max.
Application examples:
• W patterns in the stock market
• Intrusion detection patterns
• RFID-based tracking of objects
• Ships (1) leaving one port, (2) keeping their general direction, until (3) they enter another port
• Fishing boats: (1) leave port A, (2) travel toward the fishing area, (3) where they zigzag for a while after they arrive, and (4) they move on to a port or another fishing area.
[Figure: V-shaped curve with segments labeled S, D+, U+]
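A V pattern of the form (S, D+, U+) can be found with a single greedy scan. The Python sketch below is one possible reading of the pattern, under these assumptions: values are compared strictly, the first D is compared against S, the first U is compared against the last D, and each match is extended as far as possible. The function name and sample series are hypothetical.

```python
def find_v_patterns(values):
    """Return (start, bottom, end) index triples for greedy (S, D+, U+) matches."""
    matches = []
    i, n = 0, len(values)
    while i < n - 2:
        # D+: one or more values, each lower than the previous one.
        j = i + 1
        while j < n and values[j] < values[j - 1]:
            j += 1
        bottom = j - 1
        # U+: one or more values, each higher than the previous one.
        k = j
        while k < n and values[k] > values[k - 1]:
            k += 1
        end = k - 1
        if bottom > i and end > bottom:
            matches.append((i, bottom, end))
            i = end  # the end of one V may be the start of the next (W shapes)
        else:
            i += 1
    return matches

prices = [5, 4, 3, 4, 6, 6, 2, 3]
print(find_v_patterns(prices))
```

Letting a new match start where the previous one ended is a design choice that makes two consecutive results describe a W pattern.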
Pattern in SQL
• Extensions to SQL standards are being proposed for patterns (with Kleene closure)
• Application areas include: data streams, Complex Event Processing, ordered sequences in DBs, time series analysis
• Many optimization issues remain. But growing support is expected, as an extension of the SQL:2003 OLAP functions (market pull)
• Question: what will they do for temporal DBs?
  • The new pattern constructs support event-based queries well
  • Will they help with state-based representations?
  • Yes, provided that we provide the right view.
Pattern in SQL
[Figure: two representations labeled A and B; A is a table figure (surviving column label: Eno), B shows the relations below]
e_salary(Eno, Start, End, Salary)
e_title (Eno, Start, End, Title)
e_dept (Eno, Start, End, Deptno)
A. - Frequent coalescing a problem
   - View unsupportive of pattern queries
B. + Much less coalescing
   + View supportive of pattern queries
   + An excellent basis for the actual implementation
   - But temporal joins are a problem
Toward a Unified State-Event View
• e_salary(Eno, Start, End, Salary, T#) T#=1
• e_title (Eno, Start, End, Title, T#) T#=2
• e_dept (Eno, Start, End, Deptno, T#) T#=3
External, logical level: outer joins on (Eno, T#); each tuple in the three relations generates a distinct tuple
Ehist (Eno, Start, End, Salary, Title, Deptno, T#)
1. Temporal joins are no longer a required operator: outer joins are used instead!
2. What about projections on Ehist: is coalescing required?
3. More complex temporal queries?
Unified View of Ehist: group-by, partition-by important
Eno   Start       End         Salary  Title        Deptno  T#
1001  1995-01-01  1995-05-31  60000   ?            ?       1
1001  1995-01-01  1995-09-30  ?       Engineer     ?       2
1001  1995-01-01  1995-09-30  ?       ?            d01     3
1001  1995-06-01  1996-12-31  70000   ?            ?       1
1001  1995-10-01  1996-01-31  ?       Sr Engineer  ?       2
1001  1995-10-01  1996-12-31  ?       ?            d02     3
1001  1996-02-01  1996-12-31  ?       Tech Leader  ?       2
. . .
Temporal projection: find title history
SELECT Eno, Start, End, Title
FROM Ehist
WHERE T# = 2
Projection + selection (WHERE T# = 2) instead of coalesce
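The Ehist table above can be reproduced with Python's built-in sqlite3 module. This sketch builds the view as a UNION ALL of the three relations padded with NULLs, which is a substitute for the outer joins on (Eno, T#) named on the previous slide, chosen because it runs on any SQLite version. The column names tstart, tend and tno (for Start, End and T#) are renamings made to avoid SQL keywords and special characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE e_salary(eno INTEGER, tstart TEXT, tend TEXT, salary INTEGER);
CREATE TABLE e_title(eno INTEGER, tstart TEXT, tend TEXT, title TEXT);
CREATE TABLE e_dept(eno INTEGER, tstart TEXT, tend TEXT, deptno TEXT);
INSERT INTO e_salary VALUES
  (1001, '1995-01-01', '1995-05-31', 60000),
  (1001, '1995-06-01', '1996-12-31', 70000);
INSERT INTO e_title VALUES
  (1001, '1995-01-01', '1995-09-30', 'Engineer'),
  (1001, '1995-10-01', '1996-01-31', 'Sr Engineer'),
  (1001, '1996-02-01', '1996-12-31', 'Tech Leader');
INSERT INTO e_dept VALUES
  (1001, '1995-01-01', '1995-09-30', 'd01'),
  (1001, '1995-10-01', '1996-12-31', 'd02');
-- Each source tuple yields exactly one Ehist tuple, tagged with its table number.
CREATE VIEW ehist AS
  SELECT eno, tstart, tend, salary, NULL AS title, NULL AS deptno, 1 AS tno
    FROM e_salary
  UNION ALL
  SELECT eno, tstart, tend, NULL, title, NULL, 2 FROM e_title
  UNION ALL
  SELECT eno, tstart, tend, NULL, NULL, deptno, 3 FROM e_dept;
""")

# Temporal projection: a selection on the table number replaces coalescing.
title_history = conn.execute(
    "SELECT eno, tstart, tend, title FROM ehist WHERE tno = 2 ORDER BY tstart"
).fetchall()
print(title_history)
```

Since every title period comes from exactly one e_title tuple, the projection returns periods that are already maximal and no merging step is needed.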
Temporal Queries in this Unified View (Uniview)
• Temporal joins and projections easier
• Snapshot queries a bit harder: more regular joins might be required
• Thus simple temporal queries can be supported in SQL without extensions of the standards (but new temporal aggregates are still needed)
• What about complex temporal queries?
Complex Temporal Queries in Uniview
The new pattern constructs can be the solution.
Example 1: Star employee in a department. His/her history shows: (1) one or more raises, followed by (2) a change of title (promotion), followed by (3) one or more raises, followed by (4) another change in title, … with no change of department
FROM Ehist AS (R+, P1, R+, P2)
WHERE … conditions on time and raises, etc.
% but the no-department-change is implicit
Coalesce as a Pattern
1. Intervals ordered by their timestamps
2. Pattern: next interval starts before the current max end
3. Take the min start and max end of the periods in the pattern
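The three steps above translate into a short single-pass routine. In this Python sketch, periods are (start, end) pairs with integer timestamps, and a period that starts exactly at the current maximum end is also merged; that boundary choice (<= rather than <) is an assumption suited to half-open periods.

```python
def coalesce(periods):
    """Merge overlapping or touching (start, end) periods in one ordered scan."""
    merged = []
    for start, end in sorted(periods):            # step 1: order by timestamp
        if merged and start <= merged[-1][1]:     # step 2: starts within current run
            # step 3: keep the min start, extend to the max end
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(coalesce([(6, 10), (1, 4), (3, 6), (12, 15)]))
```

In a temporal projection this would be applied separately to each group of value-equivalent tuples, for example per (Eno, Deptno).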
Conclusions
• Preserving and querying the histories of documents and databases is very important, but the success chances of, and the need for, specialized extensions are questionable, since, e.g.,
  • State-based logical views of database history can be effectively published in XML and queried in XQuery
  • Relations and SQL/XML are still used at the physical level
• Hard research problems thus solved include:
  • Managing schema evolution (PRISM), including documentation
  • Supporting historical queries over multiple past schema versions (PRIMA)
• A time for changes in temporal databases? Newly proposed SQL standards pave the way to:
  • Powerful query primitives for event-based temporal queries and reasoning
  • A temporal paradigm shift whereby the current schism between state-based and event-based views is removed
• The PRISM/PRIMA solutions can be easily adapted, but many challenges remain, e.g. bitemporal
Acknowledgments
• XML Versions: Shu-Yao Chien, Vassilis Tsotras (UCR)
• SQL-TS: Reza Sadri
• ICAP: Fusheng Wang, Richard Marciano (SDSC), Bertram Ludaescher (UCD)
• ArchIS: Fusheng Wang, Xin Zhou
• Wikipedia & HMM: Carlo Curino, Hyun J. Moon, Letizia Tanca (PoliMi), Song Meng
• PRISM: Carlo Curino, Hyun Moon, Myungwon Ham
• PRIMA: Hyun Moon, Carlo Curino, Alin Deutsch (UCSD), Chien-Yi Hou (UCSD), Naren Gayam
This research has been sponsored by:
• The National Science Foundation,
• The National Historical Publications and Records Commission, and
• Teradata Corporation
Thank you!
Questions or Comments?
ECDM'08 Carlo Zaniolo 46
References
• Carlo Curino, Hyun Moon, Carlo Zaniolo: Graceful Database Schema Evolution: the PRISM Workbench: VLDB 2008, Auckland, New Zealand.
• Hyun Moon, Carlo Curino, Alin Deutsch, Chien-Yi Hou, Carlo Zaniolo: Managing and Querying Transaction-time Databases under Schema Evolution: VLDB 2008, Auckland, New Zealand.
• Carlo A. Curino, Hyun J. Moon, Letizia Tanca and Carlo Zaniolo: Schema Evolution in Wikipedia---toward a Web Information System Benchmark. ICEIS2008: the 10th International Conference on Enterprise Information Systems. June 12-16, 2008, Barcelona, Spain.
• Fusheng Wang, Carlo Zaniolo, and Xin Zhou: ArchIS: An XML-Based Approach to Transaction-Time Temporal Database Systems. The VLDB Journal, 2009.
• Fusheng Wang and Carlo Zaniolo: Temporal queries and version management in XML-based document archives. Data & Knowledge Engineering, Volume 65, Issue 2, May 2008, Pages 304-324
• X. Zhou, F. Wang and C. Zaniolo: Efficient Temporal Coalescing Query Support in Relational Database Systems . 17th International Conference on Database and Expert Systems Applications (DEXA'06), Krakow, Poland, September, 2006.
• F. Wang, X. Zhou and C. Zaniolo: Bridging Relational Database History and the Web: the XML Approach. ACM International Workshop on Web Information and Data Management (WIDM'06), Arlington, Virginia, USA, November 10, 2006.
• F. Wang, X. Zhou and C. Zaniolo, Using XML to Build Efficient Transaction-Time Temporal Database Systems on Relational Databases. Poster Paper. In Proc. of the 22nd International Conference on Data Engineering (ICDE'2006), April 3-7, Atlanta, Georgia, USA, 2006.
• Chien, S.Y, Tsotras, V., Zaniolo, C. Zhang, D.: Supporting Complex Queries on Multiversion XML Documents, ACM TOIT, February 2006.
References, cont.
• Fusheng Wang, Carlo Zaniolo: An XML-Based Approach to Publishing and Querying the History of Databases, World Wide Web Journal, 8(3), 233–259, 2005
• Fusheng Wang, Carlo Zaniolo, Xin Zhou, Temporal XML? SQL Strikes Back!,TIME 2005: 12th International Symposium on Temporal Representation and Reasoning, 47-55.
• Fusheng Wang, Carlo Zaniolo, Xin Zhou, Hyun J. Moon: Version Management and Historical Queries in Digital Libraries. TIME 2005: 12th International Symposium on Temporal Representation and Reasoning, 207-209.
• Fusheng Wang, Carlo Zaniolo, Xin Zhou, Hyun J. Moon: Managing Multiversion Documents & Historical Databases: a Unified Solution Based on XML, WebDB 2005: 151-153.
• Fusheng Wang, Carlo Zaniolo: Preserving and Querying Histories of XML-Published Relational Databases. Procs. Second International Workshop on Evolution and Change in Data Management (ECDM 2002) , Tampere, Finland, October, 2002.
• S.-Y. Chien, V.J. Tsotras, C. Zaniolo, Efficient schemes for managing multiversion XML Documents, The VLDB Journal, December 2002.
• Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001.
References, cont.
• [Clifford95] J. Clifford, A. Croker, F. Grandi, A. Tuzhilin. On temporal grouping. In Recent Advances in Temporal Databases, pages 194–213. Springer Verlag, 1995.
• [DeCastro97] C. De Castro, F. Grandi, Maria R. Scalas. Schema Versioning for Multitemporal Relational Databases. Inf. Syst. 22(5): 249-290 (1997)
• [ECDM08] C. A. Curino, H. J. Moon, C. Zaniolo. Managing the history of metadata in support for db archiving and schema evolution. In ECDM, 2008.
• [ICEIS08] C. A. Curino, H. J. Moon, L. Tanca, C. Zaniolo. “Schema Evolution in Wikipedia: toward a Web Information System Benchmark”, International Conference on Enterprise Information Systems (ICEIS) 2008
• [ICDE09] C. A. Curino, H. J. Moon, M. Ham, C. Zaniolo, “The PRISM Workbench: Database Schema Evolution Without Tears.” To Appear in ICDE 2009 (Demo)
• [Marche93] S. Marche. “Measuring the stability of data models”, European Journal of Information Systems, 2(1):37-47, 1993.
• [Roddick95] J. Roddick. A Survey of Schema Versioning Issues for Database Systems. Information and Software Technology, 37(7):383–393, 1995.
• [Sjoberg93] D. I. Sjoberg. “Quantifying schema evolution”, Information and Software Technology, 35(1):35-44, 1993.
The Panta Rhei Projects
1. The schema evolution benchmark [ICEIS 2008] http://yellowstone.cs.ucla.edu/schemaevolution/index.php
2. PRISM: a workbench for managing and automating the schema evolution process [Curino et al. VLDB2008]
3. Historical Metadata Manager (HMM) to preserve and query the information schema history [ECDM 2008]
4. PRIMA: supporting historical queries in DBs with evolving schemas [Moon et al. VLDB2008]
Better Performance: Shredding into Tables
[Figure: XML mapped to H-tables; XQuery mapped to SQL/XML]
•Schema-evolution Benchmark and Tool Suite:
http://yellowstone.cs.ucla.edu/schema-evolution/index.php
Temporal Coalescing for Schema Evolution
• Schema changes create fragmented history
  • We store data history under the original schema
  • Old data is stored under the old schema, and the same data under the new schema in the successive period.
• Coalescing
  • Expensive: Single Scan Coalesce [Zhou06] is O(n), with n the data size
  • CNesT: a nested timestamp storage scheme that avoids coalescing due to multiple schema versions
  • One order of magnitude speed-up.