View Materialization & Maintenance Strategies

By Ashkan Bayati & Ali Reza Vazifehdoost

Post on 03-Jan-2016

Page 1: View Materialization & Maintenance Strategies

View Materialization & Maintenance Strategies

By Ashkan Bayati & Ali Reza Vazifehdoost

Page 2: View Materialization & Maintenance Strategies

Motivation

Complex Queries

- Decision support queries
- OLAP
- Statistical Analysis
- Business Intelligence
- Aggregation

Large data sets collected from heterogeneous remote sources

Page 3: View Materialization & Maintenance Strategies

View Materialization & Maintenance

• View materialization is the process of pre-computing views (summarized information) in order to gain performance.

• The drawback is that the view must be kept consistent when the underlying data sources change.

• View maintenance is the process of keeping the view consistent with the underlying source tables.

Page 4: View Materialization & Maintenance Strategies

Incremental View Maintenance

• Only relevant updates affect the view.

• The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place, instead of re-calculating the view from scratch.

Page 5: View Materialization & Maintenance Strategies

Selection

• $V = \sigma_{C(y)}(r)$: any tuple that satisfies $C(y)$ will be in the view.

• After inserts $i$ and deletes $d$ we get

$V' = V + \sigma_{C(y)}(i) - \sigma_{C(y)}(d)$

• The view can be incrementally maintained by:

Inserting $\sigma_{C(y)}(i)$ into $V$: Insert($V$, $\sigma_{C(y)}(i)$)

Deleting $\sigma_{C(y)}(d)$ from $V$: Delete($V$, $\sigma_{C(y)}(d)$)
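The selection rule above can be sketched in a few lines of Python. This is only an illustration under set semantics, assuming the deleted tuples were present in the base relation; the relation `r`, the condition `cond` and the helper names are hypothetical and not taken from the slides.

```python
def select(pred, rows):
    """Selection: keep only the rows that satisfy the predicate."""
    return {row for row in rows if pred(row)}


def maintain_selection(view, pred, inserted, deleted):
    """Apply V' = V + select(i) - select(d) without rescanning the base relation."""
    return (view | select(pred, inserted)) - select(pred, deleted)


# Hypothetical base relation r(x, y) with the condition C(y): y > 10.
cond = lambda row: row[1] > 10
r = {(1, 5), (2, 20), (3, 30)}
view = select(cond, r)

inserted = {(4, 40), (5, 1)}
deleted = {(2, 20)}
new_view = maintain_selection(view, cond, inserted, deleted)

# The incremental result matches a full recomputation over the updated relation.
recomputed = select(cond, (r | inserted) - deleted)
```

Only the inserted and deleted tuples are filtered; the base relation is touched again solely to build `recomputed` for comparison.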

Page 6: View Materialization & Maintenance Strategies

Projection

• Problem with projections:

• Imagine deleting (1,10) from the base table: the projected value must remain in the view if another base tuple still produces it.

• The solution is to keep the key in the view or use a counter.
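The counter-based solution can be sketched as follows. The base table contents and the helper names are assumptions made for illustration (the original slide figure is not available); the view keeps, for each projected value, the number of base tuples that produce it.

```python
from collections import Counter


def project(rows, col):
    """Projection with a counter: value -> number of base tuples producing it."""
    return Counter(row[col] for row in rows)


def apply_delete(view, row, col):
    """Decrement the counter; drop the value only when no base tuple produces it."""
    value = row[col]
    view[value] -= 1
    if view[value] == 0:
        del view[value]


# Hypothetical base table (key, value); the view projects the second column.
base = {(1, 10), (2, 10), (3, 20)}
view = project(base, 1)

apply_delete(view, (1, 10), 1)
still_visible = 10 in view      # (2, 10) still produces the value 10

apply_delete(view, (2, 10), 1)
gone = 10 not in view           # no remaining tuple produces 10
```

Without the counter, deleting (1,10) would leave no way to tell whether 10 should stay in the view short of rescanning the base table.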

Page 7: View Materialization & Maintenance Strategies

Joins

• Inserts:

Let $V = r \bowtie s$ and $r' = r \cup i$; then:

$V' = r' \bowtie s$

$= (r \cup i) \bowtie s$

$= (r \bowtie s) \cup (i \bowtie s)$

$= V \cup (i \bowtie s)$

Deletes are similar.
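A small Python sketch of the insert case, assuming set semantics and two hypothetical relations r(a, b) and s(b, c) joined on b. Only the inserted tuples are joined with s, and the result is added to the existing view.

```python
def join(r, s):
    """Natural join of r(a, b) and s(b, c) on the shared attribute b."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


# Hypothetical relations and inserted tuples.
r = {(1, "x"), (2, "y")}
s = {("x", 10), ("y", 20), ("z", 30)}
view = join(r, s)

i = {(3, "z"), (4, "x")}
delta = join(i, s)            # only the inserted tuples are joined with s
new_view = view | delta       # V' = V united with (i join s)

# Full recomputation over the updated relation gives the same result.
recomputed = join(r | i, s)
```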

Page 8: View Materialization & Maintenance Strategies

View Maintenance in Dynamic Environments

• A dynamic environment is defined here as one that covers both data updates and schema changes.

• Interleaving data updates and schema changes can cause problems.

• The following steps need to be taken:

 - Optimize updates based on their source relations and update types.

 - For schema changes that affect the view definition, perform a view evolution process.

 - Perform view adaptation to make the view consistent.

Page 9: View Materialization & Maintenance Strategies

Optimize Updates

• $DU' = \pi_{attr(R) \cap attr(R')}(DU)$

• $attr(R) \cap attr(R')$ contains all the attributes related to the view redefinition, because neither dropped nor added attributes will appear in the view definition.

• The relationship between SC (schema changes) and DU (data updates) is:

1. If $SC_i'$ contains drop relation $R_i$, then $DU_i = \{\}$ and $SC_i'$ = drop relation $R_i$.

2. If $SC_i'$ contains a drop attribute operation, both $SC_i'$ and $DU_i'$ might not be empty.

3. If $SC_i'$ contains no drop operation, then $DU_i' = DU_i$.

Page 10: View Materialization & Maintenance Strategies

Example

• Assume a view $V(A,B,C,D)$ is defined as $R_1(A,B) \bowtie R_2(A,C) \bowtie R_3(C,D)$. Suppose $R_1$ has the following sequence of updates {+(3,2), (1,4)} and relation $R_2$ has the update sequence {+(3,4), add field E, +(4,5,6), drop field C, -(5,7)}.

• Hence we get $DU_2$ = {+(3,4), +(4,5,6), -(5,7)}, $R_2 = (A,C)$ and $R_2' = (A,E)$. From this information you can see that $attr(R_2) \cap attr(R_2') = \{A\}$; hence $DU_2'$ = {+3, +4, -5}.
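The projection of the data updates in this example can be reproduced with a short Python sketch. The representation of an update as (sign, schema at the time of the update, values) and the function name `project_updates` are assumptions made for illustration, not part of the original method description.

```python
def project_updates(updates, old_schema, new_schema):
    """Project each data update onto the attributes common to both schemas."""
    common = [attr for attr in old_schema if attr in new_schema]
    projected = []
    for sign, schema, values in updates:
        row = dict(zip(schema, values))   # attribute name -> value
        projected.append((sign, tuple(row[attr] for attr in common)))
    return projected


# DU2 from the example; each update carries the schema of R2 valid when it was issued.
du2 = [
    ("+", ("A", "C"), (3, 4)),          # issued before "add field E"
    ("+", ("A", "C", "E"), (4, 5, 6)),  # issued after "add field E"
    ("-", ("A", "E"), (5, 7)),          # issued after "drop field C"
]

# R2 = (A, C) and R2' = (A, E), so the only common attribute is A.
du2_prime = project_updates(du2, ("A", "C"), ("A", "E"))
```

The result keeps only the A component of each update, which corresponds to {+3, +4, -5} in the example.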

Page 11: View Materialization & Maintenance Strategies

Evolving View Definition

• Applying view synchronization:

Page 12: View Materialization & Maintenance Strategies

Making the view consistent

• Now that the schema is consistent, the view needs to be synchronized with the underlying base table updates. Many mechanisms have been defined for this; I will explain more on this issue later.

Page 13: View Materialization & Maintenance Strategies

Efficient VM over distributed data sources

• Materialized views integrate and store data from distributed data sources to ensure better access, higher performance and better availability.

• Since the data sources are distributed, the network cost involved in transferring the net changes can also be dramatic.

• State-of-the-art view maintenance requires $O(n^2)$ maintenance queries to remote data sources, with $n$ being the number of data sources in the view definition.

Page 14: View Materialization & Maintenance Strategies

Goal

• The aim is to restructure the view maintenance queries in order to reduce costs.

• How?

• Assume the materialized view

$R_1 \bowtie R_2 \bowtie R_3 \bowtie R_4$ ($\bowtie$ = join)

Page 15: View Materialization & Maintenance Strategies

Restructuring Batch View Maintenance

• State of the art:

• $R_i' = R_i + \Delta R_i$

• Hence $O(n^2)$

Page 16: View Materialization & Maintenance Strategies

Adjacent Grouping

• Adjacent grouping (groups share common access to the maintenance queries):

For the previous example, divide the sources into two groups.

• It becomes

$(\Delta R_1 \bowtie R_2 + R_1' \bowtie \Delta R_2) \bowtie R_3 \bowtie R_4 + (\Delta R_3 \bowtie R_4 + R_3' \bowtie \Delta R_4) \bowtie R_1' \bowtie R_2'$

• Hence 12 queries have been reduced to 8, giving $O(n^{1.5})$.
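The reduction from 12 to 8 queries can be checked with a rough cost model in Python. This is a sketch under stated assumptions, not the authors' cost analysis: every join of a delta with another source counts as one remote maintenance query, all groups have the same size, and the function names are hypothetical.

```python
def naive_queries(n):
    """Each of the n source deltas is joined with the other n - 1 sources."""
    return n * (n - 1)


def adjacent_grouping_queries(n, group_size):
    """Count queries when adjacent sources are grouped (n divisible by group_size)."""
    groups = n // group_size
    # Inside a group, each delta is joined with the other members of its group.
    within = groups * group_size * (group_size - 1)
    # Each combined group delta is then joined with every source outside the group.
    across = groups * (n - group_size)
    return within + across


naive = naive_queries(4)                    # four sources, as in the example
grouped = adjacent_grouping_queries(4, 2)   # two groups of two sources
```

With groups of roughly sqrt(n) sources, both terms of this model grow like n * sqrt(n), which agrees with the O(n^1.5) bound quoted on the slide.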

Page 17: View Materialization & Maintenance Strategies

Conditional Grouping

• A more aggressive method, called conditional grouping, takes $2(n-1)$ maintenance queries to execute.

• Scroll up phase

Page 18: View Materialization & Maintenance Strategies

Conditional Grouping (cont.)

• Scroll down phase

Page 19: View Materialization & Maintenance Strategies

Self Maintenance of Multiple SPJ Views

• The view $V$ at level 0 can be described in terms of nodes at level 1 as $tmp1 \bowtie tmp3$.

• Some tuples of tmp1 and tmp3 do not join into the view $V$; hence, we store these tuples in their respective AVs (auxiliary views) for tmp1 and tmp3 at level 1.

Page 20: View Materialization & Maintenance Strategies

Update takes place in Relation R

• There are two possible paths that U (update) can take to find its way to the root node:

1. $\Delta V = (((U \bowtie AV(S)) \bowtie AV(T)) \bowtie AV(tmp1))$

2. $\Delta V = V\ U$

Page 21: View Materialization & Maintenance Strategies

Sub-trees

• With this approach, a change in any sub-tree can be propagated to the root node without re-computing any of the other sub-trees.

• Since we only store tuples at level i if they do not join into the node at level i+1, the tuples are not duplicated in the tree.

Page 22: View Materialization & Maintenance Strategies

Benefits of this approach

• The benefits of this procedure can be summarized as follows:

1. Changes to the view of a sub-tree only effectively change the root of that sub-tree.

2. The view updates can effectively be computed by joining only subsets of base relations rather than the entire base relations. As an example, $\Delta V = (((U \bowtie AV(S)) \bowtie AV(T)) \bowtie AV(tmp1))$ rather than the traditional method $\Delta V = (((U \bowtie S) \bowtie T) \bowtie AV(tmp1))$.

Page 23: View Materialization & Maintenance Strategies

Multiple View Maintenance

• Essentially the same as single view maintenance; however, the AV of the shared node in the tree will be different.

Page 24: View Materialization & Maintenance Strategies

Auxiliary View Structure

1. AV(temp3) stores tuples that do not join into V and tuples that do not join into V' in two separate AVs. The problem with this scheme is that it stores the set AV(temp3)(V) ∩ AV(temp3)(V') in both AVs. The sub-tree represented by intermediate node temp3 will be recomputed twice, and the views V and V' will be updated separately.

2. AV(temp3) stores tuples that do not join into view V and tuples that do not join into view V' in three AVs: AV(temp3)(V), AV(temp3)(V') and AV(temp3)(V∩V'). This eliminates duplicates and cuts down the computational cost, but incurs the additional overhead of placing tuples in the correct AV.